This tutorial will show you how to run Gallium Data on your machine, along with Cassandra. Nothing will be installed permanently: it’s all done with containers, so you can throw everything away when you’re done.

There are several versions of this tutorial, one for each supported database. The concepts are very similar, so pick the database you're most familiar with.

This tutorial will take about 10 minutes, depending on your download speed. It can be run on anything that runs Docker.

Step 1: Make sure Docker is running

For this tutorial, we will use Docker (1). To verify that Docker is available, run the following command from a command line:

docker version

You should see some output similar to this:

Client: Docker Engine - Community
Cloud integration  1.0.33
Version:           24.0.2
API version:       1.43
etc...

The exact version numbers are not important -- the important thing is that Docker needs to be running. 

If you get an error message, you'll need to get Docker up and running before you can continue with this tutorial. Fortunately, there are lots of resources that can help you with that.

We’ll be starting two Docker containers — there are more repeatable ways of doing this using e.g. Docker Compose, Kubernetes or Nomad, but for this tutorial we want to make sure every component is visible and clearly understood.

We’ll be running both containers in their own Docker network, so let’s create that network.

Run this from a command line:

docker network create gallium-net

The response should be a long string of letters and numbers, which you can safely ignore, something like:

9ef282f6d3cce819a etc...

If you get an error, it may be because you've run another Gallium Data tutorial before, in which case you can ignore the error and carry on.

Docker is now ready, so let’s move on to the next step.

Step 2: Start the two Docker containers

If this is your first time running this tutorial, note that it will download about 500MB of container images, which can take a while on slower connections.

1 - Start the Cassandra database 

Run this from a command line:

docker run -d --rm --name gallium-cassandra --network gallium-net galliumdata/cassandra-demo:1

This image is simply the standard Cassandra image, with a small sample database that we will use in this tutorial.

There is nothing special about this image: Gallium Data can run with any Cassandra database (3.x and above), whether on-premises or in the cloud.


2 - Start Gallium Data

Run this from a command line:

docker run -d --rm --name gallium-data --network gallium-net -p 8089:8080 -e repository_location=/galliumdata/repo_cassandra galliumdata/gallium-data-engine:1.8.2-1782

This is the standard Gallium Data image, with a demo repository for this tutorial. In the real world, you would typically use additional options to create your own repository.


Run a Cassandra query

Connect to Cassandra using cqlsh and run some CQL queries:

Run this from a command line:

docker exec -it gallium-cassandra cqlsh -k gallium_demo gallium-data

Hint: if this fails, try again after a few moments. Cassandra can take a few seconds to warm up.

We are connecting to Cassandra through Gallium Data, not directly -- this will now allow us to do all kinds of interesting things.

You will get the Cassandra command line prompt:

Connected to Test Cluster at gallium-data:9042
[cqlsh 6.1.0 | Cassandra 4.1.1 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cqlsh:gallium_demo> 

Run the following from the Cassandra command line (it may take a second the first time):

select id, country, first_name, last_name from customers;

You will see some rows of data:

 id | country | first_name | last_name
----+---------+------------+---------------
 23 |      WF |      Wanda |      Williams
  5 |      ES |       Eric |       Edmunds
 28 |      CN |         姚 |            明
 10 |      JP |     Juliet |     Jefferson
 16 |      PE |   Patricia |         Pérez
 13 |      MX |   Marianne |       Mohamed
 30 |      IN |      रविंद्र |     शंकर चौधरी
 11 |      KZ |       Karl |          Khan
etc...

(30 rows)


Changing a query

Let's say we don't want our users to see all customers, but we can't change the query because it's run by an application that cannot be modified.

This is easily done by applying a filter in Gallium Data that rewrites the query on its way to Cassandra.

⇨ Connect to Gallium Data at: http://127.0.0.1:8089

⇨ Log in

⇨ Open the project named Simple Demo - Cassandra

⇨ Expand the Request filters area

⇨ Open the filter named Hide some customers

Note that the parameters are set so that the filter executes whenever a simple query is run against the gallium_demo.customers table.

⇨ Select the Code tab

The code simply rewrites the query to only show some customers.

⇨ Select the Active checkbox

⇨ Click Publish (top)

⇨ Go back to the Cassandra command line

Re-run the same query (use up-arrow)

The result will be:

 id | country | first_name | last_name
----+---------+------------+-----------
  5 |      ES |       Eric |   Edmunds
 28 |      CN |         姚 |        明
 10 |      JP |     Juliet | Jefferson
 16 |      PE |   Patricia |     Pérez
 13 |      MX |   Marianne |   Mohamed
  1 |      AR |     Andrea | Albinioni
 27 |      JP |         明 |      黒澤

(7 rows)


This time, we are only getting the customers that are in the few countries specified in the filter code. The filter in Gallium Data has changed the CQL query sent by cqlsh from:

select id, country, first_name, last_name from customers

to:

select id, country, first_name, last_name from customers WHERE country IN ('AR','JP','MX', 'CN', 'PE', 'ES') allow filtering

and that's what Cassandra has actually executed. 

We have just changed the way an application (cqlsh) works with a database (Cassandra) without touching either the application or the database.
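
To make the idea concrete, here is a minimal sketch, in plain JavaScript, of the kind of text rewrite described above. It is not the code of the filter in the demo project and it does not use the Gallium Data API: the function name restrictToCountries and the way the query text is passed in are assumptions made purely for illustration.

```javascript
// Hypothetical helper, not part of Gallium Data: appends a country
// restriction to a simple CQL query that has no WHERE clause yet.
function restrictToCountries(cql, countries) {
    // Drop surrounding whitespace and a trailing semicolon, if any.
    const base = cql.trim().replace(/;\s*$/, "");
    // Leave the query alone if it already has a WHERE clause.
    if (/\bwhere\b/i.test(base)) {
        return cql;
    }
    // Quote each country code as a CQL string literal, doubling single quotes.
    const list = countries
        .map((c) => "'" + c.replace(/'/g, "''") + "'")
        .join(", ");
    // Same shape as the rewritten query shown above.
    return base + " WHERE country IN (" + list + ") ALLOW FILTERING";
}

const original = "select id, country, first_name, last_name from customers;";
const rewritten = restrictToCountries(original, ["AR", "JP", "MX", "CN", "PE", "ES"]);
console.log(rewritten);
```

The sketch only handles the simplest case (no existing WHERE clause), which mirrors the filter's parameters restricting it to simple queries on the customers table.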

Changing a result set

Now let's see how we can modify the data coming back from Cassandra before it gets to the client.

Our requirement is that the price of products with a status of 'discontinued' should be set to zero.

First, let's see the data as it is in the database.

Run the following from the Cassandra command line:

select * from products;

You will see a list of products:

 id | category  | list_price | name                    | status
----+-----------+------------+-------------------------+--------------
 23 | Furniture |      31.50 |           Whiskey glass | discontinued
  5 |  Pharmacy |    2499.99 |         Elixir of youth |         null
 10 |    Sports |     182.00 |                 Javelin |         null
 16 | Furniture |      75.00 |             Patio chair |         null
 13 |     Tools |      37.99 |              Microphone | back-ordered
 11 |      Toys |      15.25 |            Kaleidoscope |         null
  1 | Furniture |      99.99 |           Art Deco lamp |         null
 19 |   Kitchen |       0.99 |             Salt packet |         null
  8 | Furniture |      22.99 |    Hieroglyphics poster | discontinued
  2 |  Clothing |     549.99 |           Ballroom gown | discontinued
etc...

(26 rows)

Now let's change the data as it finds its way from Cassandra to the client. We want to zero out the price of discontinued products.

⇨ Go back to the Gallium Data admin app
⇨ Go back to the project view (top nav bar - Project Simple Demo - Cassandra)
⇨ Expand the Response filters area
⇨ Click on the response filter called Hide discontinued product prices
⇨ Select the Parameters tab

This filter has some parameters that activate it for result sets that include data from the gallium_demo.products table.

There is also a parameter specifying that the filter applies to any row with status=discontinued.

⇨ Select the Code tab

The code is a simple one-liner that sets the price to zero.

⇨ Select the Active checkbox

⇨ Click Publish (top)

⇨ Go back to the Cassandra command line

Re-run the same query (use up-arrow)

This time, all discontinued products have a price of zero:

 id | category  | list_price | name                    | status
----+-----------+------------+-------------------------+--------------
 23 | Furniture |        0.0 |           Whiskey glass | discontinued
  5 |  Pharmacy |    2499.99 |         Elixir of youth |         null
 10 |    Sports |     182.00 |                 Javelin |         null
 16 | Furniture |      75.00 |             Patio chair |         null
 13 |     Tools |      37.99 |              Microphone | back-ordered
 11 |      Toys |      15.25 |            Kaleidoscope |         null
  1 | Furniture |      99.99 |           Art Deco lamp |         null
 19 |   Kitchen |       0.99 |             Salt packet |         null
  8 | Furniture |        0.0 |    Hieroglyphics poster | discontinued
  2 |  Clothing |        0.0 |           Ballroom gown | discontinued


Nothing has changed in the database -- we're only transforming the data as it goes from Cassandra to the database client.
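
As a rough illustration of what such a response filter amounts to, the following sketch applies the same rule to a few hand-written rows in plain JavaScript. It runs outside Gallium Data: the function name hideDiscontinuedPrice and the row objects are stand-ins invented for this example, and in the real filter the status check is done by a filter parameter rather than in code.

```javascript
// Hypothetical stand-in for the per-row logic; in a real filter the row
// would be supplied by Gallium Data rather than built by hand.
function hideDiscontinuedPrice(row) {
    if (row.status === "discontinued") {
        row.list_price = 0;
    }
    return row;
}

// Sample rows copied from the products query earlier in this tutorial.
const sampleRows = [
    { id: 23, name: "Whiskey glass", list_price: 31.5, status: "discontinued" },
    { id: 5, name: "Elixir of youth", list_price: 2499.99, status: null },
    { id: 13, name: "Microphone", list_price: 37.99, status: "back-ordered" },
];

// Work on copies, so the "stored" rows stay as they are.
const shown = sampleRows.map((row) => hideDiscontinuedPrice({ ...row }));
console.log(shown.map((r) => r.id + ": " + r.list_price));
```

Because the rule is applied to copies, sampleRows still holds the original prices afterwards, just as the database does in the tutorial.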

These are simple examples -- we could get very fancy here. We could change data depending on any number of factors, we could generate random data, we could hide certain rows and/or columns, we could even insert rows into the result set. We have complete control, and we can make the database behave in ways that would normally be almost impossible, all without any changes to the database or the database client.


A few things to try at this point, if you feel like it:

If you activate the response filter named "Hide furniture" and run a query against the products table, you'll see that all products with category=Furniture have been removed from the result set. This is an example of a filter without code: it does what it needs simply by being configured properly.
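
The idea of a filter that is driven by configuration alone can be sketched as follows. The rule object and the removeMatchingRows function are hypothetical and exist only in this example; the real filter is configured through its parameters in the Gallium Data admin app, and its internal format is not shown in this tutorial.

```javascript
// Hypothetical rule shape, invented for this sketch.
const hideFurnitureRule = { column: "category", equals: "Furniture" };

// Generic logic: keep only the rows that do not match the rule.
function removeMatchingRows(rows, rule) {
    return rows.filter((row) => row[rule.column] !== rule.equals);
}

// A few rows from the products table shown earlier.
const products = [
    { id: 23, category: "Furniture", name: "Whiskey glass" },
    { id: 5, category: "Pharmacy", name: "Elixir of youth" },
    { id: 10, category: "Sports", name: "Javelin" },
    { id: 16, category: "Furniture", name: "Patio chair" },
];

const visible = removeMatchingRows(products, hideFurnitureRule);
console.log(visible.map((p) => p.name));
```

Changing the behavior then means changing the rule, not the code, which is the point of a pre-defined filter.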


Imagine that your database schema changes and a column gets renamed, but you have an existing app that cannot be changed, and therefore will break.

We can easily change the name of a column in a result set with:

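// Find the spec for the column called "name", if this result set has one
let colSpec = context.packet.getColumnSpecByName("name");
if (colSpec) {
    // Present the column to the client under a different name
    colSpec.columnName = "product_name";
}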

This type of thing must be done carefully because it can really confuse database clients, but it also gives you enormous control over how the data is presented to the clients. All aspects of the result set are under your control.

You can see this code in the response filter named "Change column name in result set". If you activate it and run a query on the products table, the name column header will change to product_name.
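
The check before the assignment matters because not every result set contains the column. The sketch below shows this with a hand-written stand-in for the packet object, so it can run outside Gallium Data. The makeFakePacket and renameColumn helpers are hypothetical, and the assumption that getColumnSpecByName returns a falsy value when no column matches is inferred from the check in the snippet above, not taken from the product documentation.

```javascript
// Hand-written stand-in for the packet object; only what the snippet
// above uses is imitated, and its behavior is assumed.
function makeFakePacket(columnNames) {
    const specs = columnNames.map((n) => ({ columnName: n }));
    return {
        columnSpecs: specs,
        getColumnSpecByName(name) {
            return specs.find((s) => s.columnName === name) || null;
        },
    };
}

// Renames a column if present; reports whether anything was changed.
function renameColumn(packet, oldName, newName) {
    const colSpec = packet.getColumnSpecByName(oldName);
    if (colSpec) {
        colSpec.columnName = newName;
        return true;
    }
    return false;
}

const productsPacket = makeFakePacket(["id", "category", "list_price", "name", "status"]);
const customersPacket = makeFakePacket(["id", "country", "first_name", "last_name"]);
console.log(renameColumn(productsPacket, "name", "product_name")); // true
console.log(renameColumn(customersPacket, "name", "product_name")); // false
```

A result set from the customers table has no name column, so it passes through untouched instead of causing an error.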


We can also add synthetic rows to result sets with the following:

// Start from a copy of the current row
let newRow = context.row.clone();
newRow.id = -newRow.id;
newRow.name = "**** Synthetic row ****";
newRow.list_price = -999999;
newRow.status = "**** Awesome!";
// Add the copy to the result set
context.packet.addRow(newRow);

You can see this code in the response filter named "Add synthetic row" -- activate it and re-run the query on products to see the new row at the bottom.

Again, no changes are made to the database, only to the result set received by the database client.
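
To see what these steps do to the data, here is the same sequence run against minimal fake objects in plain JavaScript. The makeFakeRow helper and the fake context are hypothetical stand-ins written for this example. They imitate clone() and addRow() only as far as the snippet above uses them, and assume that clone() returns an independent copy of the row.

```javascript
// Fake row whose clone() returns an independent copy (assumed behavior).
function makeFakeRow(values) {
    const row = { ...values };
    // Non-enumerable, so the spread above copies only column values.
    Object.defineProperty(row, "clone", {
        value: function () { return makeFakeRow(this); },
        enumerable: false,
    });
    return row;
}

// Fake context: one row from the products table, and a packet that
// simply collects whatever rows are added to it.
const addedRows = [];
const context = {
    row: makeFakeRow({ id: 2, category: "Clothing", list_price: 549.99,
                       name: "Ballroom gown", status: "discontinued" }),
    packet: { addRow(r) { addedRows.push(r); } },
};

// Same steps as the filter code above.
let newRow = context.row.clone();
newRow.id = -newRow.id;
newRow.name = "**** Synthetic row ****";
newRow.list_price = -999999;
newRow.status = "**** Awesome!";
context.packet.addRow(newRow);

console.log(addedRows[0]);
console.log(context.row.id, context.row.name);
```

Columns that the code does not set, such as category, carry over from the row that was copied, while the original row keeps its own values.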

What have we seen?

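In this tutorial, you got a glimpse of how Gallium Data can intercept the traffic between database client and database server, and modify this traffic. This enables you to rewrite requests on their way to the database, and to change, remove or add data in the responses, all without touching the client or the database.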

Gallium Data has a number of pre-defined filters, but it also makes it easy to create your own filters and be as sophisticated as you want.

Now, the question is: how will you use it?

What to do next

We encourage you to take Gallium Data for a spin with your own database(s). It's always more interesting to work with your own data than with demo data.

The tutorial project contains several other filters, but they are not active. You can take a look at them and try to activate them. To follow the Gallium Data log output while you experiment, run this from a command line:

docker logs -f gallium-data

Gallium Data is free for end-users, so you can use it as much as you want, on your machines, servers, in the cloud, wherever.

Consult the documentation for all the gritty details, such as how to use the debugger, or the API for various types of database packets.

Cleanup

Once you're done with this tutorial and want to remove everything that was installed:

⇨ Log out of the Cassandra command line with ctrl-D or exit

⇨ Execute the following commands from a command line:

docker stop gallium-cassandra
docker stop gallium-data
docker network rm gallium-net

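This will stop both Docker containers started during this tutorial and remove the network. Because the containers were started with --rm, Docker deletes them as soon as they stop.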

If you also want to remove the Docker images:

docker rmi galliumdata/gallium-data-engine:1.8.2-1782
docker rmi galliumdata/cassandra-demo:1

This will remove everything installed by this tutorial.

We'd love to hear from you -- good or bad! Please drop us an email and let us know if you have any questions or comments.


feedback at gallium data dot com

Footnotes

(1) - Gallium Data is just a container, so it runs in anything that can run a container: Docker, podman, Kubernetes, containerd, etc... You can run this tutorial using something other than Docker, but you'll have to translate the command line options. On recent versions of Windows, you can run Docker in WSL if you prefer.