This tutorial will show you how to run Gallium Data on your machine, along with Vertica. Nothing will be installed permanently, it’s all done with containers, so you can throw everything away when you’re done.

There are several versions of this tutorial: 

The concepts are the same, pick the database you're most familiar with.

This tutorial will take about 10 minutes, depending on your download speed. It can be run on anything that runs Docker.

Step 1: Make sure Docker is running

For this tutorial, we will use Docker (1). To verify that Docker is indeed available:

run the following command from a command line:

docker version

You should see some output similar to this:

Client: Docker Engine - Community
Cloud integration  1.0.33
Version:           24.0.2
API version:       1.43
etc...

The exact version numbers are not important -- the important thing is that Docker needs to be running. 

If you get an error message, you'll need to get Docker up and running before you can continue with this tutorial. Fortunately, there are lots of resources that can help you with that.

We’ll be starting two Docker containers — there are more repeatable ways of doing this using e.g. Docker Compose, Kubernetes or Nomad, but for this tutorial we want to make sure every component is visible and clearly understood.

We’ll be running all containers in their own Docker network, so let’s create that network

Run this from a command line:

docker network create gallium-net

The response should be a long string of letters and numbers, which you can safely ignore, something like:

9ef282f6d3cce819a etc...

If you get an error, it may be because you've run another Gallium Data tutorial before, in which case you can ignore the error and carry on.

Docker is now ready, let’s move on to the next step.

Step 2: Start the two Docker containers

If this is your first time running this tutorial, note that it will download about one gigabyte of container images, which can take a while on slower connections.

1 - Start the Vertica database 

Run this from a command line:

docker run --rm --name gallium-vertica --network gallium-net vertica/vertica-ce:12.0.2-0

IMPORTANT! Wait until you see:

Vertica is now running

This may take a minute or two -- please be patient, Vertica takes a little while to get started.

This image is simply the standard Vertica image, which contains a small sample database that we will use in this tutorial.

There is nothing special about this image:  Gallium Data can run with any Vertica database (8.x and above), including on-premise or in the cloud.


2 - Start Gallium Data

In a different command line, run the following:

docker run -d --rm --name gallium-data --network gallium-net -p 8089:8080 -e repository_location=/galliumdata/repo_vertica galliumdata/gallium-data-engine:1.8.3-1851

This is the standard Gallium Data image, with a demo repository for this tutorial. In the real world, you would typically use additional options to create your own repository.


Run a Vertica query

To connect to Vertica using vsql and run some SQL queries:

Run this from a command line:

docker exec -it gallium-vertica /opt/vertica/bin/vsql -h gallium-data --dbname VMart -U DBADMIN

We are connecting to Vertica through Gallium Data, not directly -- this will allow us to do all kinds of interesting things.

You will get the Vertica command line prompt:

Welcome to vsql, the Vertica Analytic Database interactive terminal.


Type:  \h or \? for help with vsql commands

       \g or terminate with semicolon to execute query

       \q to quit


VMart=>

Run the following from the Vertica command line (it may take a few seconds the first time):

select store_key, store_name, store_address, store_city, store_state, total_square_footage from store.store_dimension limit 15;

You will see a few rows of data:

 store_key | store_name |  store_address  | store_city  | store_state | total_square_footage 

-----------+------------+-----------------+-------------+-------------+----------------------

         1 | Store1     | 16 Elm St       | Concord     | CA          |                 2000

         2 | Store2     | 448 School St   | Columbia    | SC          |                 2000

         3 | Store3     | 224 Humphrey St | Peoria      | AZ          |                 1000

         4 | Store4     | 240 School St   | Bellevue    | WA          |                 8000

         5 | Store5     | 199 Green St    | Topeka      | KS          |                 9000

         6 | Store6     | 82 Church St    | New Haven   | CT          |                 4000

         7 | Store7     | 125 Main St     | Berkeley    | CA          |                 6000

         8 | Store8     | 299 Lake St     | Carrollton  | TX          |                 3000

         9 | Store9     | 222 Elm St      | Gary        | IN          |                 4000

        10 | Store10    | 180 Maple St    | Vallejo     | CA          |                 7000

        11 | Store11    | 346 Humphrey St | Norwalk     | CA          |                 2000

        12 | Store12    | 163 Railroad St | Lowell      | MA          |                10000

        13 | Store13    | 162 Market St   | San Antonio | TX          |                 6000

        14 | Store14    | 73 Pine St      | Norwalk     | CA          |                 4000

        15 | Store15    | 281 Green St    | Sioux Falls | SD          |                 5000

(15 rows)


Changing a query

Let's say we want to hide the stores in California, but we can't change the query because it's run by an application that cannot be modified.

This is easily done by applying a filter in Gallium Data that rewrites the query on its way to Vertica

⇨ Connect to Gallium Data at: http://127.0.0.1:8089

⇨ Log in

⇨ Open the project named Simple Demo - Vertica

⇨ Expand the Request filters area

⇨ Open the filter named Change store select

Note that the parameters are set so that it executes whenever a simple query is run agains the store.store_dimension table.

⇨ Select the Code tab

The code simply rewrites the query to exclude stores in California.

⇨ Select the Active checkbox

⇨ Click Publish (top)

⇨ Go back to the Vertica command line

Re-run the same query (use up-arrow)

The result will be:

 store_key | store_name |  store_address  |  store_city  | store_state | total_square_footage 

-----------+------------+-----------------+--------------+-------------+----------------------

         2 | Store2     | 448 School St   | Columbia     | SC          |                 2000

         3 | Store3     | 224 Humphrey St | Peoria       | AZ          |                 1000

         4 | Store4     | 240 School St   | Bellevue     | WA          |                 8000

         5 | Store5     | 199 Green St    | Topeka       | KS          |                 9000

         6 | Store6     | 82 Church St    | New Haven    | CT          |                 4000

         8 | Store8     | 299 Lake St     | Carrollton   | TX          |                 3000

         9 | Store9     | 222 Elm St      | Gary         | IN          |                 4000

        12 | Store12    | 163 Railroad St | Lowell       | MA          |                10000

        13 | Store13    | 162 Market St   | San Antonio  | TX          |                 6000

        15 | Store15    | 281 Green St    | Sioux Falls  | SD          |                 5000

        17 | Store17    | 178 Hereford Rd | Fort Worth   | TX          |                 4000

        18 | Store18    | 180 Essex St    | Cape Coral   | FL          |                 9000

        20 | Store20    | 493 Humphrey St | Fort Collins | CO          |                 8000

        22 | Store22    | 201 Kelly St    | Allentown    | PA          |                 1000

        23 | Store23    | 419 Brook St    | San Antonio  | TX          |                 9000


This time, you will not see any stores in California. The filter in Gallium Data has changed the SQL query sent by vsql, and that's what Vertica has actually executed. 

We have just changed the way an application (vsql) works with a database (Vertica) without touching either the application or the database.

Changing a result set

Now let's see how we can modify the data coming back from Vertica before it gets to the client.

Our requirement is that the address of stores in Texas should not be visible.

⇨ Go back to the Gallium Data admin app
⇨ Go back to the project view (top nav bar - Project Simple Demo - Vertica)
⇨  Expand the Response filters area
⇨ Click on the response filter called Hide address of Texas stores
⇨ Select the Parameters tab

You'll see that this filter has some parameters that activate it for result sets that include data from the store.store_dimension table.

⇨ Click on the Code tab

The code simply replaces the value of the store_address column:

if (context.packet.store_address) {

    context.packet.store_address = "#########";

}

⇨ Select the Active checkbox

⇨ Click Publish (top)

⇨ Go back to the Vertica command line

Re-run the same query (use up-arrow)

You will see that the stores in Texas now have their address replaced with hash tags:

 store_key | store_name |  store_address  |  store_city  | store_state | total_square_footage 

-----------+------------+-----------------+--------------+-------------+----------------------

         2 | Store2     | 448 School St   | Columbia     | SC          |                 2000

         3 | Store3     | 224 Humphrey St | Peoria       | AZ          |                 1000

         4 | Store4     | 240 School St   | Bellevue     | WA          |                 8000

         5 | Store5     | 199 Green St    | Topeka       | KS          |                 9000

         6 | Store6     | 82 Church St    | New Haven    | CT          |                 4000

         8 | Store8     | #########       | Carrollton   | TX          |                 3000

         9 | Store9     | 222 Elm St      | Gary         | IN          |                 4000

        12 | Store12    | 163 Railroad St | Lowell       | MA          |                10000

        13 | Store13    | #########       | San Antonio  | TX          |                 6000

        15 | Store15    | 281 Green St    | Sioux Falls  | SD          |                 5000

        17 | Store17    | #########       | Fort Worth   | TX          |                 4000

        18 | Store18    | 180 Essex St    | Cape Coral   | FL          |                 9000

        20 | Store20    | 493 Humphrey St | Fort Collins | CO          |                 8000

        22 | Store22    | 201 Kelly St    | Allentown    | PA          |                 1000

        23 | Store23    | #########       | San Antonio  | TX          |                 9000

(15 rows)


Nothing has changed in the database -- we're only transforming the data as it goes from Vertica to the database client.

These are simple examples -- we could get very fancy here. We could change data depending on any number of factors, we could generate random data, we could hide certain rows and/or columns, we could even insert rows into the result set. We have complete control, and we can make the database behave in ways that would normally be almost impossible, all without any changes to the database or the database client.


A few things to try at this point, if you feel like it:

Hide a row using the remove() method:

if (context.packet.total_square_footage > 2000) {
  context.packet.remove();
}

You can see this code in the Hide big stores in KS response filter -- activate it and re-run the query to see a few rows disappear.

This type of simple hiding can often be done without code by setting the configuration parameters of the filter, but you get more flexibility with code.


Insert synthetic rows:

for (var i = 1; i <= 3; i++) {

    let newRow = context.packet.clone();

    newRow.store_key = 1000000 + i;

    newRow.store_name = 'Made-up store #' + i;

    newRow.store_address = i + " Fake Ave";

    context.packets.addPacket(newRow);

}

You can see this code in the Add synthetic rows response filter -- activate it and re-run the query to see the new rows.

Again, no changes are made to the database, only to the result set received by the database client.

What have we seen?

In this tutorial, you got a glimpse of how Gallium Data can intercept the traffic between database client and database server, and modify this traffic. This enables you to:

Gallium Data has a number of pre-defined filters, but it also makes it easy to create your own filters and be as sophisticated as you want.

Now, the question is: how will you use it?

What to do next

We encourage you to take Gallium Data for a spin with your own database(s). It's always more interesting to work with your own data than with demo data.

The tutorial project contains several other filters, but they are not active. You can take a look at them and try to activate them:

docker logs -f gallium-data

Gallium Data is free for end-users, so you can use it as much as you want, on your machines, servers, in the cloud, wherever.

Consult the documentation for all the gritty details, such as how to use the debugger, or the API for various types of database packets.

Cleanup

Once you're done with this tutorial, and you want to remove everything that was installed,

⇨ Log out of the Vertica command line with ctrl-D or \q

⇨ Execute the following commands from a command line:

docker stop gallium-vertica
docker stop gallium-data
docker network rm gallium-net

This will stop all the Docker containers started during this tutorial.

If you also want to remove the Docker images:

docker rmi galliumdata/gallium-data-engine:1.8.3-1851
docker rmi vertica/vertica-ce:12.0.2-0

This will remove everything installed by this tutorial.

We'd love to hear from you -- good or bad! Please drop us an email and let us know if you have any questions or comments.


feedback at gallium data dot com

Footnotes

(1) - Gallium Data is just a container, so it runs in anything that can run a container: Docker, podman, Kubernetes, containerd, etc... You can run this tutorial using something other than Docker, but you'll have to translate the command line options. On recent versions of Windows, you can run Docker in WSL if you prefer.