Tutorial (HTTP)

This tutorial will show you how to run Gallium Data on your machine to change how an HTTP client and an HTTP server work, without modifying them.

As an example, we will use the popular query engine Trino, but any other system that uses HTTP (and REST in particular) would work just as well.

We're going to start an instance of Trino as a Docker container, then we'll use Gallium Data to modify the interactions between the Trino CLI and the Trino server. The client and the server communicate over HTTP using (mostly) REST/JSON, therefore we can insert Gallium Data as a proxy between the client and the server to monitor and modify the requests and the responses.

Step 1: Make sure Docker is running

For this tutorial, we will use Docker (1). To verify that Docker is indeed available:

Run the following command from a command line:

docker version

You should see some output similar to this:

Client:
 Cloud integration: v1.0.29
 Version:           20.10.20
 API version:       1.41
etc...

The exact version numbers are not important; what matters is that Docker is running.

If you get an error message, you'll need to get Docker up and running before you can continue with this tutorial. Fortunately, there are lots of resources that can help you with that.

We'll be starting two Docker containers. There are easier ways of doing this, for example with Docker Compose or Kubernetes, but for this tutorial we want to make sure every component is visible and clearly understood.

Step 2: Create a Docker network

We need to create a virtual network in Docker so that our containers can talk to each other.

Run the following command:

docker network create gallium-net

The response should be a long string of letters and numbers, which you can safely ignore, something like:

9ef282f6d3cce819a etc...

If you get an error, it may be because you have already run another Gallium Data tutorial, in which case the network already exists and you can ignore the error and carry on.

Docker is now ready. Let's move on to the next step.

Step 3: Start Trino and Gallium Data

If this is your first time running this tutorial, note that Docker will download about 1.3GB of container images (most of it Trino), which can take a while on slower connections.

3a - Start Trino 

Run the following from a command line:

docker run --rm --name trino -h trino -d --network gallium-net -p 8099:8080 trinodb/trino

This may take a minute as Docker downloads the image and starts it up. 

This image is simply the standard Trino image, which contains a sample database.


3b - Start Gallium Data

Run the following from a command line:

docker run -d --rm --name gallium-data -h gallium-data --network gallium-net -p 8089:8080 -p 8098:8098 -e repository_location=/galliumdata/repo_http galliumdata/gallium-data-engine:1.7.0-1448

Again, this may take a minute. This is the standard Gallium Data image, with a demo repository, which is set up for this tutorial. In the real world, you will typically use additional options to create your own repository.


3c - Open the Trino client

We'll use the Trino container we just started to run the Trino command-line interface.

Any other Trino driver or client would work equally well, since Gallium Data works transparently at the network level.

Run the following from a command line:

docker exec -it trino trino --server gallium-data:8098

You should see a command prompt:

trino>

We now have a Trino command line connected to the Trino server through the Gallium Data proxy.

Step 4: Run a Trino query

Let's make sure that Trino is up and running.

Run the following from the Trino command line (it may take a few seconds the first time):

select * from tpch.sf1.customer limit 10;

You should see ten rows of sample customer data from Trino.

Hint: hit q to return to the command line

Now let's change how this system works by using Gallium Data.


Step 5: Change requests on their way to Trino

A basic filter that modifies certain types of queries is enough to demonstrate what Gallium Data can do.

⇨ Connect to Gallium Data at: http://localhost:8089

⇨ Log in

⇨ Open the project named Demo with Trino

⇨ Expand the Request filters area

⇨ Open the filter named Change customer request

The parameters are set so that it executes whenever a query is received that matches the regular expression:

select.*from tpch.sf1.customer.*
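
To get a feel for which queries this pattern selects, here is a minimal sketch that can be run outside Gallium Data, for instance in Node.js. It assumes that the whole query text must match the pattern. The tutorial does not say exactly how Gallium Data applies it (full or partial match, case sensitivity), so treat the helper filterWouldRun as hypothetical.

```javascript
// The pattern shown in the filter's parameters.
const pattern = /select.*from tpch.sf1.customer.*/;

// Assumption: the entire query must match the pattern. How Gallium Data
// really applies the pattern is not stated in this tutorial.
function filterWouldRun(query) {
  const match = query.match(pattern);
  return match !== null && match[0].length === query.length;
}

console.log(filterWouldRun("select * from tpch.sf1.customer limit 10")); // true
console.log(filterWouldRun("select * from tpch.sf1.orders limit 10"));   // false
console.log(filterWouldRun("SELECT * FROM tpch.sf1.customer"));          // false
```

Note that the dots in the pattern are not escaped, so they match any character, not just a literal period. For a demo this is harmless, but a stricter pattern would use \. instead.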

⇨ Select the Code tab

The code changes the query to add a qualifier:

let query = context.request.payloadString;
context.request.payloadString = query.replaceAll("from tpch.sf1.customer",
    "from tpch.sf1.customer where acctbal < 1000");

⇨ Select the Active checkbox

⇨ Click Publish (top)

Re-run the same query from the Trino command line:

select * from tpch.sf1.customer limit 10;

Notice that, this time, you only see customers with a balance under 1000. Gallium Data has intercepted the query and re-written it to add the qualification before forwarding it to the server. We have changed how this system works, without changing either the client or the server. That is the power of Gallium Data.
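
If you would like to experiment with this kind of rewrite before publishing it, the same replacement can be tried outside Gallium Data. The sketch below uses a hypothetical stand-in for the context object (only the payloadString property seen in the filter code is modeled), and the function name addBalanceQualifier is made up for illustration.

```javascript
// Same replacement as the filter, wrapped in a function so that it can be
// tried outside Gallium Data.
function addBalanceQualifier(query) {
  return query.replaceAll("from tpch.sf1.customer",
      "from tpch.sf1.customer where acctbal < 1000");
}

// Hypothetical stand-in for the context object that the filter receives.
const context = {
  request: { payloadString: "select * from tpch.sf1.customer limit 10" }
};
context.request.payloadString = addBalanceQualifier(context.request.payloadString);
console.log(context.request.payloadString);
// select * from tpch.sf1.customer where acctbal < 1000 limit 10

// Limitation: a query that already has a WHERE clause ends up with two.
console.log(addBalanceQualifier("select * from tpch.sf1.customer where nationkey = 1"));
// select * from tpch.sf1.customer where acctbal < 1000 where nationkey = 1
```

The second call shows why a plain text replacement is only suitable for a demo: the result is not valid SQL when the original query has its own where clause. A filter meant for real traffic would need to handle that case.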

Step 6: Change Trino's responses

Now let's change the response from Trino so that the customer balances are hidden for customers in the AUTOMOBILE segment. Obviously we could just change the query in the Trino command line, but in the real world we may not have control over that query. Maybe it's being executed by some third-party software, for instance.

⇨ In Gallium Data, go back to the project view by clicking PROJECT DEMO WITH TRINO in the top bar

⇨ Expand the Response filters area

⇨ Open the response filter named Hide automotive balances

Note that the parameters are set so that this filter executes only for some specific responses.

⇨ Select the Code tab

The code is a single line:

context.jsonHolder.jsonPath.set("$.data[?(@[6] == 'AUTOMOBILE')][5]", null);

This line uses JsonPath, a powerful mechanism for querying and updating JSON documents. It sets the balance (the value at index 5 of each row) to null for every customer whose market segment (the value at index 6) is AUTOMOBILE.
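
To make the effect of that JsonPath expression concrete, here is a rough plain-JavaScript equivalent that can be run outside Gallium Data. It assumes the response body has a data property holding one array per row, as the expression implies. The function name hideAutomobileBalances and the sample rows are made up for illustration and are not real Trino output.

```javascript
// Plain-JavaScript sketch of: $.data[?(@[6] == 'AUTOMOBILE')][5] = null
function hideAutomobileBalances(response) {
  // Responses without a data array are left untouched.
  for (const row of response.data || []) {
    if (row[6] === "AUTOMOBILE") {
      row[5] = null; // index 5 holds the account balance
    }
  }
  return response;
}

// Made-up sample rows: custkey, name, address, nationkey, phone,
// acctbal, mktsegment, comment.
const sample = {
  data: [
    [1, "Customer A", "1 Main St", 15, "555-0001", 711.56, "BUILDING", "sample"],
    [2, "Customer B", "2 Main St", 13, "555-0002", 121.65, "AUTOMOBILE", "sample"]
  ]
};

hideAutomobileBalances(sample);
console.log(sample.data[0][5]); // 711.56
console.log(sample.data[1][5]); // null
```

The JsonPath one-liner in the filter is shorter and also copes with a missing data property, which is why the sketch guards against that case explicitly.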

⇨ Select the Active checkbox

⇨ Click Publish (top)

Re-run the same query from the Trino command line:

select * from tpch.sf1.customer limit 10;

Notice how all the customers with mktsegment = 'AUTOMOBILE' now show a NULL balance.

Nothing has changed in the database: what you are seeing is a result set that has been modified on the fly by Gallium Data as it goes from the server (Trino) to the client (the Trino CLI). 

What have we seen?

In this tutorial, you got a glimpse of how Gallium Data can intercept the traffic between client and server and modify it in both directions: we rewrote a query on its way to the server, and we changed a result set on its way back to the client.

Here, we used Trino as an example, but anything that uses REST to communicate can potentially take advantage of this power.

Now, the question is: how will you use it?

What to do next

We encourage you to take Gallium Data for a spin with your own database(s). It's always more interesting to work with your own data than with demo data. If you have your own Trino installation, take a look in Gallium Data (Repository -> Project Demo with Trino -> Project -> Connections) to create a new connection or change the existing one. There are also tutorials for MySQL, PostgreSQL, SQL Server, MongoDB and Redis.

To continue exploring Gallium Data, you can edit the Change customer request filter to add an "order by" clause, or anything else you might want.
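
As one possible starting point, the sketch below extends the replacement string with an "order by" clause. It is wrapped in a hypothetical function, rewriteCustomerQuery, so that it can be tried outside Gallium Data; in the filter itself you would only change the second argument of replaceAll.

```javascript
// Variation on the tutorial's filter code: same text replacement, with an
// "order by" clause added after the qualifier.
function rewriteCustomerQuery(query) {
  return query.replaceAll("from tpch.sf1.customer",
      "from tpch.sf1.customer where acctbal < 1000 order by acctbal desc");
}

console.log(rewriteCustomerQuery("select * from tpch.sf1.customer limit 10"));
// select * from tpch.sf1.customer where acctbal < 1000 order by acctbal desc limit 10
```

This works for the tutorial's query because the limit clause ends up after the order by clause. Queries with their own where or order by clauses would need a more careful rewrite.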

The demo project contains a few other filters, but most of them are not active. You can take a look at them and try to activate them. To see what Gallium Data logs while you experiment, run:

docker logs gallium-data


Note: for the next two items, if you are running Gallium Data on a remote server instead of your local machine, you will need to change the connection's Local address parameter to reflect the machine's name or address.

You may also notice a response filter called Translate URLs -- it should always be active. It's a simple filter that converts the URLs in Trino's responses to point to Gallium Data instead of directly to Trino. Without this, the dialog between the Trino CLI and the Trino server would quickly get away from Gallium Data.
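
The idea behind such a filter can be sketched in a few lines. This is not the actual code of the Translate URLs filter, only an illustration of the technique. It assumes that Trino's responses carry follow-up links in the nextUri and infoUri properties, and that those links start with http://trino:8080 (Trino's address inside the Docker network used in this tutorial). The function name translateUrls and the sample response are made up.

```javascript
// Assumed addresses, based on the containers started in this tutorial.
const TRINO_BASE = "http://trino:8080";
const PROXY_BASE = "http://gallium-data:8098";

// Rewrite follow-up links so that the client keeps talking to the proxy.
function translateUrls(response) {
  for (const key of ["nextUri", "infoUri"]) {
    const value = response[key];
    if (typeof value === "string" && value.startsWith(TRINO_BASE)) {
      response[key] = PROXY_BASE + value.slice(TRINO_BASE.length);
    }
  }
  return response;
}

// Made-up response fragment; the path is illustrative only.
const page = translateUrls({
  id: "example-query-id",
  nextUri: "http://trino:8080/v1/statement/example/1"
});
console.log(page.nextUri);
// http://gallium-data:8098/v1/statement/example/1
```

Without a step like this, the client would follow the first link straight to Trino, and every later request and response would bypass the proxy.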

Gallium Data is free, so you can use it as much as you want, on your machines, servers, in the cloud, wherever.

Consult the documentation for all the gritty details, such as the HTTP connector reference or the API for the objects used by the HTTP connector.

Cleanup

Once you're done with this tutorial and want to remove everything that was installed:

⇨ To exit the Trino command line, hit ctrl-D

⇨ Execute the following commands from a command line:

docker stop trino
docker stop gallium-data
docker network rm gallium-net

This will stop the Docker containers started during this tutorial (they were started with --rm, so stopping them also removes them) and delete the network.

If you also want to remove the Docker images:

docker rmi trinodb/trino
docker rmi galliumdata/gallium-data-engine:1.7.0-1448

This will remove everything installed by this tutorial.

We'd love to hear from you -- good or bad! Please drop us an email and let us know if you have any questions or comments.


feedback at gallium data dot com

Footnotes

(1) - Gallium Data is just a container, so it runs in anything that can run a container: Docker, podman, Kubernetes, containerd, etc... You can run this tutorial using something other than Docker, but you'll have to translate the command line options. On recent Windows versions, you can run Docker in WSL if you prefer.