Query control with a smart database proxy
OPTIONAL: If you prefer, you can watch this tutorial as a video (8:10), below.
But if you have Docker installed and running on your machine, you'll probably get more out of running it yourself (it takes about 15 minutes, depending on your download speed).
Most applications that talk to a database use a well-defined set of queries.
Given this, it might be worthwhile to capture those queries while the application is being tested, and use that knowledge in production to make sure that no other queries are being run, to "lock down" the application and make sure it is behaving as expected.
This is easily done with a smart database proxy, using Redis to record the queries.
This tutorial will show you how to set this up, how to create the filters, and how to see them in action. This tutorial should take about 15 minutes.
Important: we'll assume that you have run one of the basic Gallium Data tutorials before you run this tutorial. If you haven't, we recommend you do that first, as we'll assume a minimum of familiarity with Gallium Data.
The actors in this tutorial will be:
a Postgres database (any other supported database would work equally well)
a Gallium Data instance, acting as a proxy for the database
a pgAdmin instance, acting as the database client
We also add:
a Redis instance to record the queries sent to the database
a RedisInsight instance to see what's in Redis (this is not strictly speaking required, but it is useful for debugging)
We're going to define two filters in Gallium Data: a learning filter and an enforcement filter.
The learning filter has a simple workflow:
The database client sends a query to Gallium Data
The learning filter in Gallium Data records that query in Redis
Gallium Data forwards the query to the database.
At some point, we'll turn off the learning filter and activate the enforcement filter, which has a different flow:
The database client sends a query to Gallium Data
The enforcement filter in Gallium Data looks up that query in Redis
If the query is found, Gallium Data forwards it to the database
If the query is not found, Gallium Data rejects the query
Step 1: Start Postgres, Gallium Data and PGAdmin
⇨ Run the following 4 commands:
Note: this may fail with Error response from daemon: network with name gallium-net already exists, which can safely be ignored.
At this point, you should have all three Dockers running, just like in the basic tutorial:
It may be a good idea to verify that they are all up and running:
Step 2: Make sure Postgres and Gallium Data are up and running:
⇨ Make sure you can browse the database:
If you can open the database, everything is fine. If not, you'll need to fix the problem before you can continue.
⇨ Run the following commands:
⇨ Wait 10-20 seconds - RedisInsight takes a moment to start.
⇨ Connect to http://localhost:8001 (try again if it doesn't connect right away)
You'll be prompted to connect to an existing database, use the following information:
User name: <leave empty>
Password: <leave empty>
If you open that database and click on the Browser link on the left, you will see the (empty) Redis database. It won't be empty for long.
Step 4: Talking to Redis from a Gallium Data filter
We're going to create two filters in Gallium Data, which are going to talk to Redis, so clearly we need to have a Redis driver available. We'll use the lettuce library for that, so let's add it to our repository.
⇨ Open Gallium Data (http://localhost:8089)
⇨ Go to the Libraries page (left nav)
⇨ Select the Find tab
⇨ In the Organization field, enter io.lettuce
⇨ In the Artifact field, enter lettuce-core
⇨ Click Find Libraries, and you should see a list of available versions.
⇨ Select version 6.1.1-RELEASE and add it to the repository.
⇨ Click Publish (at the top)
Note that it takes longer than usual (it might take 10-30 seconds). This is because Gallium Data has to download the lettuce library and its dependencies. This delay only happens the first time you install a new library.
Step 5: Create the recording filter
In Gallium Data:
⇨ Click Repository (left nav)
⇨ Open the Simple Demo - PostgreSQL project and make sure that all its filters are inactive (i.e. greyed out) -- we want to keep things simple.
⇨ Create a new request filter of type Query filter - Postgres. Name it Record queries in Redis.
⇨ Make sure your filter is of type Query filter - Postgres
⇨ Set the code to:
The code creates a connection to Redis if it hasn't already, and puts that connection in the filter context, which is available to all invocations of this filter.
Then, it records the query by incrementing its count in Redis. As a bonus, we get to see how many times each query has been run.
⇨ Click Publish (at the top) to put this filter into Gallium Data
Now we're going to run some queries so that they get recorded in Redis.
⇨ Go back to pgAdmin (if you closed the tab, it's at http://localhost:8088)
⇨ Select the postgres database in the left tree
⇨ Click the Query Tool icon (top left, see screenshot)
⇨ Run a few queries, e.g. :
Hint: the F5 key will execute the query on most computers, otherwise use the arrow icon.
⇨ Go back to RedisInsight (if you closed the tab, it's at http://localhost:8001)
⇨ Click the Browser link (left nav)
⇨ Click the refresh button next to the funnel icon
You should see a few queries show up.
Note that you will not recognize some of these SQL statements. That's because they are sent automatically by pgAdmin.
Once you have run a few queries in pgAdmin, and you see them in Redis, you are ready for the last stage.
⇨ Go back to Gallium Data (if you closed the tab, it's at http://localhost:8089)
⇨ Deactivate this filter, since it should not run at the same time as the filter we're going to create next.
Step 6: Create the enforcement filter
⇨ Create a new request filter of type Query filter - Postgres. Name it Reject unknown queries
⇨ Set the code to :
The code connects to Redis if necessary, and then does a synchronous lookup of the SQL command. If the SQL is not found in Redis, then the filter rejects the request.
⇨ Click Publish to put this filter into Gallium Data.
⇨ Go back to pgAdmin
⇨ Run a few queries again, some that you've already run before, some new ones
The queries that are already recorded in the Redis database should run normally, but any queries that are not found in Redis will now be rejected with an error message.
Congratulations: you have created a query restriction system. It consists of two filters:
one filter that records incoming SQL requests in a Redis database
one filter that looks up incoming requests in Redis and rejects them if they are not found
In a few steps, using off-the-shelf components and 15 lines of code, you have created a small but useful system that would be (at best) very difficult to create using any other solution. And that's just for starters!
There is a lot more you could do at this point:
You don't have to reject unknown queries. You could log them, modify them, or record them in a separate database in Redis.
This tutorial is for Postgres, but it would work equally well with any other supported database (MySQL, SQL Server, MongoDB).
You may notice that pgAdmin sends a lot of queries to Postgres that you don't really care about. You can filter them out by using the Query Pattern parameter of the filters -- hint: a possible value is regex:.*[^']gallium_demo[^'].* - see the documentation on regular expressions for more details.
You could use a database other than Redis to record the queries, but Redis is a really good match here.
Everything you've done in this tutorial is temporary (nothing is persisted), but you can make it more permanent by running Redis with persistent storage.
What else can you do when you've got complete control over all traffic between database clients and database servers? Let your imagination roam wild.
To stop everything started during this tutorial, run the following commands:
This will stop all the Docker containers started during this tutorial.
If you also want to remove the Docker images:
This will remove everything installed by this tutorial.