Query substitution

When you need to change how an application queries its database, but you can't change the application.

The problem

Imagine a business application that uses a database. While using this application, you discover that performing a specific task takes an unusually long time.

You set out to investigate, and find out that this long response time is caused by a specific query to the database. For some reason, this query, which was running fine until today, now takes a long time.

No problem, you figure. Maybe you can optimize the database -- create a new index, maybe, or update the database statistics.

After trying various things, you realize that the problem is not that easy. This query cannot be fixed just by fiddling with the database. Or maybe changing the database was never an option.

Regardless, the situation is now more serious. This query needs to be changed. Thankfully it can be rephrased in a more efficient way that returns the same data, so it should be easy. But maybe you don't own this application, and therefore you cannot change it. Or maybe this application is no longer maintained. Or maybe it would just take too much work to update it.

Whatever the reason, you are now between a rock and a hard place. This application is issuing a query that no longer works, and you seem to be out of options.

When you cannot change the server, and you cannot change the client, what's left?

Enter the database proxy

In situations like this, a database proxy can be a life saver. A database proxy sits on the network, between your database client (the business application) and your database server, and can dynamically change requests and responses as needed, without modifying the database client or the database server.

Zooming in

Let's assume, as a slightly contrived example, that our application is suddenly having performance problems when it executes the query:

select * from demo.customers where balance > ?

This is not a very good query, but that's the hand we've been dealt.

It turns out that someone has changed the customers table and added a new, very large column named photos, of type blob. That's another questionable decision, but such is life.

Our application has no interest in these photos, but it did not specify which columns to retrieve, so now our query returns these photos, needlessly consuming bandwidth and memory.

We've already determined that updating the application is not an option, and neither is changing the database. So our only viable option is to have a proxy that will substitute our improved query for the old one.

Let's see how this works with Gallium Data, a free database proxy:

Step 1: Install the database proxy

Installation is trivial -- it's a simple Docker image, so you can start it up in a minute. It can run wherever makes sense.

Step 2: Configure the proxy

Next, we need to set up the proxy so that it connects to the database server. All we need is the host name and port number of the database server. Database credentials are not required because authentication is pass-through. If we need to support encryption, we can either use a new certificate, or the same certificate as the database server (if available).

Step 3: Set up the query substitution

Next, we need to tell the proxy which query to catch, and what to substitute it with. That's done with a request filter, which will look at the queries coming in over the wire. When it detects the offending query, it will substitute the improved query, in our case:

select name, balance, address from demo.customers where balance > ?

Step 4: Redirect the application

Finally, we need to tell the application to talk to the proxy, rather than directly to the database. In most cases, this is done by changing a configuration file, an ODBC/JDBC data source, or a command-line parameter.


That's it -- the whole thing can usually be done in a few minutes. Neither the application nor the database server will notice any difference. From now on, the "bad" query will be replaced with the new query every time the application issues it.

Once this is in place, doing the same thing for another query is of course trivial -- another request filter can be created and deployed in moments.

How does it work?

You are probably familiar with proxies for things like HTTP and SSH. The principle here is the same, except that the proxy has a deep understanding of the database protocol, and can parse the packets, modify them, and reassemble them as needed.

In the middle of the proxy is a JavaScript engine that runs your logic, which can manipulate any packet as needed. For simple query substitutions, it's usually a trivial one-liner, but it can get as sophisticated as required.

Everything going over the wire is available to your logic: prepared statements with their parameters, calls to stored procedures, result sets and their metadata, etc... You can shape the dialog between applications and their database as needed. It's a lot of power -- use it wisely.

Conclusion

A database proxy opens up whole new possibilities when dealing with database applications. Query substitution is only one of many use cases: you can also augment security, do data masking, fine-grained access control, ...

To get a feel for it, you can try one of the Gallium Data tutorials for SQL Server, Postgres, MySQL or MongoDB.