What is a programmable database proxy?

Programmable database proxies may not be familiar to many people, and it's a shame because they can solve many difficult problems elegantly. This article explains what they are, what they do, and when they are useful.

A quick comparison

Let me start with a seemingly unrelated topic: web pages. Bear with me, it will all come together in a minute.

When you bring up a web page in your browser, the page you see may or may not be exactly the page that was sent by the site in question. Between your browser (the client) and the web site (the server), there may be any number of systems which, as they transmit the page, look at it and potentially block, modify, remove or insert content. We'll call these systems smart proxies: proxies because they transmit the traffic, and smart because they can potentially change that traffic based on some sort of logic.

Some of these proxies may simply be interested in recording your activity. Maybe your ISP just wants to record your browsing habits and sell that information, for instance.

Some of these proxies may block content or code that they deem unsafe or undesirable -- most corporate networks block pornography and other inappropriate content.

And some of these proxies may modify the page you're visiting by inserting ads, or customizing them, or adding tracking code.

Encryption does not necessarily prevent this: many corporate networks require your devices to trust a certificate authority that the company controls. The network can then insert itself as a man-in-the-middle, decrypt your traffic, and get complete access to everything you do.

The point is that a web page, as you see it in your browser, may have been modified on its way to you for a variety of reasons, but most people don't realize that -- it's transparent to them.

The same principle can be applied to any structured data traveling through a network, including database traffic. A smart database proxy is a specialized proxy that sits in front of a database, receives requests intended for the database, potentially modifies these requests, and forwards them to the database. The database then responds with some data, which the proxy, again, examines and potentially changes before forwarding it back to the client.
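That request/response flow can be sketched in a few lines of JavaScript. This is only an illustration: createProxy, fakeDatabase and the filter signatures are hypothetical names, not any product's API, and the backend is synchronous for brevity, whereas a real proxy would speak the database's wire protocol over sockets.

```javascript
// Hypothetical sketch: a proxy as two lists of filters wrapped around a backend.
function createProxy(backend, requestFilters = [], responseFilters = []) {
  return {
    handle(request) {
      // 1. Every request filter may inspect or rewrite the request.
      let req = request;
      for (const filter of requestFilters) {
        req = filter(req);
      }
      // 2. Forward the (possibly modified) request to the database.
      let response = backend(req);
      // 3. Every response filter may inspect or rewrite the response.
      for (const filter of responseFilters) {
        response = filter(response, req);
      }
      return response;
    },
  };
}

// Stand-in for a real database: echoes the SQL it received, with no rows.
const fakeDatabase = (req) => ({ sql: req.sql, rows: [] });

const proxy = createProxy(
  fakeDatabase,
  [(req) => ({ ...req, sql: req.sql.trim() })], // request filter
  [(res) => ({ ...res, viaProxy: true })] // response filter
);
```

Neither the client nor the database needs to know that the filters exist, which is what makes the approach transparent.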

Why would you want to do that?

For the same reason you use any type of proxy: control. Having a smart proxy in front of your database means that you suddenly have the power to change how your database clients seem to behave (by changing the requests on their way to the server), and to change how your database server seems to behave (by changing the responses on their way back to the clients).

Because a programmable proxy operates at the wire protocol level, you can change how an existing system works without modifying either the clients or the server. That's a key value of a programmable proxy: it can be added to existing systems with little or no change.

If you've never used a programmable database proxy, that may all seem a bit weird, maybe even vaguely sneaky, but installing a programmable proxy in an existing system gives you a new superpower: you can suddenly make your existing systems behave differently, with very little effort.

Two simple examples

This may all sound a bit abstract, but in fact it boils down to some very concrete actions. Let's take two examples.

Changing requests

Say you have an application that issues a query that, as it turns out, is unnecessarily expensive. Changing the app may not be an option if you don't have the source code, or maybe it would take too long. A smart proxy is an easy solution here, since it can easily look for the offending query on its way to the database:

select * from customers where country in ('MX', 'US', 'CA')

and modify it to make it less expensive:

select * from customers where country in ('MX', 'US', 'CA') limit 50

This is a trivial example but you get the point: a query issued by the client can be changed dynamically by the proxy before it reaches the database.

Of course the modified query has to be compatible with the original query issued by the client, but as long as you keep that in mind, you can do some clever transformations and optimizations, and solve many types of problems with very little effort.
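As a rough sketch, the rewrite above could be a small request filter. The function names and the normalization rules here are assumptions made for illustration. A real proxy would more likely match on a parsed query than on its text.

```javascript
// Hypothetical request filter: adds a row cap to one known-expensive query.
const EXPENSIVE_QUERY =
  "select * from customers where country in ('MX', 'US', 'CA')";

// Collapse whitespace, drop a trailing semicolon, and lowercase, so trivial
// formatting differences do not defeat the match. Lowercasing also touches
// string literals, so the result is used for comparison only, never sent on.
function normalize(sql) {
  return sql.replace(/\s+/g, ' ').replace(/\s*;\s*$/, '').trim().toLowerCase();
}

function limitExpensiveQuery(sql, maxRows = 50) {
  if (normalize(sql) === normalize(EXPENSIVE_QUERY)) {
    return `${EXPENSIVE_QUERY} limit ${maxRows}`;
  }
  return sql; // every other query passes through untouched
}
```

Matching one exact query is deliberately conservative: the narrower the match, the smaller the risk of changing a query the client really needs in full.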

Changing responses

Another example: the Czech Republic has just changed its name to Czechia, so we need to update our database to reflect that. Unfortunately, we have some old apps that cannot accommodate this change, and updating these apps is not an option. What can we do?

This is another straightforward application of a smart proxy: we can set it up so that, for the applications in question (and only for those), the proxy will translate "Czechia" to "Czech Republic" in the result sets before they get sent back to the clients. It can be as simple as defining a filter with:

if (context.row.country === 'Czechia') {
    context.row.country = 'Czech Republic';
}

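Now these applications can continue to work properly, without being updated and without compromising the database. Note that you'll also want the reverse transformation on requests, so that a legacy client searching for "Czech Republic" still matches rows that now say "Czechia".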
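Both directions might look something like this. It is a sketch under assumptions: the function names are made up, and the request side assumes the client sends its values as an array of bound parameters rather than embedding them in the SQL text.

```javascript
const LEGACY_NAME = 'Czech Republic';
const CURRENT_NAME = 'Czechia';

// Response direction: legacy clients keep seeing the old name.
function toLegacyRow(row) {
  return row.country === CURRENT_NAME ? { ...row, country: LEGACY_NAME } : row;
}

// Request direction: values sent by legacy clients are mapped to the new
// name before they reach the database.
function toCurrentParams(params) {
  return params.map((p) => (p === LEGACY_NAME ? CURRENT_NAME : p));
}
```

For example, toCurrentParams(['Czech Republic']) yields ['Czechia'], so a lookup by the old name still finds the renamed rows.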

Possible uses

By now, it should be obvious that a smart database proxy can be useful whenever a database is used. Possible scenarios include:

Logic enforcement

Because programmable proxies have deep access to all traffic to and from the database(s), they can enforce logic for literally all clients, regardless of how they were written. Want to make sure that federal contracts can never be deleted? Or that bids can only be seen in their respective countries, and only by certain people? That type of fine-grained logic is easily defined and enforced, for all clients, by programmable database proxies.
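The "federal contracts can never be deleted" rule could be sketched as a request check like the one below. The table name federal_contracts and the function name are hypothetical, and the regular expression is a deliberately naive stand-in: a production rule would work on a parsed statement and would also have to consider TRUNCATE, DROP and similar statements.

```javascript
// Hypothetical rule: reject any DELETE aimed at the federal_contracts table,
// with or without a schema prefix.
function enforceNoContractDeletes(sql) {
  const pattern = /^\s*delete\s+from\s+(?:\w+\.)?federal_contracts\b/i;
  if (pattern.test(sql)) {
    return { allowed: false, reason: 'federal contracts cannot be deleted' };
  }
  return { allowed: true };
}
```

Because the check runs in the proxy, it applies to every client, including ones written long before the rule existed.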

Legacy applications

Every large organization has a portfolio of applications, many of which are barely, or no longer, maintained. A programmable proxy can change how these applications work, without actually changing the applications or the databases. It's not a fix-all -- the proxy can only change the interactions of the applications with the databases -- but when it hits the spot, it can be a life-saver, and extend the lifetime of an application with minimal cost.

Fine-grained access control

A programmable proxy can keep an eye on who's doing what to what data, and change, restrict or deny access to certain data based on any number of factors: access behavior, time of day, point of origin, the nature of the data, etc.

Security is a crowded field: there are many off-the-shelf products addressing various aspects of security, and if these products do what you need, they're probably the right solution. A programmable proxy is a more generalized tool, which can step in when you need to do something that's not possible with off-the-shelf solutions.
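To make two of those factors concrete (time of day and point of origin), here is a small sketch of a policy function. Everything in it is an assumption chosen for illustration: the payroll table, the 10.x office address range, and the 8:00 to 18:00 window.

```javascript
// Hypothetical policy: the payroll table is only reachable from the office
// network during business hours. All other tables are unaffected.
function decideAccess({ table, clientIp, hour }) {
  if (table !== 'payroll') return 'allow';
  const fromOffice = clientIp.startsWith('10.');
  const businessHours = hour >= 8 && hour < 18;
  return fromOffice && businessHours ? 'allow' : 'deny';
}
```

A call such as decideAccess({ table: 'payroll', clientIp: '203.0.113.9', hour: 10 }) returns 'deny' because the request does not come from the assumed office range.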

Monitoring

There are many solutions for monitoring databases, but they tend to be relatively inflexible. A programmable proxy gives you complete access to everything going into and out of your database. Again, there are many excellent products in this area, but when you need to do something that's not addressed by off-the-shelf solutions, a programmable proxy can be a quick and easy way to do what you need. Maybe you want to throttle someone based on how much data they access, or maybe you want to record certain interactions for auditing purposes. This jack-of-all-trades aspect of programmable proxies makes them attractive for ad-hoc solutions.
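Throttling by data volume, for instance, needs little more than a counter per user. The sketch below assumes a fixed time window and hypothetical names (createRowBudget, record). The clock is passed in so the behavior is easy to test.

```javascript
// Hypothetical throttle: counts rows returned to each user per time window.
function createRowBudget(maxRows, windowMs, now = Date.now) {
  const usage = new Map(); // user -> { windowStart, rows }
  return function record(user, rowCount) {
    const t = now();
    let entry = usage.get(user);
    // Start a fresh window for new users or when the old window has expired.
    if (!entry || t - entry.windowStart >= windowMs) {
      entry = { windowStart: t, rows: 0 };
      usage.set(user, entry);
    }
    entry.rows += rowCount;
    return entry.rows <= maxRows; // false means the user is over budget
  };
}
```

The proxy would call record after each result set and delay or reject further queries once it returns false. The same counter could feed an audit log.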

Externalizing logic

New applications can also benefit from a programmable database proxy. Once you start thinking about your data as a potentially dynamic entity, you may realize that some application logic does not belong in the application, and can instead be externalized to a programmable proxy. This makes it easier to change this logic, to customize it at deployment time, and sometimes even at runtime.

There are too many possible uses to go over here: merging data from multiple systems, masking data for security, generating data for testing, increasing latency to simulate remote access, encrypting/decrypting data transparently, introducing random failures to stress-test your system, deriving synthetic data from existing data on the fly. The list is endless.

All these uses can be summarized in one word: indirection. Crack open the pipe between database clients and database servers, and you open up a world of possibilities. A programmable database proxy is an elegant solution to many problems that would be very difficult to solve in any other way.
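To make one of the uses listed above concrete, here is a sketch of data masking as a response filter. The function names and the "keep the last four characters" rule are assumptions chosen for illustration, not a recommendation for any particular compliance regime.

```javascript
// Hypothetical helper: replace all but the last few characters with '*'.
function maskValue(value, visible = 4) {
  const s = String(value);
  if (s.length <= visible) return '*'.repeat(s.length);
  return '*'.repeat(s.length - visible) + s.slice(s.length - visible);
}

// Hypothetical response filter: mask the named columns of one row,
// leaving missing or null values alone. Returns a copy of the row.
function maskColumns(row, columns) {
  const masked = { ...row };
  for (const col of columns) {
    if (col in masked && masked[col] != null) {
      masked[col] = maskValue(masked[col]);
    }
  }
  return masked;
}
```

The database keeps the real values. Only what travels back to the client is masked.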

What about performance?

By definition, a proxy is an extra hop between your database clients and your database servers, which will usually increase apparent database response times. In most cases, that's not an issue: most applications can take a moderate increase in response times without a noticeable difference. And a proxy might be able to cache or optimize requests enough to more than make up for that.
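As an illustration of the caching idea, here is a minimal sketch that keeps results for a fixed time, keyed by the SQL text. The names are hypothetical and the backend is synchronous for brevity. It also ignores a hard part of real caching: writes do not invalidate cached reads here, so entries can be stale until they expire.

```javascript
// Hypothetical caching wrapper around a backend function (sql -> rows).
function createCachingBackend(backend, ttlMs, now = Date.now) {
  const cache = new Map(); // sql text -> { expires, rows }
  return function query(sql) {
    const t = now();
    const hit = cache.get(sql);
    if (hit && hit.expires > t) return hit.rows; // fresh entry: skip the database
    const rows = backend(sql);
    // Only SELECT statements are cached; everything else always goes through.
    if (/^\s*select\b/i.test(sql)) {
      cache.set(sql, { expires: t + ttlMs, rows });
    }
    return rows;
  };
}
```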

Proxies tend to scale quite well with simple load balancers, so scalability is not usually an issue.

Note that, if you need to do the kind of things we've outlined so far, you're going to have to incur a comparable cost using any solution anyway.

But that's not the real data!

When accessing the data in your database, you expect a straight answer. A database is often a source of record: if the database says so, then it is so.

That is why the concept of a programmable database proxy may be unsettling to some. Suddenly, when accessing a database through a programmable proxy, you no longer know whether the data you're getting is in fact what's in the database, or whether it's been changed or restricted (or even created) somehow by the proxy.

That's kind of the whole point, actually: the proxy becomes part of the total system, and the database clients should not (typically) worry about whether the data they're seeing has or has not been modified by the proxy -- it's just the data they need to see. But if required, the proxy may add some sort of stamp to the response to let the client know that the response has been modified.
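Such a stamp could be as simple as a flag attached to the response. The sketch below is an assumption about how one might do it: the field name modifiedByProxy is invented, and comparing JSON text is a crude test that is sensitive to property order.

```javascript
// Hypothetical stamp: tell the client whether the proxy changed the rows.
function stampIfModified(originalRows, finalRows) {
  const changed = JSON.stringify(originalRows) !== JSON.stringify(finalRows);
  return { rows: finalRows, modifiedByProxy: changed };
}
```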

And, of course, you can always (if authorized) bypass the proxy and access the database directly.

A state of mind

Whether or not you need a programmable database proxy will depend on many factors, each unique to your situation. What I have tried to convey in this brief introduction is that a programmable proxy opens up new options and allows you to do things that are almost impossible otherwise. It allows you to think about your systems differently. Even if you end up not using a programmable proxy, having it in your mental tool belt as a possible option will make you a better systems architect.
