As we all know, data security is a never-ending battle. Every day, we hear of new data breaches. It's a hard problem, and there is no single solution, other than a defense in depth.
Let's look at one of those defenses for databases: query control.
Query control is a simple idea: most applications access their database(s) in a fairly predictable way. We can therefore record the requests from these applications during a period of time (like during testing). We'll call this the recording phase.
Once we're satisfied that we've seen all the request types we're likely to see, we can lock down the application by rejecting (or maybe just flagging) any requests we haven't seen before. We'll call this the enforcement phase, which is normally used during production.
This guarantees that, no matter what happens, the application will never be able to do anything in the database that it hasn't done before. This is not a fix-all, but it's a good defense.
Obviously, this approach is appropriate only for applications that do, in fact, issue a finite number of query types, but that includes many, perhaps even most applications.
In this article, we'll go over the general architecture and concepts. If you're not afraid to get your hands a little bit dirty, you can then try the step-by-step tutorial showing how to try this on your own machine -- all it requires is Docker.
Query control has a number of benefits:
it's easy to put in place, since it doesn't require any changes to the database(s) or the applications
it can mitigate the damage if an application gets compromised
it can really help with compliance, since we literally know every request that will be allowed to run
it can protect us from SQL injections (and other injections -- this concept is not limited to SQL)
it will allow us to quickly detect a change in the behavior of the applications, which can be a sign of mischief
This is not an original idea: there are some (very expensive) products that do this type of thing, but we can roll our own, at no cost, using off-the-shelf components.
To do query control, we need two new components:
Level 1: minimalist
The simplest system we can put together is:
a query filter in Gallium Data to intercept requests and record them in the request database - this is the recording filter
another query filter to verify that the request is present in the request database - this is the enforcement filter
The queries will be stored in the world's simplest table:
though the request database doesn't have to be relational: something like Redis or MongoDB would work equally well.
The recording filter in Gallium Data will simply store each request in the query table if it's not already there. This can be done with just a few lines of code (see the hands-on tutorial for details).
With that filter in place, we'll need to exercise the client application(s) to issue all the requests we expect. This could take anywhere from a few minutes to a few weeks, depending on the complexity of the application.
Once that's done, we'll have a number of requests recorded in the query table. That number can vary widely, from just a few for simple applications, to many thousands (or more) for larger applications.
We can then deactivate the recording filter and activate the enforcement filter. It will receive requests, look in the request database to see if this request is known, and reject the request if it doesn't find it (it could also ignore it, flag it, etc...). Again, this is easily done in a query filter in just a few lines of code.
That's it! We now have a running query control system, and we can easily switch between recording mode and enforcing mode by enabling and disabling these two filters.
For some applications, this may actually be sufficient. But many applications will require more.
Level 2: more flexibility
Some applications need more flexibility because they don't always use prepared statements. It's not unusual to get hard-coded values in SQL commands, such as:
We all know that this is not optimal, and that a prepared statement:
would be preferable, but the world is an imperfect place.
The solution is actually simple: we just need to normalize SQL queries, which turns out to be fairly easy (for details, see part 2 of the hands-on tutorial). Again, it just takes a few lines of code (3, to be exact) to transform a SQL Statement like:
Now all these queries will be matched by just one record in the query table. This is much better.
Level 3: parameter checking
In many situations, it makes sense to restrict the value of parameters for prepared statements and stored procedures. This is particularly true for some system stored procedures like sp_executesql, but it can apply to your own stored procedures too.
This is just one more way to control what the application does -- some parameters should only be allowed some values.
The simplest way to store the allowed values for parameters is either as strings, regular expressions, or other kinds of expressions. We'll need to expand our table, of course:
The valid values for the parameters can be stored as one value, or a few values (maybe separated by commas), or a range (e.g. 1-100), or a regular expression... You get the idea. When a parameter is out-of-bounds, our filter can choose to reject the request with an error, or change the parameter value.
In many cases, it will be possible to programmatically derive the allowable range for parameters as the requests are recorded, or their minimum and maximum length. If desired, you can also specify parameter ranges manually, of course.
There are many more things we can do at this point:
we could keep track of how many times each query gets executed
we could record the queries that get rejected for later analysis, either in a table, or perhaps in a log
we could rate-limit each request and specify that it can only be executed so many times within a given time period
we could keep track of how long each request takes, and use that information to limit the number of invocations
we could restrict each request based on its origin, the user sending it, the time of day, and whatever makes sense
we could easily speed up the lookup of queries using a hash (shown in the tutorial)
we could introduce regular expressions in the query database to make the queries more general
What else can you think of?
Query control can be a valuable and fairly easy addition to your data protection infrastructure. If your applications are good candidates, you can probably try it out in minimal time. There is a warm, fuzzy feeling when you lock down an application and know that, no matter what happens, hackers will have a tougher time exploiting it for nefarious purposes.
To really get a feel for it, you can go though the hands-on tutorial, see it running on your own machine, and tweak it or extend it.
Beyond query control, I hope you got a sense that using a proxy in front of your database opens up all kinds of new possibilities, without making any changes to your applications or your databases. Such a proxy can be a great fulcrum to control and change how your systems work.