Fine-grained access control using a database proxy
Summary
Fine-grained access control in databases is difficult to achieve with the right flexibility and performance. This article examines the various solutions currently in use, and proposes to expand them with smart database proxies.
What is data access control?
When thinking about a database, you have to consider two views of the data. The first one is the global view, which encompasses all the data that is stored in the database. The second view is the subset of that data that you, the user, should be able to access, and what you should be able to do to that data: read it, modify it, delete it, etc...
Controlling the latter, unsurprisingly, is called access control: it specifies who can do what to what data.
The need for access control can be divided into two areas: security and business logic. The distinction is not always crystal-clear, but it's a useful one.
Security should be obvious: you may not want certain people to have access to certain data, period. To take a trivial example, you probably don't want just anyone to be able to update someone's salary, or even read it. Only certain people should be able to do that.
Business logic is perhaps the less obvious side of access control. It refers to the need to restrict your view of the data to subsets that are relevant to the activity being performed. For instance, if I'm a manager in Canada, perhaps I should only see records for employees who are also in Canada. It's not necessarily that I can't ever see records for employees in other countries, it's just that, for the kind of things I usually do, it makes sense to restrict my view of the database to just the parts that are relevant to me. It's just easier for me if the database gives me the illusion that it only contains the data that I need: that way, I don't have to ignore all the data that is not relevant to me.
In both cases, access control can be done by the applications that access the data, but having the database do it instead greatly lightens the burden on the applications. From a security perspective, it can be reassuring to know that, no matter what happens in the applications, certain restrictions will always be enforced. From a business logic perspective, it can be more efficient to implement this type of data hiding in the database than in the multiple applications that access it. It's also more flexible: if access control changes, it will affect all applications, without the need to update them.
Security is an immensely difficult issue because you're dealing with malicious actors: people who are actively trying to trip you up, and who can be fiendishly clever, thus creating an arms race. In the rest of this article, my focus will be on business logic access control. Smart database proxies can also play a part in security, but that's for a whole different article.
How do databases provide access control?
Most databases provide some level of access control. For instance, relational databases usually have commands such as:
GRANT SELECT, UPDATE ON TABLE EMPLOYEES TO JDOE
This command specifies that user jdoe is authorized to run SELECT (i.e. read) and UPDATE commands on the table named EMPLOYEES. If these are the only relevant permissions in place, any attempt by user jdoe to delete something from table EMPLOYEES will fail because that user does not have DELETE privileges.
Notice that this is coarse. You either can, or cannot, read from table EMPLOYEES. If you can, you can read any part of any record you want. If you can't, then that table is effectively invisible to you. There are ways to go further than that, as we'll see in a minute.
Many NoSQL databases, such as MongoDB, HBase and Cassandra, also allow you to define access at the collection/table level. Some databases, such as Redis, support the concept of Access Control Lists (ACLs), which allow you to specify which users can access which records, usually based on the records' keys.
But in practice, most NoSQL databases are run without any access control, indeed without authentication at all, which leaves access control entirely up to the clients.
Fine-grained access control
What if we need more fine-grained control to our data? Let's go back to our example: you're a manager in Canada, and you should see only those employees who are also in Canada. This is record-level access control.
Most relational databases allow you to define a view on that table:
CREATE VIEW EMPLOYEES_IN_MY_COUNTRY AS
SELECT * FROM EMPLOYEES WHERE COUNTRY = @user.country
You can then grant access to that view, which provides limited access to the underlying table. That gives you some record-level access control, though there may be some non-obvious loopholes (see PostgreSQL's security_barrier, for instance).
MongoDB has a similar concept, though their views are never updatable (relational databases usually allow updates through views if certain conditions are respected).
Redis' ACLs can work here if (and only if) the relevant data is part of the key.
Cassandra has a row-level permission system that allows you to grant access to specific records for specific users, based on simple conditions.
Finer-grained access control
But this is all still relatively coarse: what if you want to provide access to only some attributes of some records? What if you want that access to depend on the time of day? What if you want to allow updates only for certain attributes of certain records, but depending on the value of those attributes? The requirements for fine-grained access control can be infinitely complex and fluid.
Some organizations use stored procedures for all access to data, and disallow all direct access to tables and views. This approach gives you complete control, but it also means that every interaction with the data must be anticipated, and that a stored procedure must be written, tested, deployed and maintained for each of these interactions. It's effective, but onerous to create and maintain. Stored procedures are often written using a proprietary language, which means deep lock-in and, possibly, some difficulty finding people who can work in that language.
Oracle has attempted to greatly expand the concept of access control with their application contexts. Those give you a great deal of flexibility, at the cost of significant complexity and lock-in, which probably explains why they are rarely used.
NoSQL databases generally assume that access control is primarily the responsibility of the applications using them.
This explains why most access control is in fact mostly implemented by applications: because the underlying databases simply don't have the flexibility required.
But there is one more possibility...
Using a smart database proxy
A smart database proxy is a specialized proxy that sits in front of a database, receives requests intended for the database, potentially modifies these requests, and forwards them to the database. The database then responds with some data, which the proxy, again, examines and potentially changes before forwarding it back to the client.
A smart database proxy can provide fine-grained access control for many of the scenarios we have seen so far by using two techniques: query modification and result filtering. It's not a cure-all, but it can be a big help.
Note: all examples shown here are done with the Gallium Data model, which uses JavaScript.
Result filtering
Result filtering is probably the most obvious thing to do: for every record returned by the database, you get to do whatever you want. You can hide it, change it, mask it, you can even insert new records into the result set.
Result filtering is appropriate if you really need to hide or change the results on a record-by-record basis. For instance, we could mask out attributes based on certain conditions:
if (context.row.country === "Canada" && !context.user.is_manager) {
context.row.salary = 0.0;
context.row.social_insurance_number = "000 000 000";
}
if (context.row.country !== 'Canada') {
context.row.hide();
}
This does not change anything in the database: it only changes the result set on its way back to the database client.
But the sky is the limit here: you can create new data from existing data, you can make calls to external systems and merge that data into the result -- let your imagination run wild.
Result filtering is very powerful, but it can be expensive if not used properly, since you may have to go over each and every record. Used properly, though, it can open entirely new avenues in many situations (and not just for access control).
Query modification
Whenever a database client issues a request, the proxy can intercept it, examine it and, based on some arbitrary logic, modify that request. Implementing our previous example, it would mean that the proxy would look for any requests of the form:
SELECT * FROM EMPLOYEES WHERE <condition>
and modify those requests to:
SELECT * FROM EMPLOYEES WHERE <condition> AND COUNTRY = 'Canada'
This example is admittedly simplistic, but the general idea is there. The equivalent for MongoDB might be:
let qry = context.packet.getQuery();
if (qry.filter) {
qry.filter.COUNTRY = {"$eq": "Canada"};
}
else if (qry.query) {
qry.query.COUNTRY = {"$eq": "Canada"};
}
These are trivial examples -- you can get quite fancy. For instance, in the case of SQL, you could UNION multiple queries, each with different projections and selections, which would give you enormous flexibility. You can make calls to external systems to supplement the data. You can cache some data, or run multiple queries to fulfill one query. Again, your imagination is the limit.
The only restriction is that the result of the final query (or queries) must be compatible with the original query -- otherwise the poor client is going to be rather confused when it receives data that does not have the expected shape, unless of course you use result filtering in conjunction with query modification to put things back in order.
Modifying the request is usually the most efficient way to go, since it will insure that only the relevant data is retrieved, but it's not unusual to use a combination of both: query modification to narrow the data as much as possible, and result filtering to tweak some of the records in a way that cannot be done in the query.
Advantages of smart database proxies
Probably the biggest advantage of using a smart proxy as part of access control is the flexibility it affords: you can do anything to the traffic between database clients and database servers, including changing it, dropping it, creating it, delaying it, caching it, and so on.
Depending on how it's handled, doing access control in a proxy may also take some load off the database. This is particularly true if the access control logic is complex -- someone has to take the hit for that complexity, and you might be better off running it in a proxy, which can easily be scaled up, than in a database, which is often much more expensive.
A database proxy can also be more dynamic than a database. You can define logic that adapts to its context and provides access control that reflects what the client needs, rather than a hard-wired set of access rules that can only be changed by an act of congress.
And there is an argument to be made that business logic access control does not belong in the database, and therefore that it makes sense to run it in a proxy, or a collection of proxies, each with its own logic. You can then think of each proxy as a different view into the database.
Disadvantages of smart database proxies
The biggest argument against a smart database proxy is the added complexity. Distributed systems are hard, and any time you add one more piece to the puzzle, you have to think long and hard about the cost/benefit ratio. Given the enormous flexibility that a smart database proxy offers, it can be an attractive addition to any database-centric system, but someone has to keep it up and running, and that's not free.
The other aspect, of course, is performance. Since a proxy is, by definition, an extra hop between the database and its clients, it will always add some time to the requests and responses. If your applications cannot tolerate a (usually small) increase in response time, then a proxy may not be a viable solution for you.
On the other hand, if you don't use a proxy to do fine-grained access control, you'll need to use something else, and that will have its own cost. It's all a matter of trade-offs and compromise.
In the real world
It should be obvious by now that there is not a single, perfect solution for fine-grained access control. A real-world solution will often need to combine several methods.
If the database supports it, it usually makes sense to use its (usual coarse) mechanisms to define the broad strokes of access control. Since that access is usually fixed, it will only define the full subset of data that you can ever access, but not necessarily the subset that you want at a particular moment. For that, a smart proxy is worth considering, since it opens up all kinds of possibilities, not just for access control, but also for traffic monitoring, auditing, logic enforcement, and so on.