Hiding data in DB2
There are many reasons why you might want to hide some of the data in your database from certain users.
In this article, we'll talk about fine-grained access control -- hiding data that is in fact present in the database, but should not be seen by certain users. Static data hiding (physically removing that data from a copy of the database) is a whole different approach (contrasted in this article ).
Fine-grained access control can be done in three places:
in the database itself as the data is being accessed
between the database server and the database client, using a proxy
in the database client once the data has been retrieved
Each of these approaches has its advantages and disadvantages. Let's look at each one and contrast them.
DB2's fine-grained access control capabilities
DB2 has excellent fine-grained access control capabilities, among the best in the RDBMS world.
There are two (sometimes complementary) ways to do fine-grained access control in DB2:
row and column access control (RCAC), which includes row permissions and column masks
label-based access control (LBAC), which relies on security labels and security policies
There is some overlap between these two approaches. The biggest difference is that RCAC is code-based: it involves the execution of SQL code to determine whether a piece of data is accessible and how, whereas LBAC is data-based: it relies on declarations and labels, without any code.
Row and column access control (RCAC)
RCAC, as its name suggests, consists of two parts:
row permissions, which allow you to define who has access to which rows
column masks, which allow you to define how certain columns are presented to the database client
By combining these, you can achieve most common access control requirements.
Row permissions are expressed as pieces of code that define which rows are visible. For instance:
CREATE PERMISSION CustomersEuropeOnly ON DEMO.CUSTOMERS
FOR ROWS WHERE
VERIFY_GROUP_FOR_USER(SESSION_USER, 'SALES_EUROPE') AND
COUNTRY IN ('BE', 'DK', 'ES', 'FR', 'GR', 'HN', 'IT')
ENFORCED FOR ALL ACCESS
ENABLE;
This specifies that database users who belong to group SALES_EUROPE can see the rows in table DEMO.CUSTOMERS where the COUNTRY column has one of the specified values. These values could be looked up or computed from the database, and permissions can call SQL functions including secure user-defined functions, so there is quite a bit of flexibility.
Additional permissions can be added to the same table, and DB2 will automatically apply all of them and show only the rows that satisfy at least one permission.
This type of permission covers all access to the table: select, insert, update and delete. For instance, if the user tries to insert a row that does not satisfy at least one permission, the insert will fail. Same, obviously, for update and delete.
Column masks are also expressed as pieces of code that specify how a column should be presented to the client. For instance:
CREATE MASK CUSTOMERS_LAST_NAME_MASK ON DEMO.CUSTOMERS
FOR COLUMN LAST_NAME RETURN
CASE WHEN (VERIFY_ROLE_FOR_USER(SESSION_USER, 'ENGINEER') AND COUNTRY IN ('BE', 'ES'))
THEN SUBSTR(LAST_NAME, 1, 1) || 'XXXX'
ELSE LAST_NAME
END
ENABLE;
This specifies that, for database users who have the role ENGINEER, the column LAST_NAME in table DEMO.CUSTOMERS should be masked, with just the first character showing, followed by XXXX, but only for customers in certain countries.
You can only have one mask per column, so things can get a bit ponderous if you have a lot of logic. Again, though, your logic can call functions, so this is quite flexible.
Label-based access control (LBAC)
Unlike permissions and masks in RCAC, which are defined by code, label-based access control is declarative.
It works by creating a system of label components, which can be organized as unordered sets, e.g.:
CREATE SECURITY LABEL COMPONENT COMPARTMENT
SET {'RESEARCH', 'MANAGEMENT', 'FINANCE'};
or as ordered arrays, where higher levels subsume lower levels, e.g.:
CREATE SECURITY LABEL COMPONENT CLASSIFICATIONLEVEL
ARRAY [ 'TOP SECRET', 'SECRET', 'CONFIDENTIAL' ];
or in trees, where again higher levels subsume lower levels:
CREATE SECURITY LABEL COMPONENT REGION
TREE (
'WORLD' ROOT,
'AMERICAS' UNDER 'WORLD',
'CANADA' UNDER 'AMERICAS',
'USA' UNDER 'AMERICAS',
'CALIFORNIA' UNDER 'USA',
'TEXAS' UNDER 'USA',
etc...
This is quite powerful, though managing all these can quickly become a challenge (of course, there are commercial products to help with that challenge).
Label components can then be assembled into labels, e.g.:
CREATE SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
COMPONENT COMPARTMENT 'RESEARCH',
COMPONENT CLASSIFICATIONLEVEL 'TOP SECRET',
COMPONENT REGION 'USA';
These labels can then be used to protect individual columns:
CREATE TABLE DEMO.CUSTOMERS (
ID INT NOT NULL,
FIRST_NAME NVARCHAR(100) NOT NULL,
LAST_NAME NVARCHAR(100) NOT NULL,
COUNTRY CHAR(2) NOT NULL,
SALARY NUMERIC(12,2) NOT NULL SECURED WITH FINANCE_CONFIDENTIAL_AMERICAS,
SECLABEL DB2SECURITYLABEL NOT NULL,
PRIMARY KEY (ID)
)
SECURITY POLICY POLICY2024;
as well as individual rows:
INSERT INTO DEMO.CUSTOMERS(ID, FIRST_NAME, LAST_NAME, COUNTRY, SECLABEL) VALUES
(21, 'Wernher', 'von Braun', 'US', SECLABEL_BY_NAME('POLICY2024', 'RESEARCH_TOPSECRET_USA'));
Finally, users, roles and groups can be given access to the labels, for reading, writing, or both:
GRANT SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
TO USER TJONES FOR READ ACCESS;
There is even more to LBAC, such as an exemption mechanism allowing
This is an elaborate system, which has complex rules governing things like security upgrades, default behaviors, and everything else security auditors like to poke around. It is powerful enough to implement the vast majority of data hiding needs, though scaling this system to cover large, complex requirements with many dimensions can be a challenge.
The pros and cons of DB2's fine-grained access control
Pros
built into the database: no need for any other software, other than perhaps something to help manage all that security metadata
rock-solid and certified: this is in use by many of the most security-conscious organizations around the world
automatically governs all queries and updates
Cons
adds an additional burden on the database. Most of these mechanisms are actually implemented by merging them into the query, which can result in some very complex (and expensive) queries if you have a lot of masks, permissions and labels
permissions are attached to the database users and their groups and roles, but many applications share one database user account for many users, making this approach only partially relevant to application security
a permission that becomes invalid (which can easily happen in the case of complex permissions) will block all access to the table for which it was defined. This can obviously be disruptive, and non-trivial to debug
does not cover certain data types, such as XML and LOBs
it's only possible to manage all these labels and permissions as a special database user (specifically, a DB2 user with the SECADM authority)
the built-in mechanisms can only work with information that is available to the database server: it's difficult for the permissions to react to the behavior of the application, for instance
as good as it is, it's not bullet-proof -- it's fairly easy to get around some aspects of data masking, for instance, though, to be fair, that's a generalized problem with data masking
Many of the advantages and disadvantages of this solution come from the fact that it's a centralized solution, which is great for some scenarios, and less great for others.
Proxy-based fine-grained access control
A very different approach consists of hiding data outside of the database, between the database server and the database client, using a programmable proxy.
This is usually a more decentralized approach: some trusted clients may access the database directly, whereas some less trusted clients may have to go through the proxy. And of course, you may have different proxies for different clients.
Because this type of access control is done outside of the database, it is a less powerful approach in some ways, but it does have its advantages.
Restricting or rewriting queries
A proxy can typically intercept any SQL command on its way to the database, and potentially change it, or reject it outright.
The simplest scenario is to have a list of authorized queries, and any time a client issues a query that is not on the list for that client, the query is rejected.
A more complex, but more powerful approach, consists of rewriting the query so that it satisfies our data hiding requirements. This rewriting can of course vary depending on who the user is, where the call is coming from, any other other useful conditions.
For instance, a proxy can easily intercept a query like:
SELECT * FROM DEMO.CUSTOMERS WHERE COUNTRY = 'FR'
and rewrite it to:
SELECT ID, FIRST_NAME, SUBSTR(LAST_NAME, 1, 1) || 'XXXX' AS LAST_NAME, COUNTRY, 0 AS SALARY
FROM DEMO.CUSTOMERS WHERE COUNTRY = 'FR'
before it gets sent to the database. This approach works best for known queries, although it is often possible to analyze the query dynamically and modify it as needed.
Modifying result sets
When the database responds to a query, the result set goes through the proxy, which can manipulate it as needed.
For example, using Gallium Data as a proxy, a trivial result set filter might look like:
if (row.COUNTRY === 'FR') {
row.LAST_NAME = row.LAST_NAME.substring(0, 1) + "XXXX";
row.SALARY = 0;
}
This extra processing will add a small amount of latency, but that work needs to be done somewhere, and by doing it in the proxy, you're saving the database from having to do it.
In this context, the proxy has complete control over result sets: it can modify rows, remove entire rows, or even insert new rows into the result sets.
The proxy can do much more than just manipulate queries and result sets: it can also log access to certain data, notify other systems, aggregate data from multiple sources, etc...
The pros and cons of proxy-based data hiding
Pros
requires no changes to the database, the server, or the clients. This is clearly the most attractive feature of this approach. You don't need special access to the database, and nothing has to change in your existing system.
puts no extra load on the database
can be modulated based on the database user, the application user, the application's behavior, or any other factors
different proxies can enforce different constraints
the constraints are administered outside of DB2: this requires no special permissions in the database
logic in the proxy can be more flexible than in DB2: for instance, it can react to the behavior of the client
Cons
slight performance cost (typically on the order of 5%-10% increased latency)
limited in the case of free-form queries, since the proxy will never understand the queries as well as the database
cannot entirely hide some data (e.g. aggregate data can reveal the existence of hidden data)
Hiding data in the client
At the opposite end of the spectrum from access control in the database is access control in the database client(s).
This can be done either outside of the application, typically using special database drivers, or in the application itself.
This approach became relevant with the arrival of client-server applications, but it has now fallen out of favor because of its significant disadvantages:
1 - The clients must be trusted
This is perhaps the biggest objection: because the (supposedly) hidden data does in fact find its way to the client, it's much more difficult to ensure that it cannot be seen by a malicious user. The software doing the access control is at the leaf node, and the user presumably has some level of control over the hardware: that's a recipe for shenanigans.
2 - Management is difficult
In addition, the client applications have to be modified when security requirements change, and managing and upgrading the clients can be a challenge.
Side note: if you think hiding data in the client is weird, it turns out that Microsoft SQL Server's data classification system actually relies on the clients to hide classified data (this can be rectified with the judicious use of a proxy).
Conclusion
The easiest way to look at this is to ask yourself what will serve your requirements best: a centralized solution, or a decentralized solution?
If you're OK managing all your access control in DB2, then you should clearly go that way: it's extremely solid and flexible, and it's all in one place
If you'd rather not change your database or your applications, then using a proxy may be the easiest route -- sometimes the only route
Finally, there are almost no scenarios in which client-side data hiding is desirable any more