Row-level security:

SQL Server vs. Gallium Data

Introduction

In SQL Server 2016, Microsoft introduced the concept of row-level security, which gives you fine-grained control over who gets access to what data, potentially down to the level of individual rows. Normally, SQL security grants coarse access to a whole table or view (SQL Server can also do it for columns), and anything more granular than that requires the use of views or stored procedures.

There is another way to do row-level security without changing the database clients, and that's with Gallium Data. Each solution has its strong points and weaknesses: let's compare them and find out which one fits which scenarios.

Bottom line:

  • SQL Server row-level security has the main advantage of being built-in

  • Gallium Data has the main advantage of being much more flexible

SQL Server's row-level security

SQL Server offers row-level security by defining SQL functions called predicates, which get automatically invoked by SQL Server when a given table is accessed. There are two kinds of predicates:

  • filter predicates, which are used for select, update and delete operations

  • block predicates, which are used for insert, update and delete operations

When a SQL command is received, SQL Server executes these predicates along with the operation, and the results of the predicates are essentially joined with the relevant tables.


Sample data

We'll use the following trivial database for our examples:

create table demo.customers (

id int not null,

name nvarchar(100) not null,

country char(2) not null,

primary key (id)

)

insert into demo.customers values

(1, 'Andrea', 'AR'),

(2, 'Bruno', 'BE'),

(3, 'Charles', 'CA'),

(4, 'Daniella', 'DK'),

(5, 'Eric', 'ES')


Filter predicates

If we assume that our database users' names start with a region prefix (e.g. EU_jdoe), we can restrict access to this table with:

create function demo.RestrictCustomers(@country as sysname)

returns table with schemabinding as

return select 1 as res

where

(@country in ('DK', 'BE', 'ES') and user_name() like 'EU_%') or

(@country in ('US', 'CA') and user_name() like 'NA_%') or

(suser_name() = 'sa')

This specifies that users whose name starts with "EU_" only get access to customers in Denmark, Belgium or Spain, and users whose name starts with "NA_" only get access to customers in Canada or the US. We also make sure that user sa can always see all customers.

This type of function can be as complex as needed, and can include joins and calls to other functions. Keep in mind, however, that every access to that table will incur the cost of that function, and of the resulting join.

We register our function as a filter predicate with:

create security policy CustomersPolicy

add filter predicate demo.RestrictCustomers(country)

on demo.customers

with (state = on)

We'll need a user to test this:

create user EU_jdoe without login

grant select on demo.customers to EU_jdoe

Then if we run a simple query like

execute as user = 'EU_jdoe'

select * from demo.customers

revert

we'll get back only the three customer rows that are in Europe.

Note that access is granted at the row level: there is no way to specify that a specific column in a specific row is off-limits using this mechanism.


Block predicates

Block predicates are similar, but they apply to inserts, updates and/or deletes. There are four kinds: after insert, before update, after update, and before delete.

For instance, we can make sure that our users can only insert data in their region with:

alter security policy CustomersPolicy

add block predicate demo.RestrictCustomers(country)

on demo.customers after insert

(you'd want to do the same for update and delete). This will check that the newly-inserted row is visible to the user who inserted it and, if not, it will reject the insert.

With this in place, a user can insert a row if it obeys our predicate:

execute as user = 'EU_jdoe'

insert into demo.customers values (100, 'Didi', 'DK')

revert

but an invalid insert will be rejected:

execute as user = 'EU_jdoe'

insert into demo.customers values (101, 'Aaron', 'AR')

revert

The attempted operation failed because the target object 'GalliumDemoDB.demo.customers' has a block predicate that conflicts with this operation. If the operation is performed on a view, the block predicate might be enforced on the underlying table. Modify the operation to target only the rows that are allowed by the block predicate.


How secure is SQL Server's row-level security?

It's not bullet-proof. For instance, user EU_jdoe does not have visibility to customer 3 (who is in Canada), but can detect that the row exists with a cleverly crafted SQL:

execute as user = 'EU_jdoe'

select * from demo.customers where 100/(id - 3) = 1

revert

Divide by zero error encountered

The fact that we're getting a divide-by-zero error tells us that a row with id=3 exists.

You can take this much further if you can run a lot of queries, for instance:

execute as user = 'EU_jdoe'

select * from demo.customers where 100/(patindex('Ch%', name) - 1) = 1

revert

Divide by zero error encountered

which allows user EU_jdoe to determine that there is at least one customer with name starting with 'Ch', even though that user is not supposed to have any visibility to that customer. It should be obvious that a clever attacker can sniff out data pretty quickly.

Overall, I'd say that row-level security is not likely to stop a malicious actor, and should be used more to complement an application's logic, for instance to externalize data access rules.

Gallium Data's row-level security

Gallium Data takes a very different approach. It acts as a smart proxy between database clients and database servers, and executes user-defined logic (expressed as filters and JavaScript code) that can modify that traffic when appropriate.

For row-level security, Gallium Data has two options: either modify the SQL command on its way to SQL Server, or filter the result set coming back from SQL Server.


Request filter - change the SQL command

For instance, a simple request filter can look for the statement:

select * from demo.customers where name like '?'

and rewrite it to:

select * from demo.customers where name like '?' and country in ('DK', 'BE', 'ES')

depending on the current user. This works well if you know in advance what SQL commands to expect.

This approach is effective for broad-stroke filtering, and can include more complex rewriting to mask or modify column values, e.g.:

select id,

case

when country in ('BE', 'DK', 'ES') and user_name() not like 'EU_%'

then '<hidden>'

when country in ('CA', 'US') and user_name() not like 'NA_%'

then '<hidden>'

else name

end as name,

country

from demo.customers where name like '?'


Response filter - change the result set

The other (and often complementary) option is to filter the result sets. Since it potentially involves execution of code for each row in a result set, this is obviously a more expensive, but very flexible solution. The JavaScript code for the filter could look something like:

let country = context.packet.country;

if (country === 'DK' || country === 'BE' || country === 'ES') {

context.packet.remove();

}

This is less efficient than changing the query, but it's much more flexible: we can modify the row, insert new rows, etc... For instance:

let country = context.packet.country;

if (country === 'DK' || country === 'BE' || country === 'ES') {

context.packet.name = '<n/a>';

context.packet.country ='??';

}

If you have to filter large numbers of rows (i.e. millions), it can be expensive (though computers are really fast these days) to do this one row at a time, which is why result set filters allow you to narrow their definition to be invoked only for certain rows. The filter above would therefore be much more efficient if it was defined with the following parameter:

Column patterns = country=DK, country=BE, country=ES

You can also specify that the row should be hidden when the column patterns are satisfied. This allows for filtering without any code -- similar to SQL Server's row-level security. But for more complex logic, JavaScript is the way to go, as it gives you total power.

Summary

Advantages of SQL Server row-level security

  • Built in: no need for an external system

  • Affects all queries and commands, regardless of projections, derivations etc...

  • Good performance, depending on the complexity of predicates

vs. Gallium Data

  • Additional complexity: requires another system that must be kept up

  • Works best with known SQL commands

  • May be more expensive

Advantages of Gallium Data

  • SQL commands can be changed in any way, including parameter values

  • Rows can be hidden, modified, or inserted

  • Does not require any special privileges in the database

  • Can be targeted for only some SQL commands

  • Does not execute in the database, therefore puts no extra load on it

  • Can apply to calls on tables, views and stored procedures, and react to any value, whether persisted or computed

  • Not database-specific: works also with PostgreSQL, MySQL and MongoDB.

  • Can be much more generic: Gallium Data sees all the requests and responses on the wire, so it can apply more general rules (i.e. "mask all columns whose name contains 'price' from users 'EU_*'")

vs. SQL Server row-level security

  • SQL commands cannot be modified

  • Can only show or hide entire rows

  • Requires high-level database privileges to manage

  • Affects all access to the table

  • Adds load to database server(s)

  • Cannot apply to computed values

  • Block predicates not available in AzureSQL

Conclusion: who does it better?

These two approaches have different merits. They are not exclusive: it's possible to use both to implement different aspects of the requirements.

SQL Server's row-level security is a good approach if all you need is hiding and showing whole rows based on predicate logic. Access is controlled regardless of the SQL commands, which makes it appropriate in dynamic environments where you don't necessarily know what commands to expect. But if that's the case, you should also be aware that row-level security can be bypassed with relatively little effort, so don't bet the farm on it.

Gallium Data gives you unlimited flexibility -- you can hide rows, but you can also change them, mask values, compute values, even insert new rows into a result set. It works best for environments where you do know what commands to expect, so you can catch them on the wire and modify them or their results.


P.S.

On a related note, you may be interested to see how Gallium Data can work with SQL Server's data classification feature to restrict data access, and how Gallium Data can easily restrict queries based on prior behavior.