Data masking:

SQL Server vs. Gallium Data

Introduction

In SQL Server 2016, Microsoft introduced a new feature called dynamic data masking, which allows you to mask the values of certain columns and keep some data hidden from certain users without having to modify your applications.

Let's take a look at how SQL Server does data masking, and compare it to the way Gallium Data goes about it.

Bottom line:

  • SQL Server data masking has the main advantage of being built-in

  • Gallium Data has the main advantage of being much more flexible


Demo database

For this article, we'll use the following trivial database :

create table demo.customers (

id int not null,

name nvarchar(100) not null,

country char(2) not null,

primary key (id)

)

insert into demo.customers values

(1, 'Andrea', 'AR'),

(2, 'Bruno', 'BE'),

(3, 'Charles', 'CA'),

(4, 'Daniella', 'DK'),

(5, 'Eric', 'ES')

SQL Server dynamic data masking

Let's assume you want to partially or completely hide the customers' names. With SQL Server, you can define a dynamic mask with:

alter table demo.customers

alter column name

add masked with (function = 'partial(1,"XXXXXXX",0)')

Any user (other than the owner of the table and DBAs) will then see the data as masked, e.g.:

execute as user='test_user'
select * from demo.customers
revert

id name country

1 AXXXXXXX AR

2 BXXXXXXX BE

3 CXXXXXXX CA

4 DXXXXXXX DK

5 EXXXXXXX ES

You may wish to exempt some users from this type of masking with:

grant unmask to test_user

but that's a database-wide permission: that user will now have unmasked access to all masked data.

Overall, this is a nice feature, and it's fairly easy to use, but it does have some limitations. What if we need a mask that depends on the value, or on the user? You can only use a few built-in functions to do masking, and you cannot use your own functions.

How secure is this?

Data masking works well if your users are limited to a fixed set of SQL queries, which is often the case. But if database users can create their own SQL queries, they can easily bypass data masking with queries such as:

execute as user='test_user'
select * from demo.customers where name like 'Br%'
revert

id name country

2 BXXXXXXX BE

which allows the user to determine that the second letter of customer 2's name is 'r'. Obviously it's pretty easy to extend that and make an end-run around the masking. That's fine: data masking is not meant to be a security measure at the database level, it's just meant to help you make your applications more secure.

Gallium Data masking

Gallium Data takes a very different approach. It acts as a smart proxy between database clients and database servers, and executes user-defined logic (expressed as filters and JavaScript code) that can modify that network traffic when appropriate.

For data masking, Gallium Data has two options: either modify the SQL command on its way to SQL Server, or filter the result set coming back from SQL Server.


Request filter - change the SQL command

A simple request filter in Gallium Data can look for the statement:

select id, name, country

from demo.customers where name like '?'

and rewrite it to:

select id, substring(name, 1, 1) + 'XXXX' as name, country
from demo.customers where name like '?'

This works well if you know in advance what SQL commands to expect.

You can of course be as fancy as you want in how you rewrite the query:

select id,

case

when country in ('ES', 'DK', 'BE') then substring(name, 1, 1) + '€€€€'

when country in ('US', 'CA') then substring(name, 1, 1) + '$$$$'

else name

end as name,

country

from demo.customers

where name like '?'

which will make the mask depend on the country column. Note that this rewriting can change depending on who the user is, the time of day, or any other conditions.


Response filtering - change the result set

The other (often complementary) option is to change the result set as it comes back from SQL Server. This is more expensive because we'll potentially execute code for every row in the result set, but modern computers are amazingly fast.

The JavaScript code for the filter might look something like:

let pkt = context.packet;

if (pkt.country == 'BE' || pkt.country == 'DK' || pkt.country == 'ES') {

pkt.name = pkt.name.substring(0, 1) + "€€€€";

}
else
if (pkt.country == 'US' || pkt.country == 'CA') {

pkt.name = pkt.name.substring(0, 1) + "$$$$";

}

You now have complete freedom over how or whether to mask the data.

Advantages of SQL Server dynamic masking

  • Built-in

  • Best performance

vs. Gallium Data

  • Requires another system

  • Not as performant as SQL Server

Advantages of Gallium Data masking

  • Masking is completely up to you

  • Masking can be depend on the value of any columns

  • Masking can be different depending on the user, the user's IP address, etc...

  • Masking can be done on computed columns

  • Masking/unmasking can be done with surgical precision

vs. SQL Server

  • Masking can only be done using a limited set of functions

  • Masking is uniform -- all rows get the same mask

  • There is only one masking option per column -- everyone gets the same

  • Masking only works on stored columns

  • Masking/unmasking is all-or-nothing per user

Conclusion: who does it better?

These two approaches have different merits. They are not exclusive: it's possible to use both to implement different aspects of the requirements.

SQL Server has a simple mechanism that is pretty much all-or-nothing. This is appropriate for many applications that require nothing more.

Gallium Data gives you a lot more flexibility and may be preferable for the more complex cases when the masking requirements are not straightforward. Perhaps you need to mask only some rows for some users, or perhaps the masking depends on the data itself. You get to decide, without limitations.

P.S.

If you're intrigued, you may be interested to see how Gallium Data can also do row-level security, or work with SQL Server's data classification feature to restrict data access, and how Gallium Data can easily restrict queries based on prior behavior.