Data masking:
SQL Server vs. Gallium Data
Introduction
In SQL Server 2016, Microsoft introduced a new feature called dynamic data masking, which allows you to mask the values of certain columns and keep some data hidden from certain users without having to modify your applications.
Let's take a look at how SQL Server does data masking, and compare it to the way Gallium Data goes about it.
Bottom line:
SQL Server data masking has the main advantage of being built-in
Gallium Data has the main advantage of being much more flexible
Demo database
For this article, we'll use the following trivial database :
create table demo.customers (
id int not null,
name nvarchar(100) not null,
country char(2) not null,
primary key (id)
)
insert into demo.customers values
(1, 'Andrea', 'AR'),
(2, 'Bruno', 'BE'),
(3, 'Charles', 'CA'),
(4, 'Daniella', 'DK'),
(5, 'Eric', 'ES')
SQL Server dynamic data masking
Let's assume you want to partially or completely hide the customers' names. With SQL Server, you can define a dynamic mask with:
alter table demo.customers
alter column name
add masked with (function = 'partial(1,"XXXXXXX",0)')
Any user (other than the owner of the table and DBAs) will then see the data as masked, e.g.:
execute as user='test_user'
select * from demo.customers
revert
id name country
1 AXXXXXXX AR
2 BXXXXXXX BE
3 CXXXXXXX CA
4 DXXXXXXX DK
5 EXXXXXXX ES
You may wish to exempt some users from this type of masking with:
grant unmask to test_user
but that's a database-wide permission: that user will now have unmasked access to all masked data.
Overall, this is a nice feature, and it's fairly easy to use, but it does have some limitations. What if we need a mask that depends on the value, or on the user? You can only use a few built-in functions to do masking, and you cannot use your own functions.
How secure is this?
Data masking works well if your users are limited to a fixed set of SQL queries, which is often the case. But if database users can create their own SQL queries, they can easily bypass data masking with queries such as:
execute as user='test_user'
select * from demo.customers where name like 'Br%'
revert
id name country
2 BXXXXXXX BE
which allows the user to determine that the second letter of customer 2's name is 'r'. Obviously it's pretty easy to extend that and make an end-run around the masking. That's fine: data masking is not meant to be a security measure at the database level, it's just meant to help you make your applications more secure.
Gallium Data masking
Gallium Data takes a very different approach. It acts as a smart proxy between database clients and database servers, and executes user-defined logic (expressed as filters and JavaScript code) that can modify that network traffic when appropriate.
For data masking, Gallium Data has two options: either modify the SQL command on its way to SQL Server, or filter the result set coming back from SQL Server.
Request filter - change the SQL command
A simple request filter in Gallium Data can look for the statement:
select id, name, country
from demo.customers where name like '?'
and rewrite it to:
select id, substring(name, 1, 1) + 'XXXX' as name, country
from demo.customers where name like '?'
This works well if you know in advance what SQL commands to expect.
You can of course be as fancy as you want in how you rewrite the query:
select id,
case
when country in ('ES', 'DK', 'BE') then substring(name, 1, 1) + '€€€€'
when country in ('US', 'CA') then substring(name, 1, 1) + '$$$$'
else name
end as name,
country
from demo.customers
where name like '?'
which will make the mask depend on the country column. Note that this rewriting can change depending on who the user is, the time of day, or any other conditions.
Response filtering - change the result set
The other (often complementary) option is to change the result set as it comes back from SQL Server. This is more expensive because we'll potentially execute code for every row in the result set, but modern computers are amazingly fast.
The JavaScript code for the filter might look something like:
let pkt = context.packet;
if (pkt.country == 'BE' || pkt.country == 'DK' || pkt.country == 'ES') {
pkt.name = pkt.name.substring(0, 1) + "€€€€";
}
else if (pkt.country == 'US' || pkt.country == 'CA') {
pkt.name = pkt.name.substring(0, 1) + "$$$$";
}
You now have complete freedom over how or whether to mask the data.
Advantages of SQL Server dynamic masking
Built-in
Best performance
vs. Gallium Data
Requires another system
Not as performant as SQL Server
Advantages of Gallium Data masking
Masking is completely up to you
Masking can be depend on the value of any columns
Masking can be different depending on the user, the user's IP address, etc...
Masking can be done on computed columns
Masking/unmasking can be done with surgical precision
vs. SQL Server
Masking can only be done using a limited set of functions
Masking is uniform -- all rows get the same mask
There is only one masking option per column -- everyone gets the same
Masking only works on stored columns
Masking/unmasking is all-or-nothing per user
Conclusion: who does it better?
These two approaches have different merits. They are not exclusive: it's possible to use both to implement different aspects of the requirements.
SQL Server has a simple mechanism that is pretty much all-or-nothing. This is appropriate for many applications that require nothing more.
Gallium Data gives you a lot more flexibility and may be preferable for the more complex cases when the masking requirements are not straightforward. Perhaps you need to mask only some rows for some users, or perhaps the masking depends on the data itself. You get to decide, without limitations.
P.S.
If you're intrigued, you may be interested to see how Gallium Data can also do row-level security, or work with SQL Server's data classification feature to restrict data access, and how Gallium Data can easily restrict queries based on prior behavior.