Column-level encryption:

SQL Server vs. Gallium Data

You can't get away from it: there are massive data breaches all the time, even from highly security-conscious organizations. It's really difficult to keep data secure.

To help alleviate this problem, Microsoft SQL Server offers a number of encryption features. Let's see how they work, and how Gallium Data can complement them, or even replace them in some cases.

A few thoughts about encryption

In theory, encryption is great. It allows you to protect your data. If you play your cards right, and someone steals your data, they'll be unable to read it without the proper keys. What's not to love?

We'll disregard the performance implications of encrypted data: modern computers are amazingly fast, and SQL Server is very good at optimizing things. Let's focus on the two biggest disadvantages of data encryption.

Problem 1: it's complicated

As you'll see in a minute, the more encryption you use, the more complicated things become. Microsoft has done a good job providing you with helpful tools, but there is no getting around it: anything that's encrypted requires additional complexity. It makes writing applications harder. Debugging can be a real bear -- why doesn't this value decrypt correctly? Error messages are often vague and confusing. You're definitely paying a price.

Problem 2: key management is hard

Managing your keys and certificates is a serious pain in the neck. There are many (so many!) solutions to this problem, like Azure Key Vault or AWS Secrets Manager, but they only make the problem somewhat less painful. It's hard to keep track of which keys belong to whom, who should have access to them, and it's hard to protect these keys from unauthorized access. This often leads to unsafe practices: DBAs, programmers and users can sometimes cut corners and end up negating the benefits of encryption. A classic example is the staggering number of private keys that can be found in GitHub because programmers inadvertently check them in with their source code.

If you've never had to deal with this, I strongly advise you to think about it before you commit to data encryption. It has great benefits, but the costs should not be ignored. Make sure these costs are worth it to you.

SQL Server encryption

First, let's get the obvious out of the way. It's very easy to encrypt an entire database in SQL Server. A simple command such as:

alter database demo set encryption on

will take care of it (assuming you've created the required key and certificate). This will make sure that the database files (including logs), when stored on disk, are encrypted. There are corresponding capabilities to encrypt backup files, of course. If an adversary were to somehow get a hold of these files, they would (in theory at least) be unable to read them unless they have the encryption key.

This is a good security step, but it's far from sufficient.

The problem is that, for the database server to be able to use these encrypted database files, it obviously must be given the key. And any time you have the key and the encrypted data in the same place at the same time, you run the risk of data leaking out.

Also, the entire database is encrypted with one key, so if that key gets away from you, the bad guys can now potentially read all your data.

SQL Server column-level encryption

To address this problem, at least partially, SQL Server offers column-level encryption, which allows you to encrypt different parts of the database with different keys.

This is a somewhat manual process, though. You define your column as being of type varbinary:

create table demo.customers (

id int,

name_crypt varbinary(200)

)

You then need to access that column using encrypting and decrypting functions. So, for instance, instead of the usual insert:

insert into demo.customers values

(100, 'Arianna')

you'll use:

insert into demo.customers values

(100, EncryptByKey(Key_GUID('CustomersKey'), 'Arianna'))

To retrieve the value, you then do the reverse:

select id, convert(varchar, DecryptByKey(name_crypt))) as name

from demo.customers

where convert(varchar, DecryptByKey(name_crypt)) = 'Arianna'

This works well, but you can see that the SQL commands get rather ugly. The database has no idea that the data is encrypted: as far as it's concerned, it's just bytes.

Overall, you get more flexibility, because some keys may be accessible only to certain people. But one of the basic problem is still there: the keys are accessible to the database, therefore an adversary who can break into the database may be able to read the encrypted data.

Always encrypted columns

To address that problem, SQL Server offers yet another level of encryption called Always Encrypted. When you specify that a column is Always Encrypted, you're telling SQL Server that you don't want it to have the keys to your data: only the database clients will have the keys. SQL Server will therefore have no way to decrypt this data. This can be particularly attractive when using cloud databases, where you don't have much control over the database.

Of course, this raise an obvious question: how does the database run queries on data that it can't read?

To deal with that, SQL Server has two types of encryption: randomized and deterministic.

Randomized encryption

Randomized encryption returns a different result every time you encrypt a given value. This is the most secure way to store data -- if you don't have the key, you cannot know anything about that data, other than perhaps its approximate size. The downside is that a column encrypted in this way is completely opaque to the database: it cannot index it nor query it. All it can do is retrieve it and let the client do the decrypting.

This is useful when you have data that you don't expect you'll ever need to use in a query. For instance, if we have confidential notes about our customers, we could encrypt them with randomized encryption because we never search or sort by these notes -- we always retrieve customers by name or by ID.

Randomized encryption cannot be used for columns that are computed, or used in primary or unique keys, or in an index. They also cannot be of type text, ntext, image, xml and a number of other types. They cannot have a check, unique or default constraint. There are quite a few restrictions, but they generally make sense (except perhaps for some data types: why can't you encrypt text, xml or geometry?)

Deterministic encryption

Deterministic encryption returns the same result every time you encrypt a given value. This allows the database to do some work with it, such as exact queries or indexing, but it is noticeably less secure. Because the same value always results in the same encrypted value, it is possible for an attacker to detect (for instance) that two customers have the same balance, because both customers will have the same encrypted value.

The restrictions on columns that use deterministic encryption are fewer than with randomized encryption: these columns can be indexed, and can participate in primary keys. They can also participate in foreign keys, provided of course that both columns use the same key.

Always encrypted with secure enclave

Recent versions of SQL Server offer a feature called Secure Enclaves, which allows decryption in a separate memory space in the server. That space is theoretically inaccessible to even DBAs and system administrators. The client sends the key (over a protected channel) to the enclave, which can then safely read the encrypted data. If this sounds like black magic, that's because it is. You have to trust that Microsoft knows what they're doing (which is usually the case).

Using this feature allows you to use randomized encryption (the most secure type) and removes some of the limitations associated with randomized encryption: the columns can be indexed, they can participate in foreign keys, they can be indexed, etc...

Problems with SQL Server's always-encrypted columns

Always Encrypted is fairly nice in theory, but it does suffer from a number of problems:

1 - It's hard to use

For instance, if you want to use them from a Java application and store the keys in a Java keystore, you have to create the columns in a way that is specific to Java, and that can only be done in code -- SQL Server Management Studio cannot help you. And it would be difficult to use these columns from other, non-Java applications because they won't be able to find the key, which is specified in a Java-specific way. And we've already talked about the challenges of key management.

2 - It's not supported by all drivers

Some database drivers (such as the one for Node.js, and the older TDS drivers) do not support this feature, which means that it's effectively impossible to write applications in these languages and use always-encrypted columns.

3 - It's inflexible

Always encrypted can only use a few specific encryption algorithm, which may not be what you need, and all the rows have to be encrypted, and with the same keys.

How Gallium Data can help

Gallium Data acts as a smart proxy between the database clients and the database servers, and can modify that network traffic as needed.

By making encryption the responsibility of the proxy, your applications can access the data transparently. This is not appropriate for everyone, but in many cases, using a proxy opens up all kinds of interesting possibilities. Let's take a look at two of them.

Column-level encryption

Gallium Data can take care of the rewriting of the queries on their way to SQL Server, allowing the database clients to issue simple SQL commands, and abstract the encryption logic to the proxy.

For instance, a simple query filter in Gallium Data can look for the SQL:

select id, name

from demo.customers

where name = ?

and rewrite it to:

select id, convert(varchar, DecryptByKey(name_crypt))) as name

from demo.customers

where convert(varchar, DecryptByKey(name_crypt)) = ?

The same can be done with inserts, updates, deletes, stored procedure calls, etc...

This has certain advantages:

  • applications don't have to worry about encryption

  • the encryption logic is factored out to the proxy, so you don't need to have that logic in every app

  • decryption can be done differently for different clients, for instance by showing the encrypted data as a mask (e.g. XXXX) for users who do not have access to that data, or by using different keys for different rows.

Obviously, this approach only works if you know in advance what SQL commands to expect, so that you can intercept and rewrite them.

Always encrypted

Gallium Data can often help with on-the-fly encryption. It does take a bit of work, but the result is very flexible.

The general idea is to store the data encrypted, and have the proxy encrypt and decrypt it as it travels between the database server and the database clients -- much like what the SQL Server drivers do. We can prefix the value with some information about the key, so that we know which key to use when decrypting.

The easiest approach is to encrypt columns in place. This works well if the column is wide enough to contain an encrypted version of its value (since the encrypted value is always longer than the original value). The ideal type for this is text or ntext because they're unlimited, but char, nchar, varchar or nvarchar work equally well if they're large enough. The data type xml could be made to work, if you don't mind storing the encrypted value as an XML document. The types binary, varbinary and image can be used too, again providing they're large enough.

If the data type cannot contain an encrypted version of its value -- such as numbers, dates and so on -- then you need to find a way to store it, perhaps in an extra column. Or change the type of the column to something that can hold its encrypted value.

Example

Let's assume that we have a table containing the names of our customers, along with some confidential notes about each customer. With a request filter in Gallium Data, we can encrypt these notes when they are inserted or updated, and store them in the database, along with an identifier for the key:

Once that's in place, all we need is a response filter that will look at the value of the notes column as it comes back from the database, and decrypt it as needed.

The logic is simple: if the value of the column does not start with "crypt:", then we let it go through unchanged. Otherwise we read the 10-digit identifier of the key, and see if we have that key. If we don't have the key (eagle-eyed readers may have noticed that row 3 has a different key ID), we can render that column to reflect that, such as XXXX. If we do have the key, then we just decrypt the data:

Why would you want to do that?

Microsoft's Always Encrypted feature is solid and well-tested, but it is limited in many ways. For instance, it requires that every row in the table be encrypted. You can use two keys per column, but that's only intended for key rotation, so there is no way to have multiple keys for multiple purposes in the same column.

Microsoft also only supports some encryption algorithms -- you may prefer something else.

Also, there is some appeal to having encryption done neither in the database, in which you may have limited visibility, nor in the client, where it may be exposed. What if you don't entirely trust your clients with your keys?

I'm a simple-minded person, and I find a certain comfort in being able to see my data in its encrypted form. It allows me to understand what's going on, whereas SQL Server's Always Encrypted feature can be more opaque (especially with the secure enclaves!)

Advantages of SQL Server column-level encryption

SQL Server

  • Built-in

  • Performant

Gallium Data

  • Requires another system

  • Adds processing time (though encryption with e.g. AES is very fast)

Advantages of Gallium Data

Gallium Data

  • Complete flexibility: data can be encrypted how you want it

  • Can use any encryption method(s)

  • Can use any key management solution(s)

SQL Server

  • Inflexible: encryption is all-or-nothing

  • Limited to a few encryption algorithms

  • Limited to a few key management solutions

Conclusion

Encryption is always hard. Encryption in a distributed system is even harder.

Ultimately, it comes down to flexibility. If SQL Server does what you need out of the box, then by all means use those facilities. But if you need something that's more flexible, maybe Gallium Data is worth a look -- you may be surprised by how much it can do for you.