This tutorial is a continuation of the Query control tutorial.
If you have not gone through parts 1 and 2, we strongly suggest that you do so, as this tutorial will not make any sense otherwise.
One limitation of our solution so far is that the query table cannot be queried efficiently (ironic, isn't it?). That's because we're using the data type nvarchar(max), because we don't want any limitations on the size of the queries. Unfortunately, that data type cannot be indexed in SQL Server (not practically anyway).
For small numbers of queries (e.g. hundreds), that's not a problem -- a table scan is negligible at that scale. But if we go into many thousands (or millions) of queries, lookups are going to become more and more expensive.
There is an easy solution: in addition to storing the normalized queries, we can also store their hash value. This will make lookups highly scalable. The cost of hashing a string is minimal, so it's well worth it.
In fact, we could store nothing but the hash value, and forget about the query itself, but that would make the query table very difficult to read.
So let's modify our table:
⇨ In Gallium Data, deactivate "Enforcement filter"
If we don't do that, we won't be able to run any SQL commands except what we've already run!
⇨ Publish to Gallium Data
⇨ In mssql-cli, run:
drop table qctrl.query;
create table qctrl.query(hash varchar(45) primary key, query nvarchar(max));
Since the query table has changed, we need to update our filters to reflect that.
⇨ Set the code in your "Connect to DB" connection filter to:
let qctrl = context.utils.createObject();
let SQLServerDriver = Java.type("com.microsoft.sqlserver.jdbc.SQLServerDriver");
let driver = new SQLServerDriver();
qctrl.conn = driver.connect("jdbc:sqlserver://gallium-mssql;" +
"databaseName=GalliumDemoDB;user=sa;password=Password1", null);
qctrl.selectStmt = qctrl.conn.prepareStatement(
"select * from qctrl.query where hash = ?");
qctrl.insertStmt = qctrl.conn.prepareStatement(
"insert into qctrl.query (hash, query) values (?, ?);");
context.connectionContext.qctrl = qctrl;
The changes are highlighted in bold -- we're just changing the prepared statements to use the hash instead of the full query.
We need to also update the query filter that records requests.
⇨ Set the code in "Recording filter" to:
let sql = context.packet.sql.toUpperCase();
// Collapse multiple spaces, tabs, etc...
sql = sql.replaceAll(/[\s]+/g, ' ');
// Replace numbers with <NUM> if they're not part of a name
sql = sql.replaceAll(/(?<!\w)\d+/g, '<NUM>');
// Replace strings with <STR>
sql = sql.replaceAll(/'(([^'])|(''))*'/g, "<STR>");
// Compute hash of SQL
let sqlBytes = context.utils.getUTF8BytesForString(sql);
let DigestUtils = Java.type("org.apache.commons.codec.digest.DigestUtils");
let sqlHash = DigestUtils.sha1Hex(sqlBytes);
// Check if it's already in the database, and insert it if not
let qctrl = context.connectionContext.qctrl;
qctrl.selectStmt.setString(1, sqlHash);
let rs = qctrl.selectStmt.executeQuery();
if ( ! rs.next()) {
qctrl.insertStmt.setString(1, sqlHash);
qctrl.insertStmt.setString(2, sql);
qctrl.insertStmt.executeUpdate();
}
rs.close();
Again, the change are highlighted in bold. We compute the hash of the SQL to look it up, and/or to insert it into the query table.
Finally, we also need to change the query filter that rejects unknown requests.
⇨ Set the code in "Enforcement filter" to:
let sql = context.packet.sql.toUpperCase();
sql = sql.replaceAll(/[\s]+/g, ' ');
sql = sql.replaceAll(/(?<!\w)\d+/g, "<NUM>");
sql = sql.replaceAll(/'(([^'])|(''))*'/g, "<STR>");
// Compute hash of SQL
let sqlBytes = context.utils.getUTF8BytesForString(sql);
let DigestUtils = Java.type("org.apache.commons.codec.digest.DigestUtils");
let sqlHash = DigestUtils.sha1Hex(sqlBytes);
context.connectionContext.qctrl.selectStmt.setString(1, sqlHash);
let rs = context.connectionContext.qctrl.selectStmt.executeQuery();
if ( ! rs.next()) {
context.result.errorMessage = "Unknown query was rejected by enforcing filter";
}
rs.close();
⇨ Publish to Gallium Data
That's it -- the system will still behave the same way, but this time it'll be much more scalable.
⇨ Exit mssql-cli with Ctrl-D
⇨ Restart it - either with up-arrow, or:
docker run -it --rm --name gallium-cli --network gallium-qctrl galliumdata/gallium-mssql-cli:1 -S gallium-qctrl -U sa -P Password1 -d GalliumDemoDB
This is necessary because we want a new database connection.
Let's switch back to recording mode:
⇨ In Gallium Data, activate "Recording filter"
⇨ Deactivate "Enforcement filter"
⇨ Publish to Gallium Data
⇨ Run a query in mssql-cli such as:
select top 5 * from gallium_demo.customers
⇨ Verify that the query has been recorded:
select * from qctrl.query where query like '%gallium_demo%' or query like '%qctrl%'
You will see that a few queries have been recorded, along with their hash value:
+------------------------------------------+--------------------------------------------------
| hash | query
|------------------------------------------+--------------------------------------------------
| 21f788f574126d90a8bc5ed790f5fe2bb6c74203 | SELECT * FROM QCTRL.QUERY WHERE QUERY LIKE <STR>
| 3d105945263070aad85193ae2469b80ae1393617 | SELECT TOP <NUM> * FROM GALLIUM_DEMO.CUSTOMERS
+------------------------------------------+--------------------------------------------------
Now let's switch to enforcement mode:
⇨ In Gallium Data, deactivate "Recording filter"
⇨ Activate "Enforcement filter"
⇨ Publish to Gallium Data
⇨ Run a new query in mssql-cli such as:
select * from gallium_demo.customers
This query will be rejected because it's not recorded in the query table. However, a known query such as:
select top 5 * from gallium_demo.customers
will run fine.
This is a relatively simple extension to our query restriction system that brings it to a higher level. We can now have thousands, even millions of queries, allowing us to record all possible activity, even for a complex application.
You can take it even further if interested:
we have not handled calls to stored procedures -- it's a very similar process, but with RPC filters instead of query filters
we could record the rejected queries in a separate table, or maybe in a log, for further analysis
we could restrict the values of certain parameters for prepared statements and stored procedures, for instance with regular expressions
we could keep track of how often queries get invoked
we could then implement limits on how often certain queries are invoked
we could cache the most frequently received requests in memory to speed things up
Your imagination is the limit.
Once you're done with this tutorial, and you want to remove everything that was installed,
⇨ Exit mssql-cli with Ctrl-D
⇨ Execute the following commands:
docker stop gallium-mssql
docker stop gallium-qctrl
docker network rm gallium-qctrl
This will stop all the Docker containers started during this tutorial.
If you also want to remove the Docker images:
docker rmi galliumdata/gallium-mssql-cli:1
docker rmi galliumdata/gallium-data-demo-mssql:4
docker rmi galliumdata/gallium-data-engine:1.9.0-2157
This will remove everything installed by this tutorial.
We hope you've enjoyed this tutorial.
We'd love to hear from you -- good or bad! Please drop us an email and let us know if you have any questions or comments.