Example: matching queries with a regular expression
Let's assume that we have a request filter that needs to be invoked whenever a query for the purchase_orders table is received, because we want to do something with the notes column.
The first query we see is:
So the obvious thing to do is an exact match. If that's the only query we want to match, great -- we're done.
But it's common to need more flexibility. Maybe the queries are more diverse, e.g.:
Matching all of these is possible using regular expressions.
Basic regular expression
Regular expressions in Gallium Data start with the string "regex:". Any special characters they contain must be escaped to keep their original meaning.
We can turn the first query into a regular expression with:
The only difference is the backslash before the period. That's because a period is a special character in regular expressions: it stands for any one character.
By turning this into a regular expression, we've already gained case independence: this regular expression will match any of the following:
Sometimes that's not desirable -- if you want to be case-dependent, you can use the prefix REGEX: instead.
That's a good start, but we have a number of other queries we need to match. In particular, we don't really care which columns are selected, only whether or not they include the notes column. We can broaden our regular expression with:
This will now match any of the following queries:
But that's not enough: we also need to handle the * operator with:
This will now also match:
Note the expression at the beginning:
Let's break it down:
\s stands for any white space character, i.e. space, carriage return, tab, etc...
therefore \s+ stands for at least one white space character
We then have an expression between parentheses:
which contains two parts separated by a vertical bar:
The vertical bar means "or", and the parentheses form a group, so this expression means either the first part or the second part.
The first part \* is escaped with a backslash because asterisk is a special character, therefore it simply means the asterisk character. It takes care of the asterisk in select * from...
The second part (.*\bnotes\b.*) is surrounded by parentheses, which group everything they contain, which is:
.* will match zero or more characters
\b will match a word boundary -- this is because we don't want notes to match if it's part of something else, like other_notes
notes will match the name of the column we're interested in
\b is another word boundary to make sure the word notes is free-standing and not part of another word
.* again, will match zero or more characters
Matching the table
The table name may be expressed in a variety of ways:
To match all of these, we need to specify:
This surrounds the schema name and the table name with:
["\? which will match either a double quote, or an open square bracket (which needs to be escaped with a backslash), or nothing (because of the ? operator)
["\]]? which will match either a double quote, or a close square bracket (which needs to be escaped with a backslash), or nothing
So far we haven't taken into account the fact that the query could be more specific, e.g.:
The easiest way to match this is to simply accept anything after the table name:
We can do the same at the beginning, in case there is anything in front of the query.
Our final expression is:
which will match a wide variety of select queries.
Whenever you have a regular expression that contains wide open expressions like .*, keep in mind that it will match anything. The more specific the regular expression, the better.