Rewriting MySQL queries

Sometimes you need to change a SQL request after it's sent by the client, and before it gets executed by the server. It's usually because you can't change the application that is issuing the request, for instance if it's a third-party application, or if it's no longer maintained.

The reasons vary, but they tend to fall into one of three categories:

  • the request is inefficient and needs to be optimized

  • the request does something that no longer works, and needs to be rephrased

  • the application's behavior needs to be changed

Whatever the reason is, your options are limited.

Since version 5.7, MySQL supports the concept of rewrite plugins. These are optional modules that you can add to MySQL; once installed, they are given the opportunity to modify incoming requests before they are executed.

The most commonly available implementation of these rewrite plugins is the Rewriter plugin, which can modify incoming queries such as SELECT, UPDATE and so on.

The same result can be achieved with a smart database proxy. There used to be an interesting project called mysql-proxy that could do query rewriting, but sadly it has not been maintained in 8 years. A spiritual successor of sorts is Gallium Data, which also supports other databases.

In this article, we will look at the Rewriter plugin, how it works, and how it compares with Gallium Data's request rewrite capabilities.


The Rewriter plugin

This plugin allows you to substitute certain SQL queries for others using a lookup table. Before MySQL 8.0.12, it only worked with SELECT statements, but since then it also works with INSERT, REPLACE, UPDATE and DELETE (but nothing else -- no DDL, no SET, CALL, etc...)

Installation

This plugin is available in most on-site MySQL installations, but usually not in the various cloud versions of MySQL. You install the plugin by executing the install_rewriter.sql script, which creates:

  • a schema called query_rewrite

  • a table called rewrite_rules to store the replacement patterns

  • a stored procedure called flush_rewrite_rules to flush new rewrite rules,

  • and finally the plugin itself.

There is a small performance penalty when this plugin is installed, so it is recommended to uninstall it or disable it if you're not using it.

You create a new replacement rule by simply inserting a row into the rewrite_rules table:

INSERT INTO query_rewrite.rewrite_rules ( pattern, replacement )

VALUES ( 'SELECT id, name, dob FROM mydb.customers WHERE name = ?',

'SELECT ident AS id, cust_name AS name, birthday AS dob '

'FROM mydb.custs WHERE cust_name = ?' );

This indicates that whenever the first query is received, it should be replaced with the second query.

Whenever you install (or remove) a rule, you need to flush it because they're held in memory:

CALL query_rewrite.flush_rewrite_rules()


Operation

Once the plugin is installed and active in MySQL, and the substitution rule is in place, then any time someone executes the following query:

select id, name, dob from mydb.customers where name = 'Adam'

the following SQL will actually be executed:

SELECT ident AS id, cust_name AS name, birthday AS dob

FROM mydb.custs WHERE cust_name = 'Adam'

The parameters, represented by a question mark in the pattern and the replacement, can stand either for literals, or for prepared statement parameters. Each question mark in the replacement must have a corresponding question mark in the pattern, and there is no way to change the order (e.g. the first question mark in the replacement will always be set to the first question mark in the pattern).

The matching is done after the SQL statement has been parsed, therefore meaningless differences in case and spacing are handled properly. But otherwise, the pattern must be an exact match: there can be no wildcards or other regular expressions in the pattern.

There are no real limitations to the replacement query, but it must be a single statement. For instance, you could substitute an UPDATE statement for a DELETE statement, if that makes sense in your particular scenario.

Obviously, it will be desirable in most cases to phrase the replacement query to make it compatible with the original query, but that is not a requirement.

You could, in theory, change a SELECT into, say, an UPDATE, but that would almost certainly confuse the database client that issued the request, since the response would not be the expected result set.

That's pretty much the gist of this plugin. If you want to dig a little deeper, the Readme file is a great source of information.


Gallium Data's query substitution

Gallium Data's approach is based on code rather than a lookup table (though the code can use a lookup table if desired), and it runs outside of the database.

Installation

Installing Gallium Data consists of starting an instance in Docker or equivalent (e.g. Kubernetes, podman, etc...), and configuring it to connect to the database (which requires only a host name and port number). The database client can then connect to Gallium Data as if it was the MySQL server.


Operation

To intercept a query, you create a query filter in Gallium Data and specify one or more queries for which the filter should be invoked. This can be done with regular expressions, allowing a broad range of matching options. The previous example would look like:

You can specify the requests to be intercepted using one or more regular expressions, so you can cover multiple variations. Note that this is not limited to queries: any request can be intercepted, including DDL, CALL, SET, etc...

When a request is received that matches one of the patterns, the filter is invoked, and your JavaScript code for that filter is executed. That code can then modify the request as necessary, and do whatever else is needed. The simplest thing to do is to substitute a different query, e.g.:

This approach is shown in the basic MySQL tutorial. The example shown here works for literals, but prepared statement parameters are also available, and can be changed.

If you only have to deal with a handful of queries, this works well. If you have a large number of different queries to match, it may make sense to store them in a database for easier management. In that case, you can use any database that is convenient, not just MySQL (see Query Control Made Easy, or this example with SQL Server).


Beyond simple substitution

Because the filter executes your JavaScript code, you can do a lot more than just substitute one query for another. You can, for instance, have conditional code based on who the current user is, what their IP address is, the time of day, the request itself, the values of the parameters, and whatever else makes sense in your context.

The values of the parameters themselves can be modified in the filter, which can be very useful if the original request is not directly compatible with its replacement. For instance, you may need to combine two parameters, or split and reformat a parameter.

Advantages of the Rewriter plugin

Rewriter

  • Built in - usually available in on-site installations

  • Executed after parsing

Gallium Data

  • Requires additional system

  • Statements must be matched with regular expressions

Advantages of Gallium Data

Gallium Data

  • Can work with cloud databases

  • Works with all request types

  • Can use extended regular expressions

  • Can be for selected requests

  • Parameter values can be read and modified

  • Runs outside of database

Rewriter

  • Not available in most cloud databases

  • Works only with some request types

  • Exact match only

  • All matching requests will be rewritten

  • Applied regardless of parameter values

  • Creates slight load on the database

Conclusion

These two methods of query rewriting have some overlap, but their different approaches mean that they apply to different use cases.

The Rewriter plugin is usually available out of the box, and that's a significant advantage. Not having to install another system is a big plus. If the Rewriter plugin does everything you need, and you don't anticipate needing more, then it's most likely the better solution.

Gallium Data is a more powerful solution, but it does require an extra system to be up and running. In the cloud, that's actually a big plus, since the plugin is usually not an option. Because Gallium Data is based on code rather than a lookup table, it can do a lot more than simple one-for-one query substitution, and can therefore handle a much broader set of scenarios.