Example: parsing SQL queries

Introduction

Sometimes you may need to do a deep parse of a SQL statement. There is an excellent Java library that can do that: it's called JSqlParser, and it works great in Gallium Data. It allows you to parse even very complex SQL, and gives you complete access to the syntax tree. You can even modify the statement and re-serialize it back to a text query.

It will support most SQL statement you throw at it, though it's never guaranteed to works on absolutely anything.

Installation

To add this library to your Gallium Data repository, go to the Libraries page and search for organization com.github.jsqlparser -- then pick the latest version (4.6 as of this writing). add it to your repository, and publish.

When you're done, the Libraries page should look something like:

Parsing SQL

Once the library is installed, you can use it in your filters. Here's an example in a query filter:

let CCJSqlParserUtil = Java.type("net.sf.jsqlparser.parser.CCJSqlParserUtil");

let statement = CCJSqlParserUtil.parse(context.packet.sql, 

parser => parser.withSquareBracketQuotation(true));

You'll need to figure out which features you want -- the second part of the second line selects the option to support bracket quotes for SQL Server (e.g. [dbo].[MyTable] ). There are other options such as timeouts, allowing unsupported statements, etc...


You'll need to familiarize yourself with the API to go from there, but there is a nifty utility that allows you to get the list of all the tables involved in the command:

let TablesNamesFinder = Java.type("net.sf.jsqlparser.util.TablesNamesFinder");

let tnf = new TablesNamesFinder();

let tables = tnf.getTableList(statement);

This returns a List<String>. This works for selects, inserts, updates and deletes, and even for create table and so on.

Example

If we assume that our filter has received a SQL query (in this case for SQL Server):

SELECT t1.col1, t2.[col 2], t3."col3" || '-' || "col4" 

FROM [mydb].[Table1] AS t1, "mkt".[Table2] as t2

LEFT JOIN [sales].Table3 t3

ON t3.col7 = t2.col8

WHERE t1.col5 = t2.col6

In our filter (presumably a Query filter or perhaps a Prepared statement filter), we could then parse the query as shown above, and determine what tables are involved using TablesNamesFinder, then:

for (let t of tables) {

    log.info("Table used: " + t);

}

Output:

Table used: [mydb].[Table1]

Table used: "mkt".[Table 2]

Table used: [sales].Table3

If needed, we can modify the query programmatically. Let's say we know that the query is in the form:

SELECT name, age, dob FROM Customers WHERE comments = 'something' 

We could change the query using a simple search and replace:

context.packet.sql = context.packet.sql.replaceAll(/\bname\b/gi, "first_name"); 

However, this has the disadvantage that all instance of the string "name" will be replaced, so if we had a query like:

SELECT name, age, dob FROM Customers WHERE comments = 'This customer has an unusual name' 

the result would be:

SELECT first_name, age, dob FROM Customers WHERE comments = 'This customer has an unusual first_name' 

which is not what we wanted.

It would be more precise to use the SQL parser to change the query:

for (let c of statement.getSelectBody().getSelectItems()) {

    if ("name" === c.toString().toLowerCase()) {

        let Column = Java.type("net.sf.jsqlparser.schema.Column");

        c.setExpression(new Column("first_name"));

    }

}

log.info("Modified statement is: " + statement.toString());

context.packet.sql = statement.toString();

The resulting SQL will be, as desired:

SELECT first_name, age, dob FROM Customers WHERE comments = 'This customer has an unusual name' 

Conclusion

This is a simple example of using a Java library in Gallium Data. Most Java libraries can be used in Gallium Data, so long as they do not create conflicts with other libraries.