Regular expressions

Many filters accept parameters that can be either plain strings or regular expressions. A parameter is generally treated as a regular expression when it starts with regex: (for "loose", case-insensitive matching) or REGEX: (for "strict", case-sensitive matching). More specifically, "loose" regular expressions are compiled with the following flags: CASE_INSENSITIVE, UNICODE_CASE, MULTILINE, DOTALL.
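
As a rough illustration of the two modes, the sketch below compiles a parameter according to its prefix. The class and the compileParameter helper are hypothetical names introduced for this example, and it assumes that "strict" mode uses no flags at all; the actual filter implementation may differ.

```java
import java.util.regex.Pattern;

public class RegexModes {
    // The flags the text lists for "loose" mode.
    static final int LOOSE_FLAGS = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
            | Pattern.MULTILINE | Pattern.DOTALL;

    // Hypothetical helper: compiles a filter parameter according to its prefix.
    // Returns null when the parameter is a plain string (no regex prefix).
    static Pattern compileParameter(String parameter) {
        if (parameter.startsWith("regex:")) {
            return Pattern.compile(parameter.substring("regex:".length()), LOOSE_FLAGS);
        }
        if (parameter.startsWith("REGEX:")) {
            // Assumption: "strict" mode means compiling without any flags.
            return Pattern.compile(parameter.substring("REGEX:".length()));
        }
        return null;
    }

    public static void main(String[] args) {
        // Case sensitivity differs between the two modes.
        System.out.println(compileParameter("regex:hello").matcher("HELLO").matches()); // true
        System.out.println(compileParameter("REGEX:hello").matcher("HELLO").matches()); // false
        // With DOTALL, the period also matches a line break.
        System.out.println(compileParameter("regex:a.b").matcher("a\nb").matches());    // true
        System.out.println(compileParameter("REGEX:a.b").matcher("a\nb").matches());    // false
    }
}
```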


The full documentation for these regular expressions can be found in the Java documentation for java.util.regex.Pattern. This page is just a quick cheat sheet for the more common uses.

Characters

In general, a character corresponds to itself, i.e. Z matches the character Z. There are, however, some special characters:

  • \\ : the backslash character

  • \t : the tab character (Unicode 9)

  • \n : the newline character (Unicode 10)

  • \r : the carriage return character (Unicode 13)

  • \xhh : the character with hexadecimal value hh

  • \uhhhh : the character with hexadecimal Unicode value hhhh

  • [abc] : a, b or c

  • [^abc] : anything except a, b or c

  • [a-zA-Z] : all characters from a to z, and from A to Z, inclusive.

  • . : (period) any character. This will match line breaks in "loose" mode, but not in "strict" mode.

  • \d : a digit (equivalent to [0-9] )

  • \s : a whitespace character, i.e. space, tab, newline, etc.

  • \b : a word boundary, i.e. the zero-width position between a word character and a non-word character (space, punctuation, line break), or the start or end of the text.


Examples

  • hello : will match "hello"

  • hello. : will match "hello!" or "helloA" or anything that is composed of "hello" plus exactly one more character

  • h[io] : will match "hi" or "ho"

  • \u263a : will match "☺" (smiley face)
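
The character examples above can be checked with a minimal sketch like the following. It assumes full-string matching with no flags (via Pattern.matches); the class and helper names are only for illustration. Note that each backslash in a pattern has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class CharacterExamples {
    // Full-string, case-sensitive match (no flags).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("hello", "hello"));    // true
        System.out.println(matches("hello.", "hello!"));  // true
        System.out.println(matches("hello.", "hello"));   // false: the period needs one more character
        System.out.println(matches("h[io]", "ho"));       // true
        System.out.println(matches("h[io]", "ha"));       // false
        // The regex escape on the left matches the smiley character on the right.
        System.out.println(matches("\\u263a", "\u263a")); // true
        // A digit, a whitespace character (here a tab), then a digit.
        System.out.println(matches("\\d\\s\\d", "1\t2")); // true
    }
}
```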

Quantifiers/groups

  • ? : one time or not at all

  • * : zero or more times

  • + : one or more times

  • {n} : exactly n times

  • {n,} : at least n times

  • {n,p} : between n and p times inclusive (no space after the comma); p must not be less than n

  • (abcd) : matches "abcd" as a group, can be quantified

  • a|b : a or b


Examples

  • a+ : will match any string composed of one or more "a", such as "a", "aaaa" and "aaaaaaaaaaaaa"

  • .*hello.* : will match any string that contains "hello"

  • a{3,5} : will match "aaa", "aaaa" and "aaaaa"

  • (abc){2,4} : will match "abcabc", "abcabcabc" and "abcabcabcabc".
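
A short sketch of the quantifier examples above, again assuming full-string matching with no flags. The last two checks use small patterns that are not taken from the list; they are added only to illustrate ? and |.

```java
import java.util.regex.Pattern;

public class QuantifierExamples {
    // Full-string, case-sensitive match (no flags).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a+", "aaaa"));                  // true
        System.out.println(matches("a+", ""));                      // false: at least one "a" is required
        System.out.println(matches(".*hello.*", "say hello now"));  // true
        System.out.println(matches("a{3,5}", "aa"));                // false: too few
        System.out.println(matches("a{3,5}", "aaaa"));              // true
        System.out.println(matches("a{3,5}", "aaaaaa"));            // false: too many
        System.out.println(matches("(abc){2,4}", "abcabcabc"));     // true
        System.out.println(matches("(abc){2,4}", "abc"));           // false: the group must repeat
        // Illustrative extras for ? and |
        System.out.println(matches("ab?c", "ac"));                  // true: "b" is optional
        System.out.println(matches("cat|dog", "dog"));              // true
    }
}
```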


General examples

  • insert.*\bcustomers\b.* matches any string that starts with "insert", followed by any number of characters, followed by "customers" as a separate word (because of the word boundaries) followed by any number of characters.

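  • .*((customers)|(cust_[0-9]+)).* matches any string that contains either "customers", or "cust_" followed by one or more digits. The outer parentheses are needed because | has the lowest precedence: without them, the expression would mean ".*(customers)" or "(cust_[0-9]+).*".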

  • ^((?!((foo)|(bar))).)*$ matches any string that does not contain "foo" or "bar".
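
The sketch below exercises these three kinds of pattern, with the alternation grouped explicitly and the parentheses of the negative lookahead balanced. It assumes full-string, case-sensitive matching on single-line input; the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class GeneralExamples {
    static final String INSERT_CUSTOMERS = "insert.*\\bcustomers\\b.*";
    static final String CUSTOMERS_OR_CUST = ".*((customers)|(cust_[0-9]+)).*";
    static final String NO_FOO_OR_BAR = "^((?!((foo)|(bar))).)*$";

    // Full-string, case-sensitive match (no flags).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(INSERT_CUSTOMERS, "insert into customers values (1)"));  // true
        // "customers2" has no word boundary after "customers".
        System.out.println(matches(INSERT_CUSTOMERS, "insert into customers2 values (1)")); // false

        System.out.println(matches(CUSTOMERS_OR_CUST, "select * from cust_42")); // true
        System.out.println(matches(CUSTOMERS_OR_CUST, "select * from cust_"));   // false: no digits

        System.out.println(matches(NO_FOO_OR_BAR, "hello world")); // true
        System.out.println(matches(NO_FOO_OR_BAR, "a foo b"));     // false
        System.out.println(matches(NO_FOO_OR_BAR, "rebar"));       // false: contains "bar"
    }
}
```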

There are many more things regular expressions can do. If you are not familiar with them, they are well worth some study.

In filter parameters that accept comma-separated strings, you can use a comma in the regular expression if you prefix it with a backslash, e.g.:

regex:A\d{1\,3},B\d{1\,3}
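
To make the escaping rule concrete, here is one way such a parameter could be split. This is only a sketch: splitParameter is a hypothetical helper, not the actual parser, and it assumes that every comma inside a regular expression is escaped and that only commas are unescaped afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSeparatedParameter {
    // Hypothetical helper: splits on commas that are not preceded by a backslash,
    // then turns each escaped comma back into a plain comma.
    static List<String> splitParameter(String parameter) {
        List<String> parts = new ArrayList<>();
        for (String part : parameter.split("(?<!\\\\),")) {
            parts.add(part.replace("\\,", ","));
        }
        return parts;
    }

    public static void main(String[] args) {
        // The Java literal below is the text regex:A\d{1\,3},B\d{1\,3}
        System.out.println(splitParameter("regex:A\\d{1\\,3},B\\d{1\\,3}"));
        // prints [regex:A\d{1,3}, B\d{1,3}]
    }
}
```

A limitation of this sketch is that a comma preceded by an escaped backslash would not be treated as a separator; a real parser would have to decide how to handle that case.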