Regular expressions consist of alphanumeric characters and a number of non-alphanumeric syntax elements.
This page lists most of the RegEx syntax elements that can be used in regular expressions in Querix 4gl programs.
Modifiers:
Modifiers are flags added to a regular expression to change how the search is performed.
m | treats the string as multiple lines, so that ^ and $ match at the beginning and end of every line: /(\w+)/m |
s | treats the string as a single line, so that . also matches the newline character: /(\w+)/s |
i | makes the search pattern case-insensitive: /word/i -- matches "word", "Word", "WORD", "wOrD", etc. regardless of case |
x | allows the pattern to include whitespace and comments: /(\w+) (\w+)/x -- the space in the pattern is ignored, so it matches "runtime" as a whole but not "run time"; /(\d+)#(\w+)/x -- treats all the characters after # as a comment, so it matches "7" but not "word" |
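The effect of the four modifiers can be sketched as follows. This is only an illustration in Python's `re` module, not Querix 4gl code: the flag constants `re.MULTILINE`, `re.DOTALL`, `re.IGNORECASE`, and `re.VERBOSE` are assumed here to correspond to m, s, i, and x, and the sample strings are made up.

```python
import re

text = "first line\nsecond line"

# m (re.MULTILINE): ^ matches at the start of every line, not only the string.
starts_default = re.findall(r"^\w+", text)                  # ['first']
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)  # ['first', 'second']

# s (re.DOTALL): "." also matches the newline character.
dot_default = re.search(r"first.*second", text)             # None
dot_single_line = re.search(r"first.*second", text, re.DOTALL)

# i (re.IGNORECASE): letter case is ignored.
variants = ["word", "Word", "WORD", "wOrD"]
all_match = all(re.fullmatch(r"word", v, re.IGNORECASE) for v in variants)

# x (re.VERBOSE): whitespace and "#" comments inside the pattern are ignored,
# so the pattern below is equivalent to (\w+)(\w+).
two_groups = re.compile(r"(\w+) (\w+)  # the space between the groups is ignored", re.VERBOSE)
matches_runtime = two_groups.fullmatch("runtime") is not None
matches_run_time = two_groups.fullmatch("run time") is not None
```

Here `matches_runtime` is true and `matches_run_time` is false, because the pattern no longer contains anything that could match the space.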
Special character classes:
[] | creates a set (class) of characters and matches any one character from it: /[ab]c/ -- matches "ac" or "bc" but not "abc" as a whole; /[ab]+/ -- matches any non-empty string of a's and b's like "ab", "aabbab", "babbaabbbaaa", etc.; /([ab]+)c/ -- matches any non-empty string of a's and b's followed by c like "abc", "aabbabc", "babbaabbbaaac", etc. |
[x-y] | matches any one character from x to y inclusive, in ASCII order: /[a-d]/ -- matches "a", "b", "c", or "d" but not "e", "f", etc.; /[a-d]+/ -- matches every combination of a's, b's, c's, and d's like "a", "abcd", "acbddd" |
[\-] | matches the hyphen (-) character: /[\-]/ -- matches "-" in "fifty-fifty" |
[\n] | matches the newline character |
[^smth] | matches any one character except those listed after ^: /[^c]/ -- matches any character other than c; /[^c]+/ -- matches "hdfj" as a whole but not "hdcfj" |
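The character classes above can be tried out with a short sketch. It uses Python's `re` module as a stand-in for the Querix 4gl regex engine, which is an assumption, and `matches_whole` is a hypothetical helper introduced only for this example. It checks whether the pattern matches the entire string, which is how the "matches ... but not ..." notes above are meant.

```python
import re

def matches_whole(pattern, s):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

# [] : any one character from the set.
set_ac = matches_whole(r"[ab]c", "ac")        # True
set_abc = matches_whole(r"[ab]c", "abc")      # False
# Searching inside "abc" still finds the "bc" part.
set_inside = re.search(r"[ab]c", "abc").group()  # 'bc'

# [x-y] : any one character in the range.
range_ok = matches_whole(r"[a-d]+", "acbddd")  # True
range_bad = matches_whole(r"[a-d]+", "abef")   # False

# [\-] and [\n] : a literal hyphen and a newline inside a class.
hyphens = re.findall(r"[\-]", "fifty-fifty")   # ['-']
newlines = re.findall(r"[\n]", "a\nb")         # ['\n']

# [^...] : any one character NOT in the set.
negated_ok = matches_whole(r"[^c]+", "hdfj")   # True
negated_bad = matches_whole(r"[^c]+", "hdcfj") # False
```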
Metacharacters:
^ | matches the beginning of the string; with /m, also matches the beginning of every line |
$ | matches the end of the string; with /m, also matches the end of every line |
. | matches any character except the newline character |
| | separates alternatives: /(a*)|(b*)/ -- matches "a", "aa", "aaa", etc., or "b", "bb", "bbb", etc., but not "ab" or "aabb" as a whole |
() | groups a part of a regular expression so that it is treated as a single unit: /(ab)c/ -- matches "abc"; /(a|b)c/ -- matches "ac" or "bc" but not "abc" as a whole; /(\d\d):(\d\d):(\d\d)/ -- matches time values given in the hh:mm:ss format |
\ | quotes (escapes) the following metacharacter: /\|/ -- matches "|" |
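The metacharacters can be combined as in the sketch below. It is written in Python's `re` module for illustration only; the assumption is that these basic metacharacters behave the same way in the Querix 4gl engine. The sample strings are invented.

```python
import re

# ^ . $ : anchors and the "any character" wildcard.
anchored = re.search(r"^b.d$", "bad") is not None         # True
not_anchored = re.search(r"^b.d$", "a bad one") is not None  # False

# | : alternatives. Each side is tried against the whole string here.
only_as = re.fullmatch(r"(a*)|(b*)", "aaa") is not None   # True
only_bs = re.fullmatch(r"(a*)|(b*)", "bbb") is not None   # True
mixed = re.fullmatch(r"(a*)|(b*)", "aabb") is not None    # False

# () : grouping, which also captures the matched parts.
time_match = re.search(r"(\d\d):(\d\d):(\d\d)", "Started at 09:15:42 today")
hours, minutes, seconds = time_match.groups()             # '09', '15', '42'

# \ : escaping turns the metacharacter | into a literal character.
pipes = re.findall(r"\|", "a|b|c")                        # ['|', '|']
```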
Quantifiers:
* | matches 0 or more times (={0,}): /(a*)/ -- matches "", "a", "aa", "aaa", etc. |
+ | matches 1 or more times (={1,}): /(a+)/ -- matches "a", "aa", "aaa", etc. but not ""; /(a++a)/ -- never matches "aaaa", because the possessive a++ takes all the a's and leaves nothing for the remaining part of the pattern |
? | matches 0 or 1 time (={0,1}); placed after another quantifier, makes that quantifier match the shortest possible string: /(a?)/ -- matches "" and "a" but not "aaa" as a whole |
{n} | repetition = matches exactly n times: /(a{5})/ -- matches "aaaaa" but not "aa", "aaa", "aaaaaaaaaa", etc. |
{n,} | matches at least n times: /(a{5,})/ -- matches "aaaaa" and "aaaaaaaaaa" but not "aa", "aaa", etc. |
{,m} | matches no more than m times: /(a{,5})/ -- matches "aa", "aaa", and "aaaaa" but not "aaaaaaaaaa" |
{n,m} | matches at least n but no more than m times: /(a{2,5})/ -- matches "aa", "aaa", "aaaa", and "aaaaa" but not "a" or "aaaaaaaaaa" |
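The quantifier table can be checked with a small sketch. It uses Python's `re` module as an assumed stand-in for the Querix 4gl engine, and `matches_whole` is a hypothetical helper defined only for this example. The possessive form `a++` is left out because Python supports it only from version 3.11.

```python
import re

def matches_whole(pattern, s):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

results = {
    "a* on ''": matches_whole(r"a*", ""),                 # True: zero repetitions allowed
    "a+ on ''": matches_whole(r"a+", ""),                 # False: at least one required
    "a? on 'aaa'": matches_whole(r"a?", "aaa"),           # False: at most one allowed
    "a{5} on 5": matches_whole(r"a{5}", "a" * 5),         # True
    "a{5} on 3": matches_whole(r"a{5}", "a" * 3),         # False
    "a{5,} on 10": matches_whole(r"a{5,}", "a" * 10),     # True
    "a{,5} on 3": matches_whole(r"a{,5}", "a" * 3),       # True
    "a{,5} on 10": matches_whole(r"a{,5}", "a" * 10),     # False
    "a{2,5} on 1": matches_whole(r"a{2,5}", "a"),         # False
    "a{2,5} on 4": matches_whole(r"a{2,5}", "a" * 4),     # True
}

# A quantifier is greedy by default; a trailing ? makes it take the
# shortest possible match instead.
greedy = re.search(r"a+", "aaaa").group()   # 'aaaa'
lazy = re.search(r"a+?", "aaaa").group()    # 'a'
```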
Special notations with \:
\w | matches any word character; word characters include all alphanumeric characters, the underscore (_), other connector punctuation characters, and Unicode marks |
\W | matches any non-word character |
\s | matches any whitespace character |
\S | matches any non-whitespace character |
\d | matches any decimal digit (0-9) |
\D | matches any non-digit character |
\t | matches a tab character |
\n | matches a newline character |
\N | matches any character except the newline ("\n") |
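The backslash notations can be compared on one sample string, as in the sketch below. It uses Python's `re` module for illustration, assuming these shorthand classes behave like the ones in the table. `\N` is omitted because Python's `re` module does not provide it, and the sample string is invented.

```python
import re

sample = "order_42\tshipped 2024\n"

words = re.findall(r"\w+", sample)       # ['order_42', 'shipped', '2024']
non_words = re.findall(r"\W", sample)    # ['\t', ' ', '\n']
spaces = re.findall(r"\s", sample)       # ['\t', ' ', '\n']
non_spaces = re.findall(r"\S+", sample)  # ['order_42', 'shipped', '2024']
digits = re.findall(r"\d+", sample)      # ['42', '2024']
non_digits = re.findall(r"\D+", sample)  # ['order_', '\tshipped ', '\n']
tabs = re.findall(r"\t", sample)         # ['\t']
line_ends = re.findall(r"\n", sample)    # ['\n']
```

Note that the underscore counts as a word character, so "order_42" is returned as a single word.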
Assertions:
\b | matches a word boundary. A word boundary is a spot between two characters that has a word character (\w) on one side and a non-word character (\W) on the other, counting the imaginary characters off the beginning and end of the string as matching \W. Within character classes, \b represents backspace rather than a word boundary, just as it normally does in any double-quoted string. For more details, refer to the Perl documentation. |
\B | matches any position that is not a word boundary |
\A | matches only at the beginning of the string |
\z | matches only at the end of the string |
\Z | matches only at the end of the string or before a newline character at the end of the string |
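The assertions can be illustrated with the following sketch in Python's `re` module. It is an approximation rather than Querix 4gl code, and one difference matters: Python's `\Z` matches only at the very end of the string, so it corresponds to `\z` in the table above, not to `\Z`. The sample strings are invented.

```python
import re

# \b : "cat" only where it stands as a separate word.
whole_words = re.findall(r"\bcat\b", "cat concat cat's")  # ['cat', 'cat']

# \B : "cat" only where it is NOT preceded by a word boundary.
inside_word = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]  # [7]

lines = "one\ntwo\n"

# \A : the beginning of the string, even in multi-line mode.
caret_multiline = re.findall(r"^\w+", lines, re.MULTILINE)   # ['one', 'two']
string_start = re.findall(r"\A\w+", lines, re.MULTILINE)     # ['one']

# End of string: $ tolerates a trailing newline, Python's \Z does not.
dollar_end = re.search(r"two$", lines) is not None           # True
strict_end = re.search(r"two\Z", lines) is not None          # False
```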
Related articles: