RegEx Characters

 

Regular expressions consist of alphanumeric characters and a number of non-alphanumeric syntax elements.

This page lists most of the RegEx syntax elements that can be used in regular expressions in Querix 4gl programs.

 

Modifiers:

 

These modifiers act as flags of a regular expression and change how the search is performed.

 

m

treats the string as multiple lines, so that ^ and $ also match at the beginning and end of each line:

/(\w+)/m

s

treats the string as a single line, so that . also matches the newline character:

/(\w+)/s

i

makes the search pattern case-insensitive:

/(\w+)/i -- matches variants "word", "Word", "WORD", "wOrD", etc. regardless of their case

x

allows the pattern to include whitespace and comments:

/(\w+)  (\w+)/x -- matches "runtime" but not "run time".

/(\d+)#(\w+)/x -- considers all the characters after # as a comment, matches "7" but not "word"
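The effect of the four modifiers can be tried out in any Perl-style engine. The sketch below uses Python's re module purely as a stand-in (an assumption: the Querix 4gl engine and its calling syntax are not shown here and may differ), with re.IGNORECASE, re.MULTILINE, re.DOTALL and re.VERBOSE playing the roles of i, m, s and x.

```python
import re

text = "first line\nSecond line"

# i: case-insensitive search
case_hits = re.findall(r"second", text, flags=re.IGNORECASE)

# m: ^ and $ also match at the start and end of every line
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: the dot also matches the newline character
spans_lines = re.search(r"first.*Second", text, flags=re.DOTALL) is not None

# x: whitespace and #-comments inside the pattern are ignored
verbose = re.compile(
    r"""
    (\w+)   # first group of word characters
    (\w+)   # second group of word characters
    """,
    flags=re.VERBOSE,
)
matches_runtime = verbose.fullmatch("runtime") is not None
matches_run_time = verbose.fullmatch("run time") is not None

print(case_hits, line_starts, spans_lines, matches_runtime, matches_run_time)
```

With x in effect, the blank space between the two groups is not part of the pattern, which is why "run time" (with a real space) is rejected.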

 

Special character classes:

 

[]

matches any one of the characters listed between the brackets, i.e. defines a set (class) of characters:

/[ab]c/ -- matches "ac" or "bc" but not "abc"

/[ab]+/ -- matches any non-empty string of a's and b's like "ab", "aabbab", "babbaabbbaaa", etc.

/([ab]+)c/ -- matches any non-empty string of a's and b's followed by c like "abc", "aabbabc", "babbaabbbaaac", etc.

[x-y]

matches any one of the characters from x to y inclusive, in ASCII order:

/[a-d]/ -- matches a single "a", "b", "c", or "d" but not "e", "f", etc.; /[a-d]+/ matches any combination of a's, b's, c's, and d's like "a", "abcd", "acbddd"

[\-]

matches the hyphen (-) character:

/[\-]/ -- matches "-" in "fifty-fifty"

[\n]

matches the newline character

[^smth]

matches any character except those listed after ^:

/[^c]/ -- matches any single character except c; /[^c]+/ matches any combination of characters other than c like "hdfj" but not "hdcfj"
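A short check of the character classes above, again using Python's re module as an assumed stand-in for the engine. The candidate strings are the ones from the examples; re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# [ab]c: exactly one of a or b, followed by c
set_ok = accepted(r"[ab]c", ["ac", "bc", "abc"])

# [a-d]+: one or more characters from the range a..d
range_ok = accepted(r"[a-d]+", ["a", "abcd", "acbddd", "abe"])

# [\-]: a literal hyphen inside a class
hyphens = re.findall(r"[\-]", "fifty-fifty")

# [^c]+: one or more characters, none of which is c
negated_ok = accepted(r"[^c]+", ["hdfj", "hdcfj"])

print(set_ok, range_ok, hyphens, negated_ok)
```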

 

Metacharacters:

 

^

beginning of the string

with /m, also matches the beginning of each line

$

end of the string

with /m, also matches the end of each line

.

any character except the newline character

|

possible alternative:

/(a*)|(b*)/ -- matches "a", "aa", "aaa", etc., "b", "bb", "bbb", etc. but not "ab", "aabb"

()

allows a part of a regular expression to be treated as a single unit:

/(ab)c/ -- matches "abc"

/(a|b)c/ -- matches "ac" or "bc" but not "abc"

/(\d\d):(\d\d):(\d\d)/ -- matches time values given in the hh:mm:ss format

\

quotes the following metacharacter:

/\|/ -- matches "|"
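The metacharacters can be exercised together in a few lines. This is a minimal sketch in Python's re module (an assumed stand-in, not the Querix 4gl call syntax); the sample strings are illustrative.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
anchored = bool(re.search(r"^abc$", "abc"))
not_anchored = bool(re.search(r"^abc$", "xabcx"))

# . matches any character except the newline character
dot_hits = re.findall(r"a.c", "abc a\nc a-c")

# | inside () offers alternatives for one part of the pattern
alt_ok = [s for s in ("ac", "bc", "abc") if re.fullmatch(r"(a|b)c", s)]

# () also captures the text each group matched
time_parts = re.fullmatch(r"(\d\d):(\d\d):(\d\d)", "12:34:56").groups()

# \ quotes the following metacharacter so it is taken literally
pipes = re.findall(r"\|", "a|b")

print(anchored, not_anchored, dot_hits, alt_ok, time_parts, pipes)
```

The captured groups in time_parts show why the hh:mm:ss pattern is written with three pairs of parentheses: each pair returns one field.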

 

Quantifiers:

 

*

matches 0 or more times (={0,}):

/(a*)/ -- matches "", "a", "aa", "aaa", etc.

+

matches 1 or more times (={1,}):

/(a+)/ -- matches "a", "aa", "aaa", etc. but not ""

/(a++a)/ -- never matches "aaaa" because the possessive a++ takes all the a's and gives none back, leaving nothing for the remaining part of the pattern

?

matches 0 or 1 time (={0,1}); placed after another quantifier, makes it find the shortest match:

/(a?)/ -- matches "" and "a" but not "aaa"

{n}

repetition = matches exactly n times:

/(a{5})/ -- matches "aaaaa" but not "aa", "aaa", "aaaaaaaaaa" etc.

{n,}

matches at least n times:

/(a{5,})/ -- matches "aaaaa" and "aaaaaaaaaa" but not "aa", "aaa", etc.

{,m}

matches no more than m times:

/(a{,5})/ -- matches "aa", "aaa", and "aaaaa" but not "aaaaaaaaaa"

{n,m}

matches at least n but no more than m times:

/(a{2,5})/ -- matches "aa", "aaa", "aaaa", and "aaaaa" but not "a" or "aaaaaaaaaa"
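The quantifiers above describe how much of the string a pattern may consume, so the examples are easiest to verify when the whole string must match. The sketch below does that with re.fullmatch from Python's re module (an assumed stand-in engine); the helper name full is hypothetical.

```python
import re

def full(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

samples = ["", "a", "aa", "aaaaa", "aaaaaaaaaa"]

star = full(r"a*", samples)          # 0 or more
plus = full(r"a+", samples)          # 1 or more
optional = full(r"a?", samples)      # 0 or 1
exactly5 = full(r"a{5}", samples)    # exactly 5
at_least5 = full(r"a{5,}", samples)  # 5 or more
range2_5 = full(r"a{2,5}", samples)  # between 2 and 5

# A quantifier followed by ? takes the shortest possible match
greedy = re.match(r"a+", "aaaa").group()
lazy = re.match(r"a+?", "aaaa").group()

print(star, plus, optional, exactly5, at_least5, range2_5, greedy, lazy)
```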

 

Special notations with \:

 

\w

matches any word character

word characters include all alphanumeric characters, _ (the underscore character), other connector punctuation characters, and Unicode marks

\W

matches non-"word" characters

\s

matches a whitespace character

\S

matches any non-whitespace character

\d

matches decimal digits (0-9)

\D

matches non-digits

\t

matches a tab character

\n

matches a newline character

\N

matches any character except "\n"
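To see what each backslash notation picks out, it helps to run them over one sample string. The sketch uses Python's re module as an assumed stand-in; \N is left out because that module does not provide it in the sense described above. The sample text is made up for illustration.

```python
import re

text = "id_42\tcost: 15 USD\n"

words = re.findall(r"\w+", text)       # runs of word characters
digits = re.findall(r"\d+", text)      # runs of decimal digits
non_digits = re.findall(r"\D+", text)  # runs of anything but digits
spaces = re.findall(r"\s", text)       # single whitespace characters
non_space = re.findall(r"\S+", text)   # runs of non-whitespace characters
tabs = re.findall(r"\t", text)         # tab characters only

print(words, digits, non_digits, spaces, non_space, tabs)
```

Note that the underscore in "id_42" stays inside the \w+ match, while the colon after "cost" does not.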

 

Assertions:

 

\b

matches word boundaries

a word boundary is a spot between two characters that has a word character (\w) on one side and a non-word character (\W) on the other, in either order, counting the imaginary characters off the beginning and end of the string as matching a \W. Within character classes, \b represents backspace rather than a word boundary, just as it normally does in any double-quoted string.

For more details, refer to the Perl documentation.

\B

matches any position that is not a word boundary

\A

matches only the beginning of the string

\z

matches only the end of the string

\Z

matches only at the end of the string or before a newline character at the end of the string
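Assertions match positions rather than characters, which is easiest to see by looking at where matches start. A minimal sketch in Python's re module (an assumed stand-in engine); \z and \Z are left out because Python's handling of them does not line up with the descriptions above.

```python
import re

text = "cat concat cats"

# \b on both sides: "cat" only as a whole word
whole_word = re.findall(r"\bcat\b", text)

# \bcat: "cat" at the start of a word
word_starts = [m.start() for m in re.finditer(r"\bcat", text)]

# \Bcat: "cat" that does not start at a word boundary
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# \A matches only at the very beginning, even in multi-line mode
lines = "cat\ncat"
caret_hits = re.findall(r"^cat", lines, flags=re.MULTILINE)
start_only = re.findall(r"\Acat", lines, flags=re.MULTILINE)

# Inside a character class, \b stands for the backspace character
backspaces = re.findall(r"[\b]", "a\bb")

print(whole_word, word_starts, inside_word, caret_hits, start_only, backspaces)
```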

 

 

Related articles:

RegExp algorithms