Regular Expression

Ha Khanh Nguyen (hknguyen)


1. Basic Components

Anchors: ^ and $

Regex  Functionality                               Matched Strings
^x     matches any string that starts with x       'xoom', 'xx', 'x097', etc.
x$     matches any string that ends with x         'ox', 'xx', '098x', etc.
^x$    exact string match (has to be exactly 'x')  'x'
x      matches any string that has x in it         'xoom', 'ox', 'xx', 'x097', '098x', '1x1', 'x', etc.
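
The anchor rows above can be checked with a short sketch. It assumes Python's built-in re module; the helper name matching and the word list are made up for illustration and are not part of the notes.

```python
import re

# Hypothetical sample strings, taken from the table plus one non-match.
words = ["xoom", "ox", "xx", "x097", "098x", "1x1", "x", "abc"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found somewhere."""
    return [s for s in candidates if re.search(pattern, s)]

starts_with_x = matching(r"^x", words)   # ['xoom', 'xx', 'x097', 'x']
ends_with_x = matching(r"x$", words)     # ['ox', 'xx', '098x', 'x']
exactly_x = matching(r"^x$", words)      # ['x']
contains_x = matching(r"x", words)       # every word except 'abc'

print(starts_with_x, ends_with_x, exactly_x, contains_x, sep="\n")
```

re.search is used here because it looks for the pattern anywhere in the string, which is the behaviour the table describes; re.match would implicitly anchor every pattern at the start.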

Quantifiers: * + ? and {}

Regex       Functionality                                                                  Matched Strings
abc*        matches any string that has ab followed by 0 or more c                         'abc', 'ab', 'abccc', 'aaaab', etc.
abc+        matches any string that has ab followed by 1 or more c                         'abc', 'abccc', 'aabc', etc.
abc?        matches any string that has ab followed by 0 or 1 c                            'ab', 'abc', 'abbbbbbb', 'ababc', etc.
abc{2}      matches any string that has ab followed by exactly 2 c                         'abcc', 'aabcc', etc.
abc{2,}     matches any string that has ab followed by 2 or more c                         'abcc', 'aabccc', etc.
abc{2,5}    matches any string that has ab followed by 2 up to 5 c                         'abcc', 'aabccccc', etc.
a(bc)*      matches any string that has a followed by 0 or more copies of the sequence bc  'a', 'abc', 'aaabcbc', etc.
a(bc){2,5}  matches any string that has a followed by 2 up to 5 copies of the sequence bc  'abcbc', 'aaabcbcbc', etc.
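
As a rough illustration of the quantifiers above, the sketch below assumes Python's re module; the helper found and the test strings are illustrative only. Note that in Python a range quantifier must be written without a space ({2,5}); with a space, the braces are treated as literal text.

```python
import re

def found(pattern, text):
    """True if the pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

# abc* : 'ab' followed by zero or more 'c'
star = [found(r"abc*", s) for s in ["ab", "abccc", "aaaab", "ac"]]   # [True, True, True, False]
# abc+ : at least one 'c' is required
plus = [found(r"abc+", s) for s in ["abc", "aabc", "ab"]]            # [True, True, False]
# abc? : the 'c' is optional
optional = [found(r"abc?", s) for s in ["ab", "abbbbbbb", "ac"]]     # [True, True, False]
# abc{2,5} : the quantifier is greedy but stops after five 'c'
bounded = re.search(r"abc{2,5}", "abcccccccc").group()               # 'abccccc'
# a(bc){2,5} : the quantifier applies to the whole group 'bc'
grouped = [found(r"a(bc){2,5}", s) for s in ["abcbc", "aaabcbcbc", "abc"]]  # [True, True, False]
# With a space inside the braces, re reads them as literal characters
with_space = found(r"abc{2, 5}", "abcc")                             # False

print(star, plus, optional, bounded, grouped, with_space)
```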

OR operation: | or []

Regex   Functionality                                                           Matched Strings
a(b|c)  matches any string that has a followed by b or c (and captures b or c)  'ab', 'ac', 'aaaabc', etc.
a[bc]   same as above but without capturing b or c                              'ab', 'ac', 'aaaabc', etc.
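
The difference between the two forms only shows up when the match is inspected. A minimal sketch, assuming Python's re module (variable names are illustrative):

```python
import re

# Both patterns match the same text, 'ab', inside 'aaaabc' ...
m_group = re.search(r"a(b|c)", "aaaabc")
m_class = re.search(r"a[bc]", "aaaabc")
matched_by_group = m_group.group(0)   # 'ab'
matched_by_class = m_class.group(0)   # 'ab'

# ... but only the parenthesised version records what followed the 'a'.
captured_by_group = m_group.groups()  # ('b',)
captured_by_class = m_class.groups()  # ()

# findall returns the captured groups when the pattern has any,
# and the whole match otherwise.
found_with_group = re.findall(r"a(b|c)", "ab ac ad")  # ['b', 'c']
found_with_class = re.findall(r"a[bc]", "ab ac ad")   # ['ab', 'ac']

print(captured_by_group, captured_by_class, found_with_group, found_with_class)
```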

Character classes: \d \w \s and .

Regex  Functionality                                                      Matched Strings
\d     matches a single character that is a digit                         '4', '2', '9', '0', etc.
\w     matches a word character (alphanumeric character plus underscore)  '_', 'h', '4', etc.
\s     matches a whitespace character (includes tabs and line breaks)     ' ', '\t', '\n', etc.
.      matches any character except a line break (by default)             ' ', '?', 'a', etc.
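
A small sketch of the character classes, assuming Python's re module and a made-up sample string. In Python the dot skips line breaks unless the re.DOTALL flag is passed.

```python
import re

# Hypothetical sample text containing a tab, a space and a line break.
text = "Room_42\tis open\n"

digits = re.findall(r"\d", text)      # ['4', '2']
word_chars = re.findall(r"\w", text)  # letters, digits and the underscore
spaces = re.findall(r"\s", text)      # ['\t', ' ', '\n']
any_chars = re.findall(r".", text)    # everything except the final '\n'
any_chars_dotall = re.findall(r".", text, flags=re.DOTALL)  # includes '\n'

print(digits, "".join(word_chars), spaces, len(any_chars), len(any_chars_dotall))
```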

Escape character: \
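
The notes give no table for this component, so the following is only a sketch of the usual idea, assuming Python's re module: a backslash in front of a special character such as . $ ( or ) makes the regex treat it as a literal character. The sample string is made up.

```python
import re

# Hypothetical sample text with several characters that are special in a regex.
price = "Total: $3.50 (approx.)"

# An unescaped '.' matches any character, so '3.50' also matches '3x50'.
loose = re.search(r"3.50", "3x50") is not None    # True
# Escaping the dot restricts the match to a literal '.'.
strict = re.search(r"3\.50", "3x50") is not None  # False

# '\$' is a literal dollar sign rather than the end-of-string anchor.
dollar = re.search(r"\$\d+\.\d+", price).group()  # '$3.50'

# re.escape adds the backslashes for an arbitrary literal string.
escaped = re.escape("(approx.)")
literal = re.search(escaped, price).group()       # '(approx.)'

print(loose, strict, dollar, literal)
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.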

Flags
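
The notes do not list specific flags here. As one possible illustration, assuming Python's re module, two commonly used flags are re.IGNORECASE (letter case is ignored) and re.MULTILINE (^ and $ also match at the start and end of every line). The sample text is made up.

```python
import re

# Hypothetical three-line sample text.
text = "Apple\napple pie\nAPPLE"

# Without flags the match is case-sensitive.
case_sensitive = re.findall(r"apple", text)                    # ['apple']
# re.IGNORECASE matches regardless of letter case.
ignore_case = re.findall(r"apple", text, flags=re.IGNORECASE)  # ['Apple', 'apple', 'APPLE']

# By default '^' matches only at the very start of the string ...
first_line_only = re.findall(r"^apple", text, flags=re.IGNORECASE)  # ['Apple']
# ... while re.MULTILINE lets it match at the start of each line.
every_line = re.findall(r"^apple", text, flags=re.IGNORECASE | re.MULTILINE)

print(case_sensitive, ignore_case, first_line_only, every_line)
```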

Grouping and capturing: ()

Regex        Functionality
a(bc)        parentheses create a capturing group with value bc
a(?:bc)*     using ?: we disable the capturing group
a(?<foo>bc)  using ?<foo> we give the group a name (Python's re module writes this as (?P<foo>bc))
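
The three kinds of group can be compared side by side. This sketch assumes Python's re module, where a named group is spelled (?P<foo>...) rather than (?<foo>...); the input strings are illustrative.

```python
import re

# Capturing group: the text matched by (bc) is stored as group 1.
capturing = re.search(r"a(bc)", "xabcx")
whole_match = capturing.group(0)         # 'abc'
first_group = capturing.group(1)         # 'bc'

# Non-capturing group: (?:bc) still groups for the '*', but stores nothing.
non_capturing = re.search(r"a(?:bc)*", "xabcbcx")
repeated_match = non_capturing.group(0)  # 'abcbc'
stored_groups = non_capturing.groups()   # ()

# Named group: the capture can be looked up by name.
named = re.search(r"a(?P<foo>bc)", "xabcx")
by_name = named.group("foo")             # 'bc'
as_dict = named.groupdict()              # {'foo': 'bc'}

print(whole_match, first_group, repeated_match, stored_groups, by_name, as_dict)
```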

Bracket expressions: []

Regex        Functionality                                                             Matched Strings
[abc]        matches any string that has either a or b or c (same as a|b|c)            'a', 'debt', 'wwwwc', etc.
[a-c]        same as above                                                             'a', 'debt', 'wwwwc', etc.
[a-zA-Z0-9]  matches any string that has a digit or a letter (uppercase or lowercase)  'a', '9', 'H', etc.
[^a-zA-Z]    matches any string that has a non-letter character                        '7', '898', etc.
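
A short sketch of the bracket expressions, assuming Python's re module and made-up inputs. Each bracket expression matches exactly one character, so findall returns the individual characters that qualify.

```python
import re

# [abc] and [a-c] accept the same three characters.
in_debt = re.findall(r"[abc]", "debt")    # ['b']
in_wwwwc = re.findall(r"[a-c]", "wwwwc")  # ['c']

# Ranges can be combined: any ASCII letter or digit.
alnum = re.findall(r"[a-zA-Z0-9]", "H-9 a!")  # ['H', '9', 'a']

# A leading '^' inside the brackets negates the set: anything that is not a letter.
non_letters = re.findall(r"[^a-zA-Z]", "a7b898")           # ['7', '8', '9', '8']
only_letters = re.search(r"[^a-zA-Z]", "abc") is not None  # False

print(in_debt, in_wwwwc, alnum, non_letters, only_letters)
```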

2. Putting It Together
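
As one possible way to combine the components from Section 1, the sketch below uses anchors, character classes, quantifiers and named groups to recognise a date written as YYYY-MM-DD. The pattern, the function name parse_date and the sample inputs are illustrative assumptions, not taken from the lecture; the code assumes Python's re module.

```python
import re

# ^ and $ require the whole string to be a date, \d{4} and \d{2} fix the
# number of digits, and the named groups label each part.
DATE_PATTERN = re.compile(r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$")

def parse_date(text):
    """Return the date parts as integers, or None if text is not YYYY-MM-DD."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

good = parse_date("2020-10-05")         # {'year': 2020, 'month': 10, 'day': 5}
wrong_format = parse_date("10/05/2020")  # None
extra_text = parse_date("2020-10-05 extra")  # None, because of the $ anchor

print(good, wrong_format, extra_text)
```

The pattern only checks the shape of the string; it does not verify that the month or day is in a valid range.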


These lecture notes reference material from Johnny Fox's blog post "Regex tutorial - A quick cheatsheet by examples" on medium.com.