Introduction to Regular Expressions

2020, June 21

A regular expression, or regex for short, is a sequence of characters that defines a search pattern.

Regex provides a specific, standard textual syntax for representing patterns to match against text.

Although they look complicated, regexes are very powerful, as they can be used to describe and match a wide range of text patterns.

Operations such as string searching and input validation are commonly solved using regex.

Thus, regex is used in search engines and in the search-and-replace dialogs of text editors.



Metacharacter

A metacharacter is a character that has a special meaning in a regex.

The following are the common metacharacters in regex, each with a description:


^ (Caret)

Matches the starting position within the string.

Example Caret
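
As a minimal sketch of the caret, assuming Python's built-in re module (the sample strings are made up for illustration), the pattern ^cat only matches when "cat" sits at the very start of the string:

```python
import re

# ^ anchors the match to the start of the string.
starts_with_cat = re.compile(r"^cat")

print(bool(starts_with_cat.search("catalog")))  # True: "cat" is at the start
print(bool(starts_with_cat.search("bobcat")))   # False: "cat" is not at the start
```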


. (Dot)

Matches any single character, so it can be used as a wildcard character. In many regex engines it does not match a newline by default.

Example Dot
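
The dot can be tried out with a short sketch, again assuming Python's re module and an invented sample string. Note that Python's dot does not match a newline unless the re.DOTALL flag is given:

```python
import re

# c.t matches "c", then any one character (except a newline), then "t".
dot_pattern = re.compile(r"c.t")

dot_matches = dot_pattern.findall("cat cot cut ct c\nt")
print(dot_matches)  # ['cat', 'cot', 'cut']
```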


[ ]

Matches a single character that is contained within the brackets.

Example Bracket 1
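
A small sketch of a bracket expression, assuming Python's re module; the set of vowels and the sample word are chosen only for illustration:

```python
import re

# [aeiou] matches exactly one character that is one of a, e, i, o, u.
vowel = re.compile(r"[aeiou]")

vowels_found = vowel.findall("regex")
print(vowels_found)  # ['e', 'e']
```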

A hyphen (-) can be used to specify a range of characters. Thus, [a-z] matches any single lowercase letter from "a" to "z".

Example Bracket 2
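
The range form can be sketched the same way (Python's re module assumed, sample string invented). Only the lowercase letters are picked out:

```python
import re

# [a-z] matches one character in the range "a" through "z".
lowercase = re.compile(r"[a-z]")

lowercase_found = lowercase.findall("Ab3cD")
print(lowercase_found)  # ['b', 'c']
```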


[^ ]

Matches a single character that is not contained within the brackets.

Example Caret bracket 1
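
A sketch of the negated bracket, assuming Python's re module and an illustrative sample word. Everything that is not a listed vowel is matched:

```python
import re

# [^aeiou] matches one character that is NOT a, e, i, o or u.
non_vowel = re.compile(r"[^aeiou]")

non_vowels_found = non_vowel.findall("regex")
print(non_vowels_found)  # ['r', 'g', 'x']
```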

A hyphen (-) can be used to specify a range of characters. Thus, [^a-z] matches any single character except the lowercase letters "a" to "z".

Example Caret bracket 2
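
The negated range behaves as the mirror image of [a-z]. A minimal sketch under the same assumptions (Python's re module, invented input):

```python
import re

# [^a-z] matches one character outside the range "a" through "z".
not_lowercase = re.compile(r"[^a-z]")

not_lowercase_found = not_lowercase.findall("Ab3cD")
print(not_lowercase_found)  # ['A', '3', 'D']
```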


$ (Dollar)

Matches the ending position of the string.

Example Dollar
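
To illustrate the dollar sign, here is a sketch assuming Python's re module with made-up sample strings. The pattern cat$ only matches when "cat" is at the end:

```python
import re

# $ anchors the match to the end of the string.
ends_with_cat = re.compile(r"cat$")

print(bool(ends_with_cat.search("bobcat")))   # True: "cat" is at the end
print(bool(ends_with_cat.search("catalog")))  # False: "cat" is not at the end
```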


* (Asterisk)

A repeater: matches when the character preceding * occurs 0 or more times.

Example Asterisk
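
A short sketch of the asterisk, assuming Python's re module; fullmatch is used here so the whole sample string has to fit the pattern:

```python
import re

# ab*c: an "a", then zero or more "b", then a "c".
star_pattern = re.compile(r"ab*c")

for text in ["ac", "abc", "abbbc", "adc"]:
    # "ac", "abc" and "abbbc" match; "adc" does not.
    print(text, bool(star_pattern.fullmatch(text)))
```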


+ (Plus)

A repeater: matches when the character preceding + occurs one or more times.

Example Plus
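
The difference from the asterisk shows up when the repeated character is absent. A minimal sketch, assuming Python's re module and illustrative inputs:

```python
import re

# ab+c: an "a", then one or more "b", then a "c".
plus_pattern = re.compile(r"ab+c")

for text in ["ac", "abc", "abbbc"]:
    # "ac" does not match because at least one "b" is required.
    print(text, bool(plus_pattern.fullmatch(text)))
```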


? (Question mark)

Matches when the character preceding ? occurs 0 or 1 time, making the character optional.

Example Question mark
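
A common way to show the optional character is a word with two spellings. This sketch assumes Python's re module, and the word is only an example:

```python
import re

# colou?r: the "u" may appear once or not at all.
optional_u = re.compile(r"colou?r")

for text in ["color", "colour", "colouur"]:
    # "color" and "colour" match; "colouur" has two "u" and does not.
    print(text, bool(optional_u.fullmatch(text)))
```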


{n}

Matches when the preceding character occurs exactly n times.

Example Curly n
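
As a sketch of an exact repetition count, assuming Python's re module, the pattern below accepts only strings of exactly four digits (the four-digit requirement is an arbitrary choice for illustration):

```python
import re

# [0-9]{4}: a digit repeated exactly 4 times.
four_digits = re.compile(r"[0-9]{4}")

for text in ["202", "2020", "20200"]:
    # Only "2020" has exactly four digits.
    print(text, bool(four_digits.fullmatch(text)))
```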


{m,n}

Matches when the preceding character occurs at least m and not more than n times.

Example Curly m n
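
The bounded form can be sketched in the same way (Python's re module assumed; the bounds 2 and 3 are chosen for illustration):

```python
import re

# [0-9]{2,3}: a digit repeated at least 2 and at most 3 times.
two_or_three_digits = re.compile(r"[0-9]{2,3}")

for text in ["1", "12", "123", "1234"]:
    # Only "12" and "123" fall inside the allowed range.
    print(text, bool(two_or_three_digits.fullmatch(text)))
```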


| (Pipe)

Matches either the expression before or the expression after the |.

Example Pipe
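
A minimal sketch of alternation, assuming Python's re module and a made-up sentence. Either word on the two sides of the pipe is accepted:

```python
import re

# cat|dog matches either "cat" or "dog".
cat_or_dog = re.compile(r"cat|dog")

pets_found = cat_or_dog.findall("cat, dog, bird")
print(pets_found)  # ['cat', 'dog']
```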


Example usage

The following are common uses of regex.

If a pattern needs to use a metacharacter as a literal character, we can use a backslash \ to escape it.
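
The effect of escaping can be seen in a short sketch, assuming Python's re module and sample values picked for illustration. Without the backslash, the dot still acts as a wildcard:

```python
import re

unescaped = re.compile(r"3.14")   # . matches any single character
escaped = re.compile(r"3\.14")    # \. matches only a literal dot

print(bool(unescaped.fullmatch("3x14")))  # True: "x" satisfies the wildcard
print(bool(escaped.fullmatch("3x14")))    # False: a literal "." is required
print(bool(escaped.fullmatch("3.14")))    # True
```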


Email validation

^[a-z0-9_\.]+@[a-z]+\.[a-z]{2,3}$

Result Email validation
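
The email pattern above can be tried out with a small sketch. It assumes Python's re module; the helper name is_valid_email and the sample addresses are hypothetical. The pattern is a simplified one: it rejects uppercase letters, subdomains, and top-level domains longer than three letters, so it should not be read as a complete email validator.

```python
import re

# The pattern from the text: lowercase letters, digits, "_" or "." before the @,
# a lowercase domain name, then a dot and a 2-3 letter suffix.
EMAIL_PATTERN = re.compile(r"^[a-z0-9_\.]+@[a-z]+\.[a-z]{2,3}$")

def is_valid_email(address: str) -> bool:
    """Return True if the whole address fits the simplified pattern."""
    # fullmatch requires the entire string to be consumed by the pattern.
    return EMAIL_PATTERN.fullmatch(address) is not None

print(is_valid_email("john_doe.99@example.com"))  # True
print(is_valid_email("John@example.com"))         # False: uppercase letter
print(is_valid_email("user@example.info"))        # False: 4-letter suffix
```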


Phone validation

The following regex validates phone numbers written in the format used in Malaysia.

^\+6[0-9]{2,3}-[0-9]{7,8}$

Result Phone validation
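
A sketch of how the phone pattern above behaves, assuming Python's re module; the helper name is_valid_phone and the sample numbers are hypothetical. The pattern expects a plus sign, the digit 6, two or three more digits, a hyphen, and then seven or eight digits:

```python
import re

# The pattern from the text: "+6", 2-3 digits, "-", then 7-8 digits.
PHONE_PATTERN = re.compile(r"^\+6[0-9]{2,3}-[0-9]{7,8}$")

def is_valid_phone(number: str) -> bool:
    """Return True if the whole number fits the pattern."""
    return PHONE_PATTERN.fullmatch(number) is not None

print(is_valid_phone("+6012-3456789"))  # True
print(is_valid_phone("+603-12345678"))  # True
print(is_valid_phone("012-3456789"))    # False: missing the "+6" prefix
print(is_valid_phone("+6012-345"))      # False: too few digits after the hyphen
```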

