Read more

Regular Expressions: Quantifier modes

Henning Koch
February 11, 2021Software engineer at makandra GmbH

When you repeat a subpattern with a *, + or {...} operator, you may choose between greedy, lazy and possessive modes.

Illustration UI/UX Design

UI/UX Design by makandra brand

We make sure that your target audience has the best possible experience with your digital product. You get:

  • Design tailored to your audience
  • Proven processes customized to your needs
  • An expert team of experienced designers
Read more Show archive.org snapshot

Switching modes may affect the result and performance of your regular expressions. In the worst case, an ill-suited mode may make your regular expression so slow that it can DoS your application (Examples are the ActiveRecord's PostgreSQL CVE-2021-22880 Show archive.org snapshot or the Cloudflare outage 2019).

Greedy quantifiers (default)

A plain * or + is greedy. It will make the longest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it gives up trailing parts of its match and tries again. This is called "backtracking".

Below we match the string "aaabbb" against an expression that greedily matches multiple arbitrary characters (.+) and then matches three bs.

"aaabbb" =~ /^(.+)(bbb)/

Because .+ greedily matches more characters than it should, this expression requires 4 attempts to match the string:

Attempt Group $1 Group $2 Expression result
1 "aaabbb" not matched not matched
2 "aaabb" not matched not matched
3 "aaab" not matched not matched
4 "aaa" "bbb" matched!

A greedy quantifier is more effective when it repeats a more distinctive subpattern than .. The expression /^(a+)(bbb)/ only requires a single attempt to match "aaabbb":

"aaabbb" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" "bbb" matched!

Because greedy quantifiers start backtracking when an attempt fails, it may take a greedy expression longer to find out that a string that does not match. Here we run the same expression against the non-matching string "aaabbx":

"aaabbx" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" not matched not matched
2 "aa" not matched not matched
3 "a" not matched not matched

Lazy quantifiers

A quantifier like *? or +? is lazy. It yields the same matches as a greedy quantifier, but takes another route there.

Lazy quantifiers will make the shortest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it adds another repetition of its subpattern and tries again.

Below we match the string "aaabbb" against an expression that lazily matches multiple arbitrary characters (.+) and then matches three bs.

"aaabbb" =~ /^(.+?)(bbb)/

Because .+? never matches more characters than it should, this expression requires only 3 attempts to match the string:

Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" "bbb" matched!

A lazy quantifer will not be faster on a more distinctive subpattern (a+? instead of .+?):

"aaabbb" =~ /^(a+?)(bbb)/

Here we require 3 attempts to make a match. Remember that a greedy quanitifer only required a single attempt.

Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" "bbb" matched!

Because a lazy quantifier also backtracks, finding out that a string doesn't match also requires multiple attempts:

"aaabbx" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" not matched not matched

Possessive quantifiers

A quantifier like .*+ or .++ is possessive. Like a greedy quantifiers, it makes the longest possible match for the subpattern it repeats. Unlike other operators it never backtracks.

Below we match the string "aaabbb" against a regular expressions that possessively matches many arbitrary characters, and then looks for three bs:

"aaabbb" =~ /^(.++)(bbb)/

Because the first part of the expression (.++) matches the entire string, the expression will never be able to match three additional bs at the end. Hence the expression doesn't match. Because there is no backtracking with possessive quantifiers, the expression fails after a single attempt.

Attempt Group $1 Group $2 Expression result
1 "aaabbb" not matched not matched

Possessive quantifiers can be very effective when the repeated subpattern is more distinctive:

"aaabbb" =~ /^(a++)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" "bbb" matched

Possessive quanitifiers can also be very effective to detect that a string does not match:

"aaabbx" =~ /^(a++)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" not matched not matched
Henning Koch
February 11, 2021Software engineer at makandra GmbH
Posted by Henning Koch to makandra dev (2021-02-11 13:14)