When you repeat a subpattern with a *
, +
or {...}
operator, you may choose between greedy, lazy and possessive modes.
Switching modes may affect the result and performance of your regular expressions. In the worst case, an ill-suited mode may make your regular expression so slow that it can DoS your application (Examples are the ActiveRecord's PostgreSQL CVE-2021-22880 Show archive.org snapshot or the Cloudflare outage 2019).
Greedy quantifiers (default)
A plain *
or +
is greedy. It will make the longest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it gives up trailing parts of its match and tries again. This is called "backtracking".
Below we match the string "aaabbb"
against an expression that greedily matches multiple arbitrary characters (.+
) and then matches three b
s.
"aaabbb" =~ /^(.+)(bbb)/
Because .+
greedily matches more characters than it should, this expression requires 4 attempts to match the string:
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaabbb" |
not matched | not matched |
2 | "aaabb" |
not matched | not matched |
3 | "aaab" |
not matched | not matched |
4 | "aaa" |
"bbb" |
matched! |
A greedy quantifier is more effective when it repeats a more distinctive subpattern than (.+)
. The expression /^(a+)(bbb)/
only requires a single attempt to match "aaabbb"
:
"aaabbb" =~ /^(a+)(bbb)/
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaa" |
"bbb" |
matched! |
Because greedy quantifiers start backtracking when an attempt fails, it may take a greedy expression longer to find out that a string that does not match. Here we run the same expression against the non-matching string "aaabbx"
:
"aaabbx" =~ /^(a+)(bbb)/
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaa" |
not matched | not matched |
2 | "aa" |
not matched | not matched |
3 | "a" |
not matched | not matched |
Lazy quantifiers
A quantifier like *?
or +?
is lazy. It yields the same matches as a greedy quantifier, but takes another route there.
Lazy quantifiers will make the shortest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it adds another repetition of its subpattern and tries again.
Below we match the string "aaabbb"
against an expression that lazily matches multiple arbitrary characters (.+
) and then matches three b
s.
"aaabbb" =~ /^(.+?)(bbb)/
Because .+?
never matches more characters than it should, this expression requires only 3 attempts to match the string:
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "a" |
not matched | not matched |
2 | "aa" |
not matched | not matched |
3 | "aaa" |
"bbb" |
matched! |
A lazy quantifer will not be faster on a more distinctive subpattern (a+?
instead of .+?
):
"aaabbb" =~ /^(a+?)(bbb)/
Here we require 3 attempts to make a match. Remember that a greedy quanitifer only required a single attempt.
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "a" |
not matched | not matched |
2 | "aa" |
not matched | not matched |
3 | "aaa" |
"bbb" |
matched! |
Because a lazy quantifier also backtracks, finding out that a string doesn't match also requires multiple attempts:
"aaabbx" =~ /^(a+)(bbb)/
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "a" |
not matched | not matched |
2 | "aa" |
not matched | not matched |
3 | "aaa" |
not matched | not matched |
Possessive quantifiers
A quantifier like .*+
or .++
is possessive. Like a greedy quantifiers, it makes the longest possible match for the subpattern it repeats. Unlike other operators it never backtracks.
Below we match the string "aaabbb"
against a regular expressions that possessively matches many arbitrary characters, and then looks for three b
s:
"aaabbb" =~ /^(.++)(bbb)/
Because the first part of the expression (.++
) matches the entire string, the expression will never be able to match three additional b
s at the end. Hence the expression doesn't match. Because there is no backtracking with possessive quantifiers, the expression fails after a single attempt.
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaabbb" |
not matched | not matched |
Possessive quantifiers can be very effective when the repeated subpattern is more distinctive:
"aaabbb" =~ /^(a++)(bbb)/
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaa" |
"bbb" |
matched |
Possessive quanitifiers can also be very effective to detect that a string does not match:
"aaabbx" =~ /^(a++)(bbb)/
Attempt | Group $1 | Group $2 | Expression result |
---|---|---|---|
1 | "aaa" |
not matched | not matched |