Posted 23 days ago. Visible to the public. Repeats.

Regular Expressions: Quantifier modes

When you repeat a subpattern with a *, + or {...} operator, you may choose between greedy, lazy and possessive modes.

Switching modes may affect the result and performance of your regular expressions. In the worst case, an ill-suited mode may make your regular expression so slow that it can DoS your application.

Greedy quantifiers (default)

A plain * or + is greedy. It will make the longest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it gives up trailing parts of its match and tries again. This is called "backtracking".

Below we match the string "aaabbb" against an expression that greedily matches multiple arbitrary characters (.+) and then matches three bs.

Copy
"aaabbb" =~ /^(.+)(bbb)/

Because .+ greedily matches more characters than it should, this expression requires 4 attempts to match the string:

Attempt Group $1 Group $2 Expression result
1 "aaabbb" not matched not matched
2 "aaabb" not matched not matched
3 "aaab" not matched not matched
4 "aaa" "bbb" matched!

A greedy quantifier is more effective when it repeats a more distinctive subpattern than .. The expression /^(a+)(bbb)/ only requires a single attempt to match "aaabbb":

Copy
"aaabbb" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" "bbb" matched!

Because greedy quantifiers start backtracking when an attempt fails, it may take a greedy expression longer to find out that a string that does not match. Here we run the same expression against the non-matching string "aaabbc":

Copy
"aaabbx" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" not matched not matched
2 "aa" not matched not matched
3 "a" not matched not matched

Lazy quantifiers

A quantifier like *? or +? is lazy. It yields the same matches as a greedy quantifier, but takes another route there.

Lazy quantifiers will make the shortest possible match for the subpattern it repeats. Only when the rest of the expression no longer matches, it adds another repetition of its subpattern and tries again.

Below we match the string "aaabbb" against an expression that lazily matches multiple arbitrary characters (.+) and then matches three bs.

Copy
"aaabbb" =~ /^(.+?)(bbb)/

Because .+? never matches more characters than it should, this expression requires only 3 attempts to match the string:

Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" "bbb" matched!

A lazy quantifer will not be faster on a more distinctive subpattern (a+? instead of .+?):

Copy
"aaabbb" =~ /^(a+?)(bbb)/

Here we require 3 attempts to make a match. Remember that a greedy quanitifer only required a single attempt.

Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" "bbb" matched!

Because a lazy quantifier also backtracks, finding out that a string doesn't match also requires multiple attempts:

Copy
"aaabbx" =~ /^(a+)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "a" not matched not matched
2 "aa" not matched not matched
3 "aaa" not matched not matched

Possessive quantifiers

A quantifier like .*+ or .++ is possessive. Like a greedy quantifiers, it makes the longest possible match for the subpattern it repeats. Unlike other operators it never backtracks.

Below we match the string "aaabbb" against a regular expressions that possessively matches many arbitrary characters, and then looks for three bs:

Copy
"aaabbb" =~ /^(.++)(bbb)/

Because the first part of the expression (.++) matches the entire string, the expression will never be able to match three additional bs at the end. Hence the expression doesn't match. Because there is no backtracking with possessive quantifiers, the expression fails after a single attempt.

Attempt Group $1 Group $2 Expression result
1 "aaabbb" not matched not matched

Possessive quantifiers can be very effective when the repeated subpattern is more distinctive:

Copy
"aaabbb" =~ /^(a++)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" "bbb" matched

Possessive quanitifiers can also be very effective to detect that a string does not match:

Copy
"aaabbx" =~ /^(a++)(bbb)/
Attempt Group $1 Group $2 Expression result
1 "aaa" not matched not matched

makandra has been working exclusively with Ruby on Rails since 2007. Our laser focus on a single technology has made us a leader in this space.

Owner of this card:

Avatar
Henning Koch
Last edit:
19 days ago
by Tobias Kraze
Keywords:
repetition, operators, optional
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Henning Koch to makandra dev
This website uses short-lived cookies to improve usability.
Accept or learn more