You can write regular expressions in several ways, e.g. /regex/ and %r{regex}. For examples, look here.
Remember that it is always a good idea to match a regex visually first.
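As a minimal sketch of the two literal syntaxes in Ruby (the URL and variable names are made up for illustration), %r{...} is mainly convenient when the pattern contains slashes, because they then need no escaping:

```ruby
# Two literal syntaxes for the same pattern (hypothetical example URL).
slash_form   = /https?:\/\/example\.com/    # slashes must be escaped
percent_form = %r{https?://example\.com}    # no escaping needed for slashes

url = "https://example.com/path"
p slash_form.match?(url)    # true
p percent_form.match?(url)  # true
```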
Characters
Literal Characters
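[ ] \ ^ $ . | ? * + ( ) have a special meaning; escape them with a backslash to match them literally, e.g. \. matches a dot. Most other characters simply match themselves.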
Character Classes
[ae] matches a or e, e.g. gr[ae]y matches grey or gray, but NOT graay or graey
[0-9] matches a SINGLE digit in the range from 0 to 9
[0-9a-fA-F] matches a single hexadecimal digit
^ directly after the opening bracket negates the character class: q[^x] matches qu in question, but NOT Iraq,
since there is no character after the q for the negated character class to match
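The character class examples above can be tried out in Ruby roughly like this (a small sketch; the variable names are illustrative only):

```ruby
# gr[ae]y accepts exactly one of "a" or "e" between "gr" and "y".
words = %w[gray grey graay graey]
gray_matches = words.select { |word| word.match?(/gr[ae]y/) }
p gray_matches              # ["gray", "grey"]

# A negated class still has to consume one character.
p "question"[/q[^x]/]       # "qu"
p "Iraq"[/q[^x]/]           # nil

# Ranges can be combined inside one class.
p "ff"[/\A[0-9a-fA-F]+\z/]  # "ff"
```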
Shorthand Characters
\d matches a single character that is a digit
\w matches a word character (alphanumeric characters plus underscore)
\s matches white space character (includes tabs and line breaks)
\t matches tab character
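A quick way to see what each shorthand picks up is String#scan on a sample string (the sample text is invented for this sketch):

```ruby
sample = "Room 42,\tbuilding_7\n"

digits      = sample.scan(/\d/)   # single digits
words       = sample.scan(/\w+/)  # runs of word characters (underscore included)
whitespace  = sample.scan(/\s/)   # space, tab and line break
tabs        = sample.scan(/\t/)   # tab only

p digits      # ["4", "2", "7"]
p words       # ["Room", "42", "building_7"]
p whitespace  # [" ", "\t", "\n"]
p tabs        # ["\t"]
```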
Non-Printable Characters
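\xFF matches the character with the given hexadecimal code
\uFFFF matches the Unicode character with the given code point, e.g. \u20AC matches €
. matches any single character, in many engines except line breaks by default: like [^\n] (Unix) or [^\r\n] (Windows)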
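The following sketch shows these escapes in Ruby, where the dot does not match a line break unless the m modifier is set. It uses \x41 (the letter A) as the hexadecimal example; the sample strings are invented:

```ruby
euro_found = "Price: 5 \u20AC".match?(/\u20AC/)  # \u20AC is the euro sign
hex_match  = "A"[/\x41/]                         # 0x41 is the code of "A"

dot_default   = "a\nb"[/a.b/]    # dot does not cross the line break
dot_multiline = "a\nb"[/a.b/m]   # with m, dot also matches "\n"

p euro_found     # true
p hex_match      # "A"
p dot_default    # nil
p dot_multiline  # "a\nb"
```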
Anchors
^ matches the start of a line
$ matches the end of a line
\A matches the start of a string
\z matches the end of a string
\b matches a word boundary, i.e. a position
between a character that can be matched by \w and a character
that cannot be matched by \w.
It also matches at the start and/or end of the string if the first
and/or last characters in the string are word characters.
\B matches at every position where \b cannot match.
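The difference between line anchors and string anchors only shows up on multi-line input. A small Ruby sketch (sample strings invented for illustration):

```ruby
text = "first line\nsecond line"

line_starts  = text.scan(/^\w+/)   # ^ matches at the start of every line
string_start = text.scan(/\A\w+/)  # \A matches only at the very beginning
line_ends    = text.scan(/\w+$/)   # $ matches at the end of every line
string_end   = text.scan(/\w+\z/)  # \z matches only at the very end

p line_starts   # ["first", "second"]
p string_start  # ["first"]
p line_ends     # ["line", "line"]
p string_end    # ["line"]

# \b requires a word boundary, \B requires that there is none.
whole_word = "cat concat cats".scan(/\bcat\b/)
inside     = "cat concat cats".scan(/\Bcat/)
p whole_word    # ["cat"] (only the standalone word)
p inside        # ["cat"] (only the one inside "concat")
```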
Alternation
cat|dog matches cat in "About cats and dogs"; if the regex is applied again to the rest of the string, it matches dog
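In Ruby, a single lookup returns only the first alternative that matches, while String#scan keeps applying the regex to the rest of the string. A minimal sketch:

```ruby
text = "About cats and dogs"

first_match = text[/cat|dog/]       # stops at the first match
all_matches = text.scan(/cat|dog/)  # applies the regex repeatedly

p first_match  # "cat"
p all_matches  # ["cat", "dog"]
```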
Quantifiers
? none or one, e.g. colou?r matches colour or color
* zero or more times
+ once or more times
{n,m} at least n and at most m times
{n} exactly n times
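The quantifiers can be compared side by side in Ruby; the inputs below are made up for this sketch:

```ruby
# ? makes the preceding "u" optional.
colors = %w[color colour colouur].grep(/\Acolou?r\z/)
p colors  # ["color", "colour"]

# * allows zero repetitions, + needs at least one.
star = "a"[/ab*/]
plus = "a"[/ab+/]
p star    # "a"
p plus    # nil

# {n,m} and {n} bound the number of repetitions.
ranged = "abbbb"[/ab{2,3}/]
exact  = "2024-01-15".match?(/\A\d{4}-\d{2}-\d{2}\z/)
p ranged  # "abbb"
p exact   # true
```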
Examples
- <[A-Za-z][A-Za-z0-9]*> matches an HTML tag without any attributes. <[A-Za-z0-9]+> is easier to write but matches invalid tags such as <1>.
- Use \b[1-9][0-9]{3}\b to match a number between 1000 and 9999.
- Use \b[1-9][0-9]{2,4}\b to match a number between 100 and 99999.
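These example patterns can be checked against a few inputs in Ruby (the inputs and variable names are chosen for this sketch):

```ruby
strict_tag = /<[A-Za-z][A-Za-z0-9]*>/
loose_tag  = /<[A-Za-z0-9]+>/

tags = "<h1>Title</h1>".scan(strict_tag)
p tags                        # ["<h1>"]
p "<1>".match?(strict_tag)    # false
p "<1>".match?(loose_tag)     # true

# The word boundaries keep longer numbers from matching partially.
four_digit = "999 1000 9999 10000".scan(/\b[1-9][0-9]{3}\b/)
three_to_five = "99 100 99999 100000".scan(/\b[1-9][0-9]{2,4}\b/)
p four_digit     # ["1000", "9999"]
p three_to_five  # ["100", "99999"]
```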
Modes: Greedy, Lazy and Possessive
Example string: This test is a <EM>first</EM> test string.
- greedy (default): * and + match as much as they can and backtrack only when the rest of the regex cannot be satisfied, i.e. the .* in /.*test/ will first match the whole example string and then go back to match this: This test is a <EM>first</EM> test.
- lazy (ungreedy): specified by adding a question mark to the quantifier. *? and +? match as little as possible, i.e. /.*?test/ will match This test.
- possessive: specified by adding a plus sign to the quantifier. Reads like "greedy without backtracking": *+ and ++ match as much as they can and never give anything back, so the whole match fails if the rest of the regex cannot be satisfied, i.e. /\d++/ matches 333 whereas /\d++3/ does not. (A lazy /\d+?/ would only match 3.) Use it with caution. Mostly you'll want to use it for small expressions, e.g. for nested sub-regexes.
For more details have a look at the card on quantifier modes.
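The three modes can be compared in Ruby on the example string from above. This is only a sketch; the /\d+3/ line is added to contrast normal greedy backtracking with the possessive variant:

```ruby
text = "This test is a <EM>first</EM> test string."

greedy = text[/.*test/]    # grabs everything, then backtracks to the last "test"
lazy   = text[/.*?test/]   # expands only until the first "test"
p greedy  # "This test is a <EM>first</EM> test"
p lazy    # "This test"

possessive         = "333"[/\d++/]   # takes all digits
possessive_blocked = "333"[/\d++3/]  # nothing is given back for the trailing 3
greedy_backtracks  = "333"[/\d+3/]   # greedy + gives one digit back
lazy_digit         = "333"[/\d+?/]   # one digit is enough
p possessive          # "333"
p possessive_blocked  # nil
p greedy_backtracks   # "333"
p lazy_digit          # "3"
```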
Look-around
Look-arounds let you match depending on context. You can look behind and look ahead, each in a positive and a negative way. The look-around itself will not be part of the match.
- Positive lookahead: /foo(?=bar)/ matches the foo in the foobar but not in this food is bad
- Negative lookahead: /otto(?!normal)/ matches the otto in ottomotor but not in ottonormalverbraucher
- Positive lookbehind: /(?<=ma)kandra/ matches the kandra in makandra but not in kandra
- Negative lookbehind: /(?<!foo)bar/ matches the bar in moo bar but not in foobar
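The four look-arounds can be exercised in Ruby as follows. This is a sketch in which the positive lookahead is checked against "the foobar", since the lookahead requires bar to follow foo directly; note that the returned match never includes the look-around part:

```ruby
lookahead_hit  = "the foobar"[/foo(?=bar)/]
lookahead_miss = "this food is bad"[/foo(?=bar)/]
lookahead_gap  = "the foo and the bar"[/foo(?=bar)/]  # bar does not follow directly
p lookahead_hit   # "foo"
p lookahead_miss  # nil
p lookahead_gap   # nil

neg_ahead_hit  = "ottomotor"[/otto(?!normal)/]
neg_ahead_miss = "ottonormalverbraucher"[/otto(?!normal)/]
p neg_ahead_hit   # "otto"
p neg_ahead_miss  # nil

behind_hit  = "makandra"[/(?<=ma)kandra/]
behind_miss = "kandra"[/(?<=ma)kandra/]
p behind_hit   # "kandra"
p behind_miss  # nil

neg_behind_hit  = "moo bar"[/(?<!foo)bar/]
neg_behind_miss = "foobar"[/(?<!foo)bar/]
p neg_behind_hit   # "bar"
p neg_behind_miss  # nil
```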
Modifiers in Ruby
Add modifiers after the final slash, e.g. /Regex/im, or at the beginning of the regex, e.g. /(?i)regex/.
- i: case insensitivity
- m: make the . character also match newlines. Note that m has this meaning in Ruby only; in JavaScript and Perl, m changes how ^ and $ behave, and it is the s modifier that makes . match newlines.
- o: evaluate string interpolation only once (e.g. /foo#{Counter.value}/o)
- x: ignore whitespace (and comments) inside the regex. Allows for definitions like this:
  /
    <
    (3)+    # repeating part
    \ you   # need to escape this space!
  /x
  Any whitespace you could have in regular regexes is eliminated before matching (/( ?= foo) bar/x is the same as /(?=foo)bar/). Hence to match spaces, you need to escape them.
  x has unexpected side effects: /foo +/x matches foo and foofoo, so it seems to actually use /(?:foo)+/ for matching. Furthermore, /I sign in as ?/x matches What do you expect with a match result of "" (the internal regex is /(?:Isigninas)?/). Apparently the engine eliminates whitespace from left to right and turns the resulting substrings into non-capturing groups before applying quantifiers. (This was observed in Ruby and may depend on the Ruby version; it could not be checked for other languages.)