Posted almost 9 years ago. Visible to the public. Linked content.

Regular Expressions - Cheat Sheet

You can write regular expressions in several ways, e.g. /regex/ and %r{regex}. For examples, look here.
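As a small illustration of the two literal syntaxes, the sketch below writes the same pattern both ways. The path and the constant names are made up for this example. %r{...} is mostly convenient when the pattern itself contains slashes.

```ruby
# Two ways to write the same pattern. The path is a made-up example.
SLASH_FORM   = /\/usr\/local\/bin/   # slashes must be escaped inside /.../
PERCENT_FORM = %r{/usr/local/bin}    # no escaping needed inside %r{...}

path = "/usr/local/bin/ruby"
puts SLASH_FORM.match?(path)    # prints true
puts PERCENT_FORM.match?(path)  # prints true
```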

Characters

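Literal Characters
The following characters have a special meaning and need a backslash escape to be matched literally:
[ ] \ ^ $ . | ? * + ( )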

Character Classes
[ae] matches a or e, e.g. gr[ae]y => grey or gray, but NOT graay or graey

[0-9] matches a SINGLE digit in the range from 0 to 9
[0-9a-fA-F] hexadecimal digit
^ negates character class, q[^x] matches qu in question, but NOT Iraq, since there is no character after the q for the negated character class to match
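The character class examples above can be tried out with a short sketch like the following. The constant names and the "0xCAFE" sample string are only illustrative.

```ruby
# Character classes from the examples above. Constant names are illustrative.
GREY      = /gr[ae]y/        # a or e, exactly once
HEX_DIGIT = /[0-9a-fA-F]/    # one hexadecimal digit
Q_NOT_X   = /q[^x]/          # q followed by any character except x

p %w[grey gray graay graey].map { |word| GREY.match?(word) }
# => [true, true, false, false]

p "0xCAFE".scan(HEX_DIGIT)   # => ["0", "C", "A", "F", "E"]

p "question"[Q_NOT_X]        # => "qu"
p "Iraq"[Q_NOT_X]            # => nil, nothing follows the q
```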

Shorthand Characters
\d matches a single character that is a digit
\w matches a word character (alphanumeric characters plus underscore)
\s matches a whitespace character (includes tabs and line breaks)
\t matches a tab character
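A quick way to see what each shorthand picks up is to scan one sample string with all of them. The sample string below is invented for this sketch.

```ruby
# One invented sample string, scanned with each shorthand class.
SAMPLE = "user_42\tlogged in\n"

p SAMPLE.scan(/\d/)    # => ["4", "2"]
p SAMPLE.scan(/\w+/)   # => ["user_42", "logged", "in"]
p SAMPLE.scan(/\s/)    # => ["\t", " ", "\n"]
p SAMPLE.index(/\t/)   # => 7, the position of the tab
```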

Non-Printable Characters
\xFF matches the character with the given hexadecimal code
\uFFFF matches the Unicode character with the given code point, e.g. \u20AC matches €

. matches any character, in some engines and modes except line breaks (it then behaves like [^\n] on Unix or [^\r\n] on Windows)
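The escapes and the dot can be checked with a sketch along these lines. It assumes a UTF-8 source file (the Ruby default) and uses \x41, the letter A, because byte values above \x7F do not form a complete character in UTF-8 strings. The constant names are illustrative.

```ruby
# Hex and Unicode escapes, plus the dot with and without newline matching.
EURO     = /\u20AC/   # the euro sign
LETTER_A = /\x41/     # hex code 41 is "A"

p "5 \u20AC".match?(EURO)   # => true
p "ABC"[LETTER_A]           # => "A"

# By default the dot skips line breaks in Ruby; the m modifier changes that.
p "a\nb".scan(/./)    # => ["a", "b"]
p "a\nb".scan(/./m)   # => ["a", "\n", "b"]
```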

Anchors

^ matches the start of a line
$ matches the end of a line
\A matches the start of a string
\z matches the end of a string
\b matches a word boundary. A word boundary is a position between a character that can be matched by \w and a character that cannot be matched by \w. It also matches at the start and/or end of the string if the first and/or last characters in the string are word characters.
\B matches at every position where \b cannot match.
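The difference between line anchors and string anchors shows up as soon as the input has more than one line. The sketch below uses invented sample strings to contrast them, and to show \b and \B.

```ruby
# Line anchors (^ and $) versus string anchors (\A and \z).
TEXT = "first line\nsecond line"

p TEXT.scan(/^\w+/)    # => ["first", "second"]
p TEXT.scan(/\A\w+/)   # => ["first"]
p TEXT.scan(/\w+$/)    # => ["line", "line"]
p TEXT.scan(/\w+\z/)   # => ["line"]

# \b needs a word boundary on each side, \B needs the absence of one.
PHRASE = "a cat scattered"
p PHRASE.gsub(/\bcat\b/, "dog")   # => "a dog scattered"
p PHRASE.gsub(/\Bcat\B/, "CAT")   # => "a cat sCATtered"
```

When validating a whole string in Ruby, \A and \z are usually the safer choice, since ^ and $ also match around line breaks inside the string.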

Alternation

cat|dog will match cat in "About cats and dogs". If the regex is applied again to the rest of the string, it will match dog.
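A minimal sketch of "applying the regex again": either continue matching from where the first match ended, or let scan collect all matches. The constant names are illustrative.

```ruby
# Alternation: the first match is "cat", a second application finds "dog".
PETS     = /cat|dog/
SENTENCE = "About cats and dogs"

first  = SENTENCE.match(PETS)
second = SENTENCE.match(PETS, first.end(0))   # continue after the first match

p first[0]             # => "cat"
p second[0]            # => "dog"
p SENTENCE.scan(PETS)  # => ["cat", "dog"]
```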

Quantifiers

? none or one, e.g. colou?r matches colour or color
* zero or more times
+ once or more times
<[A-Za-z][A-Za-z0-9]*> matches an HTML tag without any attributes. <[A-Za-z0-9]+> is easier to write but matches invalid tags such as <1>
Use curly braces to specify a specific amount of repetition. Use \b[1-9][0-9]{3}\b to match a number between 1000 and 9999. \b[1-9][0-9]{2,4}\b matches a number between 100 and 99999.
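The quantifier examples above can be verified with a sketch like this one. The patterns are taken from the text; the constant names and sample strings are invented.

```ruby
# Quantifier patterns from the text, applied to invented sample strings.
COLOUR        = /colou?r/
TAG           = /<[A-Za-z][A-Za-z0-9]*>/
LOOSE_TAG     = /<[A-Za-z0-9]+>/
FOUR_DIGITS   = /\b[1-9][0-9]{3}\b/
THREE_TO_FIVE = /\b[1-9][0-9]{2,4}\b/

p "color colour colr".scan(COLOUR)            # => ["color", "colour"]
p "<h1> <1>".scan(TAG)                        # => ["<h1>"]
p "<h1> <1>".scan(LOOSE_TAG)                  # => ["<h1>", "<1>"]
p "999 1000 9999 10000".scan(FOUR_DIGITS)     # => ["1000", "9999"]
p "99 100 99999 100000".scan(THREE_TO_FIVE)   # => ["100", "99999"]
```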

Modes: Greedy, Lazy and Possessive

Example string: This test is a <EM>first</EM> test string.

  • greedy: default. * and + match as much as they can and backtrack when the rest of the regex cannot be satisfied, i.e. the .* in /.*test/ will first consume the whole example string and then backtrack until the regex matches: This test is a <EM>first</EM> test

  • lazy (ungreedy): specified by adding a question mark to the quantifier. *? and +? match as little as possible, i.e. /.*?test/ will match This test.

  • possessive: specified by adding a plus sign to the quantifier. Reads like "greedy without backtracking": *+ and ++ try to match everything and never give back what they matched, even if the rest of the regex then fails, i.e. /\d++/ matches 333 whereas /\d++3/ does not. (A lazy /\d+?/ would only match 3.)
    Use it with caution. Mostly you'll want to use it for small expressions, e.g. for nested sub-regexes.
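The three modes can be compared side by side on the example string from above. This is only a sketch; the constant name is illustrative.

```ruby
# Greedy, lazy and possessive quantifiers on the example string.
EXAMPLE = "This test is a <EM>first</EM> test string."

p EXAMPLE[/.*test/]    # => "This test is a <EM>first</EM> test"  (greedy)
p EXAMPLE[/.*?test/]   # => "This test"                           (lazy)
p EXAMPLE[/<.+>/]      # => "<EM>first</EM>"                      (greedy)
p EXAMPLE[/<.+?>/]     # => "<EM>"                                (lazy)

# Possessive: \d++ takes all digits and never gives one back for the 3.
p "333"[/\d++/]    # => "333"
p "333"[/\d++3/]   # => nil
p "333"[/\d+3/]    # => "333", the greedy version backtracks one digit
p "333"[/\d+?/]    # => "3"
```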

Look-around

Look-arounds let you match depending on context. You can look behind or look ahead, each in a positive or a negative way. The look-around itself will not be part of the match.

  • /foo(?=bar)/ matches the foo in foobar but not in this food is bad
  • /otto(?!normal)/ matches the otto in ottomotor but not in ottonormalverbraucher
  • /(?<=ma)kandra/ matches the kandra in makandra but not in kandra
  • /(?<!foo)bar/ matches the bar in moo bar but not in foobar
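The four kinds of look-around can be sketched as follows. Note that the returned match never includes the text inside the look-around. The constant names are illustrative.

```ruby
# Positive/negative look-ahead and look-behind.
AHEAD      = /foo(?=bar)/       # foo, only if bar follows
NOT_AHEAD  = /otto(?!normal)/   # otto, unless normal follows
BEHIND     = /(?<=ma)kandra/    # kandra, only if ma precedes
NOT_BEHIND = /(?<!foo)bar/      # bar, unless foo precedes

p "foobar"[AHEAD]                        # => "foo"
p "this food is bad"[AHEAD]              # => nil
p "ottomotor"[NOT_AHEAD]                 # => "otto"
p "ottonormalverbraucher"[NOT_AHEAD]     # => nil
p "makandra"[BEHIND]                     # => "kandra"
p "kandra"[BEHIND]                       # => nil
p "moo bar"[NOT_BEHIND]                  # => "bar"
p "foobar"[NOT_BEHIND]                   # => nil
```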

Modifiers in Ruby

Add modifiers after the final slash, e.g. /Regex/im, or at the beginning of the regex, e.g. /(?i)regex/.

  • i: case insensitivity
  • m: make . also match newlines. Note that this meaning of m is specific to Ruby: in JavaScript and Perl, m changes how ^ and $ treat line breaks, and the "dot matches newline" modifier is s.
  • o: evaluate string interpolation only once (e.g. /foo#{Counter.value}/)
  • x: ignore whitespace (and comments) inside the regex. Allows for definitions like this:

    /
      <
      (3)+   # repeating part
      \ you  # need to escape this space!
    /x

    Any whitespace you could have in regular regexes is eliminated before matching (/( ?= foo) bar/x is the same as /(?=foo)bar/). Hence to match spaces, you need to escape them.

    x has unexpected side effects: /foo +/x matches foo and foofoo, and seems to actually use /(?:foo)+/ for matching. Furthermore, /I sign in as ?/x matches What do you expect with a match result of "" (the internal regex is /(?:Isigninas)?/). Apparently the engine eliminates whitespace from left to right and turns the resulting substrings into non-capturing groups before applying quantifiers. (This was observed in Ruby at the time of writing and may differ between Ruby versions; it could not be checked for other languages.)
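The i, m and x modifiers can be tried out with a short sketch like the one below. The constant names and sample strings are illustrative; the extended pattern is the one from the x example above.

```ruby
# i: case insensitivity, as a trailing modifier or inline.
p "REGEX"[/regex/i]      # => "REGEX"
p "REGEX"[/(?i)regex/]   # => "REGEX"
p "REGEX"[/regex/]       # => nil

# m: in Ruby, lets the dot match line breaks as well.
p "a\nb"[/a.b/]    # => nil
p "a\nb"[/a.b/m]   # => "a\nb"

# x: whitespace and comments are ignored, escaped spaces are kept.
HEART = /
  <
  (3)+   # repeating part
  \ you  # need to escape this space!
/x

p "<333 you"[HEART]   # => "<333 you"
p "<3you"[HEART]      # => nil, the escaped space is required
```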

Owner of this card:

Martin Straub
Last edit:
over 6 years ago
Keywords:
RegEx, regexp, multiple, lines, multi-line
Posted by Martin Straub to makandra dev