Matching line feeds with regular expressions works differently in every language


Although regular expression syntax is 99% interchangeable between languages, keep this in mind:

  • By default, the dot character (".") does not match a line feed (newline, line break, "\n") in any language.
  • Some languages allow you to modify the behavior of a regular expression by appending a modifier to the pattern expression. E.g. /foo/i makes the pattern case-insensitive in many languages. Note however that some of these modifiers may not exist or mean entirely different things in different languages.
  • Some languages have a modifier that makes the dot match line feeds.
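
The first two points can be checked quickly in Ruby. This is a minimal sketch: the sample strings and constant names are made up for illustration, and it assumes Ruby 2.4 or newer for `match?`.

```ruby
# Hypothetical sample text: two words separated by a line feed.
TEXT = "foo\nbar"

# Without any modifier, the dot does not match the line feed.
PLAIN_DOT_MATCHES = TEXT.match?(/foo.bar/)          # false

# The /i modifier makes the pattern case-insensitive ...
CASE_INSENSITIVE_MATCHES = "FOO".match?(/foo/i)     # true

# ... but it has no effect on what the dot matches.
CASE_INSENSITIVE_DOT_MATCHES = TEXT.match?(/FOO.BAR/i) # false

puts PLAIN_DOT_MATCHES, CASE_INSENSITIVE_MATCHES, CASE_INSENSITIVE_DOT_MATCHES
```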

Here is how to make the dot match line feeds in various languages:

Ruby

You can make the dot match line feeds by using the /m modifier. If you come from other languages, do not use the /s modifier in Ruby. It changes the Regexp to interpret text as Shift JIS encoded, which you probably don't want.
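
A short sketch of the difference the /m modifier makes in Ruby. The sample text and constant names are illustrative assumptions, not part of the original card.

```ruby
# Hypothetical two-line sample text.
SAMPLE = "first line\nsecond line"

# Default: the dot stops at the line feed, so the match ends with line one.
DEFAULT_MATCH = SAMPLE[/first.*/]      # "first line"

# With /m the dot also matches "\n", so the match runs to the end of the string.
MULTILINE_MATCH = SAMPLE[/first.*/m]   # "first line\nsecond line"

# The same option is available as a constant when building a pattern from a string.
BUILT_PATTERN = Regexp.new("first.*", Regexp::MULTILINE)
BUILT_MATCH = SAMPLE[BUILT_PATTERN]    # "first line\nsecond line"

puts DEFAULT_MATCH.inspect, MULTILINE_MATCH.inspect, BUILT_MATCH.inspect
```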

JavaScript

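Originally there was no modifier to make the dot match line feeds. You needed to write [\s\S] or (.|\s) to match any character including line feeds, carriage returns, etc. The TC39 proposal for such a modifier has since shipped: as of ES2018, the /s modifier (the dotAll flag) makes the dot match line feeds. The workarounds are still needed in older engines.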

While there is a /m modifier in JavaScript, it only changes the meaning of ^ and $, which then match at the start and end of each line.

Perl

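You can make the dot match line feeds by using the /s modifier. Note that there is also /m, which does something very different in Perl than in Ruby: it makes ^ and $ match at the start and end of each line instead of only at the start and end of the string.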

Keywords
multiline, multiple, lines, single
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2011-05-12 09:03)