
Matching line feeds with regular expressions works differently in every language

Henning Koch
May 12, 2011
Software engineer at makandra GmbH

Although regular expression syntax is 99% interchangeable between languages, keep this in mind:

  • By default, the dot character (".") does not match a line feed (newline, line break, "\n") in any language.
  • Some languages allow you to modify the behavior of a regular expression by appending a modifier to the pattern expression. E.g. /foo/i makes the pattern case-insensitive in many languages. Note however that some of these modifiers may not exist or mean entirely different things in different languages.
  • Some languages have a modifier that makes the dot match line feeds.

Here is how to make the dot match line feeds in various languages:

Ruby

You can make the dot match line feeds by using the /m modifier. If you come from other languages, do not use the /s modifier in Ruby. It changes the Regexp to interpret text as Shift JIS encoded, which you probably don't want.

JavaScript

JavaScript originally had no modifier to make the dot match line feeds; a /s ("dotAll") modifier was only added in ECMAScript 2018. Where you cannot rely on it, write [\s\S] or (.|\s) to match any character including line feeds, carriage returns, etc.
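A minimal sketch of the workaround, assuming a two-line sample string (the variable names are made up for illustration):

```javascript
// Hypothetical sample input with a line feed between the two words.
const text = "foo\nbar";

// The plain dot stops at the line feed, so this pattern does not match.
const dotMatches = /foo.bar/.test(text);

// [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
const classMatches = /foo[\s\S]bar/.test(text);

// (.|\s) also works here, because \s covers line feeds and carriage returns.
const alternationMatches = /foo(.|\s)bar/.test(text);

console.log(dotMatches, classMatches, alternationMatches); // false true true
```

Of the two, [\s\S] is usually the simpler choice, since it does not create a capture group.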

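While there is a /m modifier in JavaScript, it only changes the meaning of ^ and $, which then match at the start and end of every line instead of only at the start and end of the whole string.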
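To illustrate the difference, here is a small sketch (the sample string and variable names are assumptions for this example):

```javascript
// Hypothetical two-line input.
const lines = "foo\nbar";

// Without /m, ^ only matches at the very start of the string.
const anchorWithoutM = /^bar/.test(lines);

// With /m, ^ also matches directly after a line feed.
const anchorWithM = /^bar/m.test(lines);

// /m leaves the dot alone: it still does not match the line feed.
const dotWithM = /foo.bar/m.test(lines);

console.log(anchorWithoutM, anchorWithM, dotWithM); // false true false
```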

Perl

You can make the dot match line feeds by using the /s modifier. Note that there is also /m, which does something very different in Perl than in Ruby: as in JavaScript, it only makes ^ and $ match at the start and end of each line.

Posted by Henning Koch to makandra dev (2011-05-12 11:03)