Posted about 4 years ago.

A common mistake in validations using regular expressions

You probably use regular expressions to validate strings such as e-mail addresses, for example:

validates_format_of :email, :with => /.../

Such expressions often look like /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i, which matches valid input as expected:

>> "foo.bar-ooops@bar.com".match /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i
=> #<MatchData "foo.bar-ooops@bar.com">

… and does not match unwanted values:

>> "invalid email@invalid host.com".match /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i
=> nil

I know that this expression is not sufficient to validate e-mail addresses according to the RFC; it's just an example.

Problem

In Ruby, ^ and $ match at the start and end of every line, not of the whole string. The expression above therefore succeeds as soon as a single line of the input is valid, and anything is allowed on the other lines:

>> "foo.bar-ooops@bar.com\n<script> This is bad... </script>".match /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i
=> #<MatchData "foo.bar-ooops@bar.com">
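
To make the line-anchor behaviour easier to see, here is a minimal sketch in plain Ruby (no Rails required). The constant LINE_ANCHORED and the helper line_anchored_match? are hypothetical names introduced for illustration only; the pattern itself is the one from this card. It shows that a payload is accepted whether it comes after or before the valid line.

```ruby
# The pattern from this card. ^ and $ are line anchors in Ruby.
LINE_ANCHORED = /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i

# Hypothetical helper: true if the line-anchored pattern accepts the input.
def line_anchored_match?(input)
  LINE_ANCHORED.match?(input)
end

# A single valid line is accepted, as intended.
puts line_anchored_match?("foo@bar.com")

# A payload AFTER the address is accepted, because $ matches before "\n".
puts line_anchored_match?("foo@bar.com\n<script>")

# A payload BEFORE the address is accepted too, because ^ matches after "\n".
puts line_anchored_match?("<script>\nfoo@bar.com")

# Input without any valid line is still rejected.
puts line_anchored_match?("<script>")
```

So the check only proves that some line of the input looks like an e-mail address, not that the whole value does.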

Solution

Use \A to identify the start of the string to match and \z for the end in your validation expression:

>> "foo.bar-ooops@bar.com\n<script> This is bad... </script>".match /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
=> nil
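
The following sketch, again in plain Ruby, contrasts the string anchors. The names STRICT, LENIENT_END and valid_email? are hypothetical and only serve this example. It also illustrates why lowercase \z is used here: uppercase \Z still tolerates one trailing newline.

```ruby
# Same character classes as above, but anchored to the whole string.
STRICT      = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
# \Z (uppercase) also matches just before a single trailing newline.
LENIENT_END = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\Z/i

# Hypothetical helper: true only if the ENTIRE input matches the pattern.
def valid_email?(input)
  STRICT.match?(input)
end

puts valid_email?("foo.bar-ooops@bar.com")             # plain address
puts valid_email?("foo.bar-ooops@bar.com\n<script>")   # address followed by a payload
puts valid_email?("<script>\nfoo.bar-ooops@bar.com")   # payload followed by an address

# Difference between \z and \Z on a trailing newline:
puts STRICT.match?("foo@bar.com\n")
puts LENIENT_END.match?("foo@bar.com\n")
```

If a trailing newline should be tolerated, stripping the value before validation is probably clearer than relying on \Z.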

See also Ruby regular expression start/end line vs. start/end string


Author of this card: Thomas Eisenbarth
Last edit: 5 months ago by Pascal Schmid
Posted by Thomas Eisenbarth to makandropedia