Posted about 2 years ago. Visible to the public. Repeats.

Matching unicode characters in a Ruby (1.9+) regexp

On Ruby 1.9+, standard ruby character classes like \w, \d will only match 7-Bit ASCII characters:

Copy
"foo" =~ /\w+/ # matches "foo" "füü" =~ /\w+/ # matches "f", ü is not 7-Bit ASCII

There is a collection of character classes that will match unicode characters. From the documentation:

  • /[[:alnum:]]/ Alphabetic and numeric character
  • /[[:alpha:]]/ Alphabetic character
  • /[[:blank:]]/ Space or tab
  • /[[:cntrl:]]/ Control character
  • /[[:digit:]]/ Digit
  • /[[:graph:]]/ Non-blank character (excludes spaces, control characters, and similar)
  • /[[:lower:]]/ Lowercase alphabetical character
  • /[[:print:]]/ Like [[:graph:]], but includes the space character
  • /[[:punct:]]/ Punctuation character
  • /[[:space:]]/ Whitespace character ([[:blank:]], newline, carriage return, etc.)
  • /[[:upper:]]/ Uppercase alphabetical
  • /[[:xdigit:]]/ Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
  • /[[:word:]]/ A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation
  • /[[:ascii:]]/ A character in the ASCII character set

So:

Copy
"füü" =~ /[[:alpha:]]+/ # matches "füü" "‎१४" =~ /[[:digit:]]+/ # matches "१४"

Does your version of Ruby on Rails still receive security updates?
Rails LTS provides security patches for old versions of Ruby on Rails (3.2 and 2.3).

Author of this card:

Avatar
Tobias Kraze
Last edit:
8 months ago
by Arne Hartherz
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Tobias Kraze to makandropedia