Ruby: Replacing Unicode characters with a 7-bit transliteration


Using ActiveSupport

ActiveSupport comes with a #transliterate method that replaces non-ASCII characters with their closest low-ASCII equivalents, e.g. to strip accents:

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaoouu"

You can also add custom rules to your I18n locale file (e.g. config/locales/de.yml):

de:
  i18n:
    transliterate:
      rule:
        Ä: 'Ae'
        Ö: 'Oe'
        Ü: 'Ue'
        ä: 'ae'
        ö: 'oe'
        ü: 'ue'
        ß: 'ss'

With these rules in place and I18n.locale set to :de, you get:

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaeooeuue"
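To make the effect of the rule table concrete, here is a minimal stdlib-only sketch. It is not ActiveSupport's implementation (which goes through I18n's transliterator); the method name transliterate_with_rules and the GERMAN_RULES hash are hypothetical. Explicit rules are applied first, then a fallback decomposes accented characters and drops the combining marks, and anything still non-ASCII becomes "?", similar to transliterate's default replacement string. Requires Ruby 2.2+ for String#unicode_normalize.

```ruby
# Hypothetical rule table, mirroring the German YAML rules.
GERMAN_RULES = {
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'ß' => 'ss'
}.freeze

# Sketch: explicit rules win; otherwise strip accents; otherwise use the replacement.
def transliterate_with_rules(string, rules = {}, replacement = '?')
  string.each_char.map do |char|
    next char if char.ascii_only?
    next rules[char] if rules.key?(char)

    # Decompose (e.g. "ä" -> "a" + U+0308) and drop nonspacing combining marks.
    base = char.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
    base.ascii_only? && !base.empty? ? base : replacement
  end.join
end

puts transliterate_with_rules('aäoöuü')               # => aaoouu
puts transliterate_with_rules('aäoöuü', GERMAN_RULES) # => aaeooeuue
puts transliterate_with_rules('Straße')               # => Stra?e
```

Note that without a rule, "ß" ends up as "?": it has no Unicode decomposition, so only an explicit rule such as the one in the YAML can map it to "ss".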

Using a Unicode-aware regexp

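Alternatively, decompose the string into base characters plus combining accents, then strip everything outside the 7-bit ASCII range (code as taken from the linked article):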

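# Plain Ruby 2.2+; replaces the older string.mb_chars.normalize(:kd) call, which recent Rails versions have removed
string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase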
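Wrapped in a helper (the name strip_to_ascii is hypothetical) and run on a few inputs, the regexp approach shows its main limitation: characters that have no Unicode decomposition, such as "ß" or "Ø", are silently dropped instead of being approximated. The sketch uses only core Ruby (2.2+ for String#unicode_normalize).

```ruby
# Hypothetical helper around the decompose-and-strip approach.
def strip_to_ascii(string)
  string
    .unicode_normalize(:nfkd)  # split "é" into "e" + combining accent
    .gsub(/[^\x00-\x7F]/, '')  # drop every code point outside 7-bit ASCII
    .downcase
end

puts strip_to_ascii('Crème Brûlée') # => creme brulee
puts strip_to_ascii('Straße')       # => strae
puts strip_to_ascii('Øresund')      # => resund
```

This makes the approach fine for building slugs or search keys from accented Latin text, but lossy for letters like "ß", "ø" or "ł".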

Using Iconv

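Using Iconv with the //TRANSLIT target charset does not work reliably: the transliteration result depends on the current locale, and Ruby's Iconv wrapper does not expose a way to set it. Iconv has also been removed from Ruby's standard library since Ruby 2.0 and is only available as a separate gem.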
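A locale-independent alternative in core Ruby is String#encode with its :fallback option, which is called for every character the target encoding cannot represent. A minimal sketch, assuming a small hypothetical approximation table (APPROXIMATIONS) and a hypothetical to_ascii helper; any character missing from the table becomes "?":

```ruby
# Hypothetical approximation table; extend it for the characters you need.
APPROXIMATIONS = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss' }.freeze

def to_ascii(string)
  # The lambda receives each character US-ASCII cannot encode
  # and returns its ASCII replacement.
  string.encode('US-ASCII', fallback: ->(char) { APPROXIMATIONS.fetch(char, '?') })
end

puts to_ascii('Grüße')      # => Gruesse
puts to_ascii('Tokyo 東京') # => Tokyo ??
```

Unlike Iconv, the result depends only on the table, not on the process locale.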

Henning Koch
Last edit: over 1 year ago, by Michael Leimstädtner
Keywords: rails, unicode, letters, utf, strings, 7-bit
License: Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2012-06-27 13:39)