Ruby: Replacing Unicode characters with a 7-bit transliteration


Using ActiveSupport

ActiveSupport comes with a #transliterate method that replaces non-ASCII characters with their low-ASCII equivalents, e.g. to strip accents:

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaoouu"
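ActiveSupport hands this work to the i18n gem. That gem looks up every non-ASCII character in a table of approximations and substitutes a replacement string for any character it has no entry for. The default replacement is "?". The following plain-Ruby sketch imitates that lookup. APPROXIMATIONS and transliterate_sketch are hypothetical names, cover only a few characters, and are not ActiveSupport's actual code:

```ruby
# Hypothetical sketch of table-based transliteration, NOT ActiveSupport's code.
# APPROXIMATIONS is an illustrative subset; real tables are much larger.
APPROXIMATIONS = {
  'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
  'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
  'é' => 'e', 'ß' => 'ss'
}.freeze

# Replaces every non-ASCII character with its table entry,
# or with +replacement+ when the table has no entry for it.
def transliterate_sketch(string, replacement = '?')
  string.gsub(/[^\x00-\x7F]/) { |char| APPROXIMATIONS.fetch(char, replacement) }
end

transliterate_sketch('aäoöuü') # => "aaoouu"
transliterate_sketch('Straße') # => "Strasse"
transliterate_sketch('日本')    # => "??"
```

Unmapped characters become "?" rather than disappearing. Text in a script the table does not cover therefore comes out as a run of question marks.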

You can also add custom rules in your I18n dictionary like this:

de:
  i18n:
    transliterate:
      rule:
        Ä: 'Ae'
        Ö: 'Oe'
        Ü: 'Ue'
        ä: 'ae'
        ö: 'oe'
        ü: 'ue'
        ß: 'ss'

With this, and de as the current locale (I18n.locale = :de), you get:

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaeooeuue"
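The rule entries for the current locale take precedence over the built-in approximations. That is why ä only becomes ae while de is active. The following sketch shows that precedence in plain Ruby, assuming a simple merge of two tables. DEFAULT_RULES, LOCALE_RULES and transliterate_for are hypothetical stand-ins, not the i18n gem's API:

```ruby
# Hypothetical stand-ins for the built-in table and the per-locale YAML rules.
DEFAULT_RULES = { 'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss' }.freeze
LOCALE_RULES = {
  de: { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue' }
}.freeze

# Merges the locale's rules over the defaults, so locale entries win;
# characters found in neither table become '?'.
def transliterate_for(string, locale)
  table = DEFAULT_RULES.merge(LOCALE_RULES.fetch(locale, {}))
  string.gsub(/[^\x00-\x7F]/) { |char| table.fetch(char, '?') }
end

transliterate_for('aäoöuü', :de) # => "aaeooeuue"
transliterate_for('aäoöuü', :en) # => "aaoouu"
```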

Using a Unicode-aware regexp

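Alternatively, decompose the string with Unicode normalization form KD, which splits accented letters into a base letter plus combining marks, and then strip every character outside 7-bit ASCII. The following code (as taken from the linked article) does this and also downcases the result: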

string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
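On Ruby 2.2 and later, the core method String#unicode_normalize performs the same decomposition without mb_chars. Newer Rails versions deprecate mb_chars.normalize in favor of that method. The following sketch implements the same pipeline in core Ruby only. The method name strip_to_ascii is hypothetical:

```ruby
# Decomposes characters (NFKD), drops everything outside 7-bit ASCII
# (including the now-separate combining accents), then downcases,
# mirroring the mb_chars pipeline above.
def strip_to_ascii(string)
  string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase
end

strip_to_ascii('aäoöuü') # => "aaoouu"
strip_to_ascii('ﬁle')    # => "file"  (NFKD splits the ligature)
strip_to_ascii('Straße') # => "strae" (ß has no decomposition and is dropped)
```

Unlike the table-based approach, this one drops letters that have no Unicode decomposition, such as ß, æ or ø, instead of approximating them.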

Using Iconv

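Using Iconv (with the //TRANSLIT target charset) does not work reliably. The transliteration result depends on the process locale, and Ruby's Iconv wrapper offers no way to set it. Iconv has also been removed from Ruby's standard library since Ruby 2.0 and is only available as a separate gem.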
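A locale-independent alternative in core Ruby is String#encode with its fallback: option. String#encode calls the given Proc for every character the target encoding cannot represent. The following sketch combines that option with a small lookup table and NFKD decomposition. TRANSLIT_TABLE, FALLBACK and to_7bit are hypothetical names, and this is not a reimplementation of Iconv's //TRANSLIT:

```ruby
# Hypothetical table for letters that Unicode decomposition cannot reduce to ASCII.
TRANSLIT_TABLE = { 'ß' => 'ss', 'æ' => 'ae', 'Æ' => 'AE', 'ø' => 'o', 'Ø' => 'O' }.freeze

# Called by String#encode for each character US-ASCII cannot represent.
# It must return ASCII-only text, otherwise encode raises an error.
FALLBACK = lambda do |char|
  TRANSLIT_TABLE.fetch(char) do
    # Not in the table: keep the ASCII base letters of the decomposed character.
    base = char.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '')
    base.empty? ? '?' : base
  end
end

def to_7bit(string)
  string.encode('US-ASCII', fallback: FALLBACK)
end

to_7bit('Ærøskøbing') # => "AEroskobing"
to_7bit('Straße')     # => "Strasse"
to_7bit('Café')       # => "Cafe"
to_7bit('日本')        # => "??"
```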

Henning Koch
Last edit: Michael Leimstädtner
Keywords: rails, unicode, letters, utf, strings, 7-bit
License: Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2012-06-27 13:39)