
Ruby: Replacing Unicode characters with a 7-bit transliteration

Henning Koch, Software engineer at makandra GmbH
June 27, 2012

Using ActiveSupport

ActiveSupport comes with a #transliterate method which replaces non-ASCII characters with their low-ASCII equivalents (e.g. to strip accents):

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaoouu"

You can also add custom transliteration rules per locale in your I18n dictionary, e.g. for German:

de:
  i18n:
    transliterate:
      rule:
        Ä: 'Ae'
        Ö: 'Oe'
        Ü: 'Ue'
        ä: 'ae'
        ö: 'oe'
        ü: 'ue'
        ß: 'ss'

With this, and I18n.locale set to :de, you get:

ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaeooeuue"
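To make the effect of the rule table concrete outside of Rails, here is a minimal plain-Ruby sketch that loads the same rules with the standard yaml library and applies them with String#gsub. The helper apply_transliteration_rules is hypothetical and not part of I18n or ActiveSupport. Unlike I18n, it only replaces characters that have a rule and leaves everything else untouched, whereas I18n also applies its built-in approximations to characters without a custom rule.

```ruby
require "yaml"

# The same rules as in the de locale snippet above, inlined so the sketch is self-contained.
LOCALE_YAML = <<~YAML
  de:
    i18n:
      transliterate:
        rule:
          Ä: 'Ae'
          Ö: 'Oe'
          Ü: 'Ue'
          ä: 'ae'
          ö: 'oe'
          ü: 'ue'
          ß: 'ss'
YAML

# Extract the rule hash, e.g. { "Ä" => "Ae", ..., "ß" => "ss" }.
RULES = YAML.safe_load(LOCALE_YAML).dig("de", "i18n", "transliterate", "rule")

# Hypothetical helper: replaces every character that has a rule and
# leaves all other characters as they are (no fallback approximations).
def apply_transliteration_rules(string, rules)
  # Regexp.union escapes each key; gsub with a Hash looks up the replacement per match.
  string.gsub(Regexp.union(rules.keys), rules)
end

puts apply_transliteration_rules("aäoöuü", RULES) # => "aaeooeuue"
```

Using a Hash as the second argument to gsub keeps the rule table as plain data, just like the YAML file.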

Using a Unicode-aware regexp

You can also decompose the string into base characters and combining marks, then strip everything outside 7-bit ASCII (code taken from the linked article):

string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
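On Ruby 2.2 and later, the decomposition step does not need ActiveSupport: the core method String#unicode_normalize(:nfkd) does the same job. Below is a sketch of the same regexp approach on top of it; the method name to_7bit is made up for this example. Note that characters without a Unicode decomposition, such as "ß" or "Ø", are simply deleted rather than transliterated, which is where #transliterate with custom rules does better.

```ruby
# Hypothetical helper; uses only core Ruby (String#unicode_normalize, Ruby 2.2+).
def to_7bit(string)
  # NFKD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
  decomposed = string.unicode_normalize(:nfkd)
  # Drop the combining marks and anything else outside 7-bit ASCII, then downcase.
  decomposed.gsub(/[^\x00-\x7F]/, "").downcase
end

puts to_7bit("Crème Brûlée") # => "creme brulee"
puts to_7bit("Straße")       # => "strae" ("ß" has no decomposition, so it is dropped)
```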

Using Iconv

Using Iconv with the //TRANSLIT suffix does not work reliably: the transliteration result depends on the process locale (LC_CTYPE), and Ruby's Iconv wrapper does not expose a way to set it. Iconv has also been removed from Ruby's standard library since Ruby 2.0.

Posted by Henning Koch to makandra dev (2012-06-27 15:39)