Sometimes you need to remove high Unicode characters from a string, so that all remaining characters have a code point between 0 and 127. The remaining 7-bit-encoded characters ("Low-ASCII") can be transported in most strings where escaping is impossible or would be visually jarring.
Note that transliteration will change the string. If you need to preserve the exact string content, you need to use escaping instead.
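To illustrate the difference, here is a minimal sketch of the escaping route. It assumes percent-encoding is an acceptable escape format for your use case, which is only one of several possible choices. It uses `URI.encode_www_form_component` and `URI.decode_www_form_component` from Ruby's standard library:

```ruby
require 'uri'

original = 'aäoöuü'

# Percent-encoding produces a string that only contains low-ASCII characters ...
escaped = URI.encode_www_form_component(original)
escaped             # => "a%C3%A4o%C3%B6u%C3%BC"
escaped.ascii_only? # => true

# ... and, unlike transliteration, it can be reversed to the exact original.
restored = URI.decode_www_form_component(escaped)
restored == original # => true
```

The escaped form is lossless but much harder to read, which is why transliteration is often preferred for things like slugs or file names.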
Using ActiveSupport
ActiveSupport comes with a #transliterate method which replaces characters with their low-ASCII equivalent (e.g. to strip accents):
ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaoouu"
You can also add custom rules in your I18n dictionary like this:
de:
  i18n:
    transliterate:
      rule:
        Ä: 'Ae'
        Ö: 'Oe'
        Ü: 'Ue'
        ä: 'ae'
        ö: 'oe'
        ü: 'ue'
        ß: 'ss'
With this you get (as long as the current I18n locale is de):
ActiveSupport::Inflector.transliterate('aäoöuü') # => "aaeooeuue"
Using a Unicode-aware regexp
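You can use the following code. It decomposes characters into their base character and combining marks (KD normalization), removes everything outside the 7-bit range, and also downcases the result: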
string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
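The same idea can be sketched in plain Ruby without ActiveSupport, assuming a Ruby version that ships `String#unicode_normalize` (2.2 or later). The method name `strip_to_ascii` is a hypothetical helper made up for this example, and unlike the line above it does not downcase:

```ruby
# Hypothetical helper, not part of Ruby or ActiveSupport.
def strip_to_ascii(string)
  string
    .unicode_normalize(:nfkd)  # decompose "ä" into "a" + combining diaeresis
    .gsub(/[^\x00-\x7F]/, '')  # drop every character outside the 7-bit range
end

strip_to_ascii('aäoöuü') # => "aaoouu"
strip_to_ascii('Straße') # => "Strae"
```

Note that characters without a decomposition, such as "ß", are removed entirely rather than replaced. If you need "ss" instead, the rule-based ActiveSupport approach is the better fit.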
Using Iconv
Using Iconv (with a //TRANSLIT charset) does not work reliably, since transliteration depends on the set locale, and Ruby's Iconv wrapper does not expose functionality to set it.