Ruby: Converting UTF-8 codepoints to characters

Ruby offers several ways to convert between a character and its integer value (a 7-bit ASCII value or, more generally, a Unicode codepoint):

  • String#ord or String#unpack to get character values
  • Integer#chr or Array#pack to convert character values into Strings

Character values to Strings

Integer#chr

To get the character for a 7-bit ASCII value (0-127, the range in which ASCII and UTF-8 coincide), you can use Integer#chr:

116.chr
# => "t"

For values larger than 127, pass the target encoding as an argument. For example, to get codepoint 252 as a UTF-8 string:

252.chr(Encoding::UTF_8)
# => "ü"
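
To illustrate why the encoding argument matters, here is a minimal sketch. It assumes Encoding.default_internal is not set (the default for a plain Ruby process); the variable names are only illustrative. Without an argument, chr yields a US-ASCII string up to 127, a single raw byte for 128-255, and raises a RangeError above that.

```ruby
# Assumes Encoding.default_internal is nil (the default for a plain Ruby process).
ascii  = 116.chr                   # 7-bit value: a US-ASCII string
binary = 252.chr                   # 128..255 without an encoding: one raw byte
utf8   = 252.chr(Encoding::UTF_8)  # the character "ü", two bytes in UTF-8

p ascii.encoding  # the US-ASCII encoding
p binary.bytes    # the single byte 252
p utf8.bytes      # the two UTF-8 bytes 195 and 188

begin
  9786.chr        # above 255 there is no implicit encoding to fall back on
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```

Passing Encoding::UTF_8 explicitly sidesteps all three cases, so it is probably the safer habit whenever the value may exceed 127.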

Array#pack

Array#pack may feel less intuitive, but it does not require an encoding argument. Use the U* directive, which packs each integer as a UTF-8 character:

[116].pack('U*')
# => "t"
[252].pack('U*')
# => "ü"

Note that you must wrap your values in an Array. In return, this makes it easy to construct a String from multiple values:

[116, 252, 114, 32, 9786].pack('U*')
# => "tür ☺"

Note that the asterisk (*) is required for strings longer than 1 character.
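
A small sketch of the difference the asterisk makes: a plain U consumes a single array element, while U* consumes all of them. The variable names are only illustrative.

```ruby
codepoints = [116, 252, 114]

first_only = codepoints.pack('U')   # only the first element is converted
# => "t"
all_chars = codepoints.pack('U*')   # every element is converted
# => "tür"

all_chars.encoding == Encoding::UTF_8  # the packed string is tagged as UTF-8
# => true
```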


Strings to character values

String#ord

To convert back from a String to its codepoint, use String#ord:

"t".ord
# => 116
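
Note that ord only considers the first character of its receiver. As a brief sketch, String#codepoints (a core method not otherwise covered in this card) returns the values of all characters, as does mapping ord over each character:

```ruby
word = "tür ☺"

first = word.ord                      # only the first character is considered
# => 116
all = word.codepoints                 # one Integer per character
# => [116, 252, 114, 32, 9786]
per_char = word.each_char.map(&:ord)  # same result, built from single-character ord calls
# => [116, 252, 114, 32, 9786]
```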

String#unpack

Strings offer unpack as the inverse of Array#pack. Codepoints are returned as an Array, so you can convert entire strings:

"t".unpack('U*')
# => [116]
"tür ☺".unpack('U*')
# => [116, 252, 114, 32, 9786]
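
Because unpack('U*') and pack('U*') are inverses, a string survives a round trip through its codepoints. A minimal sketch with illustrative variable names; String#unpack1 assumes Ruby 2.4 or newer:

```ruby
original   = "tür ☺"
codepoints = original.unpack('U*')  # String -> Array of Integers
restored   = codepoints.pack('U*')  # Array of Integers -> String

restored == original
# => true

# unpack1 returns the first unpacked value directly instead of a one-element Array.
single = "t".unpack1('U')
# => 116
```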

pack/unpack can also convert into many other values or encodings (like quoted-printable). Please see the pack documentation for more information.

Henning Koch Almost 8 years ago