Ruby: Converting UTF-8 codepoints to characters

Henning Koch, software engineer at makandra GmbH
June 21, 2016

Ruby offers several ways to convert between the characters of a String and their integer values (7-bit ASCII values or Unicode codepoints):

  • String#ord or String#unpack to get character values
  • Integer#chr or Array#pack to convert character values into Strings

Character values to Strings


To get the character for a codepoint in the 7-bit ASCII range (0-127), use Integer#chr:

116.chr
# => "t"

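For values larger than 127 you need to pass an encoding. By default, Integer#chr without an argument returns a single raw byte for 128-255 and raises a RangeError above 255. E.g. to get codepoint 252 in UTF-8:

252.chr(Encoding::UTF_8)
# => "ü"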
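
As a rough sketch of how Integer#chr behaves across the different ranges: the variable names are only illustrative, and the binary result for 252 assumes that Encoding.default_internal is not set, which is the usual case in a plain Ruby script.

```ruby
# Codepoints in the ASCII range need no encoding argument.
ascii = 116.chr                      # "t"

# Above 127, pass the target encoding (constant or name both work).
umlaut = 252.chr(Encoding::UTF_8)    # "ü"
smiley = 9786.chr('UTF-8')           # "☺"

# Without an encoding, 128..255 yields one raw byte, not a UTF-8 character.
# (Assumes Encoding.default_internal is unset.)
raw = 252.chr
raw_encoding = raw.encoding          # Encoding::BINARY (ASCII-8BIT)

# Values above 255 do not fit into one byte, so Ruby raises a RangeError.
out_of_range =
  begin
    9786.chr
  rescue RangeError => e
    e.class
  end
```

Passing the encoding explicitly avoids both the silent raw-byte result and the exception.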


Array#pack may feel less intuitive, but does not require passing an encoding option. Use the U* directive:

[116].pack('U*')
# => "t"
[252].pack('U*')
# => "ü"

Note that you must wrap your value numbers into an Array. In turn, this allows constructing Strings from multiple values easily:

[116, 252, 114, 32, 9786].pack('U*')
# => "tür ☺"

Note that the asterisk (*) is required to convert more than one value: a plain U directive consumes only the first element of the Array.
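
To illustrate the role of the asterisk, here is a small sketch comparing three directives on the same Array. The variable names are made up for this example.

```ruby
codepoints = [116, 252, 114]

# "U*" consumes every element of the Array.
with_asterisk = codepoints.pack('U*')     # "tür"

# A bare "U" consumes only the first element; the rest is ignored.
without_asterisk = codepoints.pack('U')   # "t"

# An explicit count consumes exactly that many elements.
with_count = codepoints.pack('U2')        # "tü"
```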

Strings to character values


To convert a String back to its codepoint, use String#ord:

"t".ord
# => 116
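
One thing worth knowing is that String#ord only looks at the first character. The sketch below shows this, together with String#codepoints, which is not covered in this card but is a built-in way to get the values of all characters.

```ruby
# ord returns the codepoint of the first character only.
first_only = "tür".ord            # 116
umlaut     = "ü".ord              # 252

# codepoints returns the values of all characters as an Array.
all_values = "tür".codepoints     # [116, 252, 114]

# ord raises an ArgumentError on an empty String.
empty_result =
  begin
    "".ord
  rescue ArgumentError => e
    e.class
  end
```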


String#unpack is the inverse of Array#pack. It always returns codepoints as an Array, even for a single character, and it can convert entire strings:

"j".unpack('U*')
# => [106]
"tür ☺".unpack('U*')
# => [116, 252, 114, 32, 9786]
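
Since unpack and pack are inverses, a round trip should give back the original String. The following sketch checks that, and also uses String#unpack1, which returns the first value without the wrapping Array. unpack1 is not mentioned in this card and assumes Ruby 2.4 or newer.

```ruby
original = "tür ☺"

# String -> codepoints -> String
values     = original.unpack('U*')    # [116, 252, 114, 32, 9786]
round_trip = values.pack('U*')        # "tür ☺"

# unpack1 skips the Array when only the first value is needed
# (assumes Ruby >= 2.4).
first_value = original.unpack1('U')   # 116
```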

pack/unpack can also convert into many other values or encodings (like quoted-printable). Please see the pack documentation for more information.
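
As an example of another directive, here is a hedged sketch of a quoted-printable round trip using M. Note that M works on a String wrapped in an Array rather than on codepoints. The force_encoding call is there because the decoded String is assumed to come back as raw bytes.

```ruby
# Encode the UTF-8 bytes of the String as quoted-printable.
encoded = ["tür"].pack('M')

# Decode again. The result holds raw bytes, so tag it as UTF-8.
decoded = encoded.unpack('M').first.force_encoding(Encoding::UTF_8)

# The "ü" is represented by its two UTF-8 bytes in hex.
contains_escaped_umlaut = encoded.include?('=C3=BC')
```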

Posted by Henning Koch to makandra dev (2016-06-21 12:04)