Ruby: Converting UTF-8 codepoints to characters
Converting string characters to or from their integer value (7-bit ASCII value or UTF-8 codepoint) can be done in different ways in Ruby:
- String#ord or String#unpack to get character values
- Integer#chr or Array#pack to convert character values into Strings
Character values to Strings
To get the character for a 7-bit ASCII value or UTF-8 codepoint in the range 0-127, you can use Integer#chr:
116.chr # => "t"
To get a character for values larger than 127, you need to pass the encoding. E.g. to get codepoint 252 in UTF-8:
252.chr(Encoding::UTF_8) # => "ü"
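The following sketch shows why the encoding argument matters. It assumes a standard MRI Ruby where Encoding.default_internal is unset, which is the default. Without an argument, a value from 128 to 255 becomes a single raw byte rather than "ü", and anything above 255 raises an error. The variable names are only for illustration:

```ruby
# Without an argument, Integer#chr picks the encoding based on the value.
ascii  = 116.chr                   # 0..127: a US-ASCII string
binary = 252.chr                   # 128..255: a single raw byte (ASCII-8BIT, i.e. binary)
utf8   = 252.chr(Encoding::UTF_8)  # explicit encoding: a proper "ü"

puts ascii.encoding  # US-ASCII
p binary.bytes       # [252]       (one byte, not valid UTF-8 on its own)
p utf8.bytes         # [195, 188]  (the two-byte UTF-8 sequence for "ü")

# Above 255 there is no single-byte fallback, so chr raises RangeError
# (assuming Encoding.default_internal is unset, which is Ruby's default).
begin
  9786.chr
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
puts 9786.chr(Encoding::UTF_8)  # ☺
```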
Array#pack may feel less intuitive, but it does not require passing an encoding option. You need the U (UTF-8 character) directive:
[116].pack('U*') # => "t"
[252].pack('U*') # => "ü"
Note that you must wrap your values in an Array. In turn, this makes it easy to construct Strings from multiple values:
[116, 252, 114, 32, 9786].pack('U*') # => "tür ☺"
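Note that the asterisk (*) tells pack to consume all remaining values. Without it, 'U' converts only the first value in the Array, so you need the asterisk for strings longer than one character.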
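Here is a small sketch of how the count after U behaves, assuming MRI Ruby 2.x or 3.x. It also checks the encoding of the packed result and compares it with building the same string through Integer#chr. The variable names are just for illustration:

```ruby
values = [116, 252, 114]

# Without a count, 'U' consumes exactly one element and ignores the rest.
one = values.pack('U')    # "t"
# An explicit count consumes that many elements.
two = values.pack('U2')   # "tü"
# '*' consumes every remaining element.
all = values.pack('U*')   # "tür"

# Because the format starts with a U directive, the result is UTF-8 encoded.
puts all.encoding         # UTF-8

# The same string, built one character at a time with Integer#chr.
joined = values.map { |cp| cp.chr(Encoding::UTF_8) }.join
puts all == joined        # true
```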
Strings to character values
To convert back from a String to its codepoint, use String#ord:
"t".ord # => 116
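Keep in mind that String#ord only looks at the first character. The sketch below contrasts it with String#codepoints, a core String method not mentioned above, which returns every codepoint of the string. Mapping #ord over the characters gives the same result:

```ruby
text = "tür ☺"

# String#ord only looks at the first character of the string.
puts text.ord                 # 116

# To get every codepoint, use String#codepoints or map #ord over the characters.
p text.codepoints             # [116, 252, 114, 32, 9786]
p text.each_char.map(&:ord)   # [116, 252, 114, 32, 9786]
```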
You can also use String#unpack as an inverse to Array#pack. Codepoints are returned as an Array, so you can convert entire strings at once:
"t".unpack('U*') # => [116]
"tür ☺".unpack('U*') # => [116, 252, 114, 32, 9786]
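As a quick sanity check, this sketch confirms that pack and unpack with the same 'U*' format round-trip a valid UTF-8 string. It also uses String#unpack1, which assumes Ruby 2.4 or later, for the common case where you only want a single value instead of an Array:

```ruby
original = "tür ☺"

# unpack always returns an Array, even for a single character.
codes = original.unpack('U*')   # [116, 252, 114, 32, 9786]
first = "t".unpack('U*')        # [116]

# String#unpack1 (Ruby 2.4+) returns only the first unpacked value.
single = "t".unpack1('U')       # 116

# Packing the codepoints again restores the original string.
restored = codes.pack('U*')
puts restored == original       # true
```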
String#unpack can also decode many other formats and encodings (like quoted-printable). Please see the Array#pack documentation for more information.
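For example, quoted-printable uses the M directive. The sketch below assumes MRI Ruby 2.4 or later, since it uses unpack1. Note that unpack('M') hands back raw bytes tagged as binary, so the result has to be re-tagged as UTF-8 before it compares equal to the original text:

```ruby
# 'M' is the quoted-printable directive.
encoded = ["tür"].pack('M')
p encoded  # "t=C3=BCr=\n" (each byte of "ü" becomes =XX; "=\n" is a soft line break)

# unpack('M') returns raw bytes tagged ASCII-8BIT, so re-tag them as UTF-8.
decoded = encoded.unpack1('M').force_encoding(Encoding::UTF_8)
puts decoded  # tür
```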