Posted over 4 years ago. Visible to the public.

Ruby: Converting UTF-8 codepoints to characters

Converting string characters to or from their integer value (7-bit ASCII value or UTF-8 codepoint) can be done in different ways in Ruby:

  • String#ord or String#unpack to get character values
  • Integer#chr or Array#pack to convert character values into Strings

Character values to Strings

Integer#chr

To get the character for a 7-bit ASCII value or UTF-8 codepoint (0-127) you can use Integer#chr:

116.chr # => "t"

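Without an argument, values from 128 to 255 produce a single-byte binary (ASCII-8BIT) string, and values above 255 raise a RangeError (unless Encoding.default_internal is set). To get a character for values larger than 127, you need to pass the encoding. E.g. to get codepoint 252 in UTF-8: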

252.chr(Encoding::UTF_8) # => "ü"
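
The difference between the three cases can be checked directly. The following is a minimal sketch, assuming Encoding.default_internal is not set (the usual default); the variable names are only illustrative.

```ruby
# Integer#chr picks the encoding from the value when no argument is given.
ascii  = 116.chr                  # below 128: a US-ASCII string
binary = 252.chr                  # 128..255: one raw byte in binary encoding
utf8   = 252.chr(Encoding::UTF_8) # explicit encoding: a real UTF-8 character

puts ascii.encoding == Encoding::US_ASCII    # true
puts binary.encoding == Encoding::ASCII_8BIT # true
puts utf8.bytes.inspect                      # [195, 188]

# Above 255 there is no single byte to fall back to, so chr raises
# (assumption: Encoding.default_internal is nil).
out_of_range =
  begin
    9786.chr
  rescue RangeError => e
    e.class
  end
```

The binary string from 252.chr holds the single byte 0xFC, which is not valid UTF-8 on its own, so passing the encoding is the safer habit whenever the value may exceed 127.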

Array#pack

pack may feel less intuitive, but does not require passing an encoding option. You need the U* directive.

[116].pack('U*') # => "t"
[252].pack('U*') # => "ü"

Note that you must wrap your value numbers into an Array. In turn, this allows constructing Strings from multiple values easily:

[116, 252, 114, 32, 9786].pack('U*') # => "tür ☺"

Note that the asterisk (*) is required to pack more than one value: a bare U directive converts only the first array element.
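
To make the effect of the asterisk concrete, here is a small sketch comparing a bare U, a counted U2 and U*. The escape-free integer values are the ones used above (116 is "t", 252 is "ü", 114 is "r"); the variable names are only illustrative.

```ruby
values = [116, 252, 114]      # t, u-umlaut, r

first_only = values.pack('U')  # bare U consumes only the first value
first_two  = values.pack('U2') # a count consumes that many values
all        = values.pack('U*') # the asterisk consumes every remaining value

puts first_only.length # 1
puts first_two.length  # 2
puts all.length        # 3
puts all.encoding      # UTF-8
```

Because the template starts with U, the resulting String is tagged as UTF-8, which is why no encoding argument is needed.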


Strings to character values

String#ord

To convert back from a String to its codepoint, use String#ord:

"t".ord # => 116
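
String#ord only looks at the first character, which is easy to overlook with longer strings. The sketch below shows this, plus two ways to get every codepoint; String#codepoints is not mentioned in this card and is brought in here only as a convenient comparison. The string is written with a \u escape ("\u00FC" is "ü") so the snippet does not depend on the source file encoding.

```ruby
word = "t\u00FCr"                 # t, u-umlaut, r

first = word.ord                  # only the first character: 116
each  = word.each_char.map(&:ord) # one ord call per character
all   = word.codepoints           # built-in shortcut for the same result

puts first       # 116
puts each.inspect # [116, 252, 114]
puts each == all  # true

# An empty string has no first character, so ord raises.
empty_result =
  begin
    "".ord
  rescue ArgumentError => e
    e.class
  end
```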

String#unpack

Strings offer unpack as an inverse to Array#pack. Codepoints are returned as an array, even for a single character, and you can convert entire strings:

"t".unpack('U*') # => [116]
"tür ☺".unpack('U*') # => [116, 252, 114, 32, 9786]
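
Since unpack('U*') and pack('U*') are inverses, a round trip should give back the original string. A minimal sketch, with the string written using \u escapes ("\u00FC" is "ü", "\u263A" is "☺") and an illustrative U+XXXX formatting step that is not part of the original card:

```ruby
text = "t\u00FCr \u263A"

codepoints = text.unpack('U*')     # String -> Array of Integers
rebuilt    = codepoints.pack('U*') # Array of Integers -> String

puts codepoints.inspect # [116, 252, 114, 32, 9786]
puts rebuilt == text    # true

# Working on the integers makes per-character processing easy,
# e.g. printing each codepoint in the common U+XXXX notation.
labels = codepoints.map { |cp| format('U+%04X', cp) }
puts labels.inspect # ["U+0074", "U+00FC", "U+0072", "U+0020", "U+263A"]
```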

pack/unpack can also convert into many other values or encodings (like quoted-printable). Please see the pack documentation for more information.
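
As an illustration of such another encoding, the M directive handles quoted-printable. This is only a sketch: note that M works on a whole String wrapped in an Array rather than on integer values, and that the decoded result comes back as a binary string, so it is re-tagged as UTF-8 here.

```ruby
text = "t\u00FCr"                # t, u-umlaut, r

# Non-ASCII bytes become =XX escapes; pack appends a soft line break ("=\n").
encoded = [text].pack('M')
puts encoded.inspect             # "t=C3=BCr=\n"

# unpack returns an Array with one binary String; restore the UTF-8 tag.
decoded = encoded.unpack('M').first.force_encoding(Encoding::UTF_8)
puts decoded == text             # true
```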

Owner of this card:

Henning Koch
Last edit:
over 4 years ago
by Arne Hartherz
Posted by Henning Koch to makandra dev