HTTP headers can only transport US-ASCII characters safely

Henning Koch, Software engineer at makandra GmbH
March 05, 2023

HTTP header values should contain only 7-bit US-ASCII characters to be transported safely. From RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets.
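
To make the constraint concrete, here is a minimal sketch of a check for whether a value stays within the 7-bit range. The helper name isASCIIOnly is hypothetical and not part of any library. It only tests the character range and does not cover other header rules, such as the ban on line breaks.

```javascript
// Hypothetical helper: true if every UTF-16 code unit is in the
// 7-bit US-ASCII range (0x00 to 0x7F).
function isASCIIOnly(value) {
  return /^[\x00-\x7F]*$/.test(value)
}

console.log(isASCIIOnly('x-y')) // true
console.log(isASCIIOnly('xäy')) // false
```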

If you need to transport non-ASCII characters (e.g. umlauts or emojis), one option is to encode the string as JSON and then escape all code points above 127 using Unicode escape sequences:

escapeHighASCII('{"foo":"xäy"}') // => '{"foo":"x\\u00e4y"}'
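
The receiving side should not need a custom decoder, because \uXXXX escapes are standard JSON string syntax. The following sketch assumes the header value has already been read into a string; the variable names and the sample payload are made up for illustration.

```javascript
// A header value as it would arrive: ASCII only, with \u escapes.
// The emoji is written as a UTF-16 surrogate pair.
const headerValue = '{"foo":"x\\u00e4y","bar":"\\ud83d\\ude00"}'

// JSON.parse resolves the escapes back into the original characters.
const decoded = JSON.parse(headerValue)

console.log(decoded.foo) // xäy
console.log(decoded.bar) // 😀
```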

Escaping high ASCII in JavaScript

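function escapeHighASCII(string) {
  // Without the `u` flag the regex matches single UTF-16 code units, so characters
  // outside the BMP (e.g. emojis) are escaped as a surrogate pair, as JSON expects.
  const unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}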

Escaping high ASCII in Ruby

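def escape_high_ascii(string)
  # JSON escapes are UTF-16 code units, so emojis etc. become a surrogate pair.
  string.gsub(/[[:^ascii:]]/) do |char|
    char.encode("UTF-16BE").unpack("n*").map { |unit| format("\\u%04x", unit) }.join
  end
end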
Posted by Henning Koch to makandra dev (2023-03-05 12:54)