HTTP headers can only transport US-ASCII characters safely

Henning Koch
March 05, 2023
Software engineer at makandra GmbH

HTTP header values must only contain low-ASCII (7-bit) characters for safe transport. From RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets.
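
To find out whether a value needs any escaping at all, you could test it against the 7-bit range first. The following is only a minimal sketch: the helper name `isASCIIOnly` is made up for illustration, and it checks the 7-bit limit only, not the other restrictions on header values (such as control characters).

```javascript
// Hypothetical helper: returns true if every UTF-16 code unit
// in the value lies within the 7-bit US-ASCII range (0 to 127).
function isASCIIOnly(value) {
  return /^[\x00-\x7F]*$/.test(value)
}

console.log(isASCIIOnly('attachment; filename="report.pdf"'))    // true
console.log(isASCIIOnly('attachment; filename="Übersicht.pdf"')) // false
```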

If you need to transport characters outside the 7-bit range (e.g. umlauts or emojis), one option is to encode the value as JSON and then escape all code points above 127 using unicode escape sequences:

escapeHighASCII('{"foo":"xäy"}') // => '{"foo":"x\\u00e4y"}'

Escaping high ASCII in JavaScript

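function escapeHighASCII(string) {
  // Without the `u` flag the regex matches single UTF-16 code units, so a character
  // above U+FFFF (e.g. an emoji) is escaped as a surrogate pair, which is what JSON expects.
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}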
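
A round trip might look like the following sketch. The payload and its `notice` key are made-up examples. The receiving side needs no custom decoding, because `JSON.parse()` already understands `\uXXXX` escapes, including surrogate pairs.

```javascript
function escapeHighASCII(string) {
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}

// Sender: serialize to JSON, then escape everything above code point 127.
// The payload shape is hypothetical.
let payload = { notice: 'Gespeichert: Übersicht 🎉' }
let headerValue = escapeHighASCII(JSON.stringify(payload))

// headerValue now contains only 7-bit characters and can be sent as a header value.

// Receiver: a standard JSON parser restores the original characters.
let decoded = JSON.parse(headerValue)
console.log(decoded.notice === payload.notice) // true
```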

Escaping high ASCII in Ruby

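def escape_high_ascii(string)
  # Emit UTF-16 code units so characters above U+FFFF become surrogate pairs, as JSON requires
  string.gsub(/[[:^ascii:]]/) do |char|
    char.encode("UTF-16BE").unpack("n*").map { |unit| "\\u" + unit.to_s(16).rjust(4, "0") }.join
  end
end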
Posted by Henning Koch to makandra dev (2023-03-05 12:54)