HTTP headers can only transport US-ASCII characters safely


HTTP header values must contain only 7-bit US-ASCII characters to be transported safely. From RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets.
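To spot values that would violate this recommendation, a check along the following lines may help. This is only a sketch: `isAsciiOnly` is a hypothetical helper name, not part of any library, and it checks the 7-bit range only (it does not reject control characters such as line breaks).

```javascript
// Hypothetical helper: returns true if every UTF-16 code unit
// of the given string is within the 7-bit US-ASCII range.
function isAsciiOnly(value) {
  return /^[\x00-\x7F]*$/.test(value)
}

isAsciiOnly('text/html; charset=utf-8') // => true
isAsciiOnly('Grüße')                    // => false
```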

If you need to transport non-ASCII characters (e.g. umlauts or emoji), one option is to encode the value as JSON and then escape all code points above 127 using Unicode escape sequences. Any JSON parser decodes these escapes again, so the receiver needs no extra step:

escapeHighASCII('{"foo":"xäy"}') // => '{"foo":"x\\u00e4y"}'

Escaping high ASCII in JavaScript

function escapeHighASCII(string) {
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}
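
As a rough illustration of the full round trip, the sketch below escapes a JSON payload on the sending side and decodes it with a plain `JSON.parse()` on the receiving side. The payload and the variable names are made up for this example, and the helper is repeated only so the snippet runs on its own.

```javascript
// Same helper as above, repeated so this snippet is self-contained.
function escapeHighASCII(string) {
  const unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}

// Sender: serialize to JSON first, then escape everything above code point 127.
// The payload is an arbitrary example containing umlauts and an emoji.
const payload = { title: 'Grüße 😀' }
const headerValue = escapeHighASCII(JSON.stringify(payload))
// headerValue now consists of ASCII characters only. The emoji is written
// as two escapes (a UTF-16 surrogate pair), which is what JSON expects.

// Receiver: JSON.parse() understands \uXXXX escapes natively.
const decoded = JSON.parse(headerValue)
// decoded.title is 'Grüße 😀' again
```

Because the regular expression has no `u` flag, it matches individual UTF-16 code units, so characters outside the Basic Multilingual Plane end up as surrogate pairs without extra handling.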

Escaping high ASCII in Ruby

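def escape_high_ascii(string)
  # Escape per UTF-16 code unit so characters beyond U+FFFF (e.g. emoji) become surrogate pairs, as JSON requires
  string.gsub(/[[:^ascii:]]/) do |char|
    char.encode("UTF-16BE").unpack("n*").map { |unit| format("\\u%04x", unit) }.join
  end
end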
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2023-03-05 11:54)