HTTP headers can only transport US-ASCII characters safely


HTTP header values must only contain low-ASCII (7-bit) characters for safe transport. From RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets.
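
To check whether a value stays within that 7-bit range before putting it into a header, something like the following may help. This is only a minimal sketch; the name `isAsciiOnly` is made up for illustration and is not part of any library.

```javascript
// Hypothetical helper: returns true if every UTF-16 code unit
// of the given string is in the 7-bit US-ASCII range (0-127).
function isAsciiOnly(value) {
  return /^[\x00-\x7F]*$/.test(value)
}

console.log(isAsciiOnly('text/html; charset=utf-8')) // => true
console.log(isAsciiOnly('Grüße'))                    // => false
```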

If you need to transport non-ASCII characters (e.g. umlauts, emojis), one option is to encode the string as JSON and then escape all code points above 127 using Unicode escape sequences:

escapeHighASCII('{"foo":"xäy"}') // => '{"foo":"x\\u00e4y"}'

Escaping high ASCII in JavaScript

function escapeHighASCII(string) {
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}
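
Because `\uXXXX` is a regular JSON string escape, the receiving side should be able to recover the original characters with a plain `JSON.parse`. The round trip could be sketched like this. The helper names `encodeHeaderValue` and `decodeHeaderValue` are assumptions for illustration, and `escapeHighASCII` is repeated so the snippet runs on its own.

```javascript
// Same helper as in this card, repeated so the sketch is self-contained.
function escapeHighASCII(string) {
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}

// Hypothetical helper: serialize data to JSON, then escape everything above code 127.
function encodeHeaderValue(data) {
  return escapeHighASCII(JSON.stringify(data))
}

// Hypothetical helper: JSON.parse resolves the \uXXXX escapes again.
function decodeHeaderValue(headerValue) {
  return JSON.parse(headerValue)
}

let original = { foo: 'xäy', mood: '😀' }
let headerValue = encodeHeaderValue(original)
let decoded = decodeHeaderValue(headerValue)

console.log(headerValue)  // => {"foo":"x\u00e4y","mood":"\ud83d\ude00"}
console.log(decoded.foo)  // => xäy
console.log(decoded.mood) // => 😀
```

Note that the emoji becomes two escapes. The regular expression has no `u` flag, so it matches each UTF-16 code unit separately, and the resulting surrogate pair is what JSON expects for characters beyond U+FFFF.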

Escaping high ASCII in Ruby

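def escape_high_ascii(string)
  # Escape UTF-16 code units, so emojis become surrogate pairs that JSON can parse
  string.gsub(/[[:^ascii:]]/) do |char|
    char.encode('UTF-16BE').unpack('n*').map { |unit| format('\\u%04x', unit) }.join
  end
end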
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2023-03-05 11:54)