HTTP header values should only contain low-ASCII (7-bit) characters for safe transport. From RFC 7230:
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets.
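To check whether a value already meets this restriction, a small validator is enough. The following JavaScript sketch is not from RFC 7230. `isSafeHeaderValue` is a hypothetical helper that accepts only horizontal tab, space and the visible US-ASCII characters 0x21 to 0x7E, roughly the US-ASCII part of RFC 7230's `field-content` grammar. It rejects non-ASCII text and control characters such as CR and LF, but it does not check where whitespace appears.

```javascript
// Hypothetical helper: true if `value` consists only of HTAB, SP and
// visible US-ASCII characters (0x21-0x7E). Umlauts, emoji, CR and LF fail.
function isSafeHeaderValue(value) {
  return /^[\t\x20-\x7E]*$/.test(value)
}

console.log(isSafeHeaderValue('{"foo":"xy"}'))        // true
console.log(isSafeHeaderValue('{"foo":"xäy"}'))       // false
console.log(isSafeHeaderValue('{"foo":"x\\u00e4y"}')) // true
```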
If you need to transport characters outside of 7-bit ASCII (e.g. umlauts or emoji), one option is to encode the string as JSON and then escape all code points above 127 using Unicode escape sequences:
escapeHighASCII('{"foo":"xäy"}') // => '{"foo":"x\\u00e4y"}'
Escaping high ASCII in JavaScript
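function escapeHighASCII(string) {
  // Matches each non-ASCII UTF-16 code unit. Emoji outside the BMP are two code
  // units (a surrogate pair) and get two escapes, which is what JSON expects.
  let unicodeEscape = (char) => "\\u" + char.charCodeAt(0).toString(16).padStart(4, '0')
  return string.replace(/[^\x00-\x7F]/g, unicodeEscape)
}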
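A sketch of both ends of the exchange, assuming the header carries a JSON-encoded value. `encodeHeaderJSON` and `decodeHeaderJSON` are hypothetical names, and the escaping function is repeated so the snippet runs on its own. The receiving side needs no custom unescaping, because `JSON.parse` already understands `\uXXXX` escapes, including surrogate pairs.

```javascript
// Same technique as escapeHighASCII: escape every non-ASCII UTF-16 code unit.
function escapeHighASCII(string) {
  return string.replace(/[^\x00-\x7F]/g, (char) =>
    "\\u" + char.charCodeAt(0).toString(16).padStart(4, "0"))
}

// Hypothetical sender side: JSON-encode, then make the result pure ASCII.
function encodeHeaderJSON(value) {
  return escapeHighASCII(JSON.stringify(value))
}

// Hypothetical receiver side: JSON.parse resolves the \uXXXX escapes.
function decodeHeaderJSON(headerValue) {
  return JSON.parse(headerValue)
}

const encoded = encodeHeaderJSON({ greeting: "Grüße 😀" })
console.log(encoded) // {"greeting":"Gr\u00fc\u00dfe \ud83d\ude00"}
console.log(decodeHeaderJSON(encoded).greeting === "Grüße 😀") // true
```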
Escaping high ASCII in Ruby
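# Characters above U+FFFF (e.g. emoji) become a UTF-16 surrogate pair, as JSON requires
def escape_high_ascii(string)
  string.gsub(/[[:^ascii:]]/) do |char|
    char.encode("UTF-16BE").unpack("n*").map { |unit| "\\u%04x" % unit }.join
  end
end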
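Whichever language does the escaping, characters above U+FFFF (such as most emoji) need care: a JSON `\u` escape always consumes exactly four hex digits, so such a character has to be written as a UTF-16 surrogate pair rather than as its full code point. This JavaScript sketch (variable names are illustrative) shows what goes wrong otherwise.

```javascript
const emoji = "😀" // U+1F600, stored as two UTF-16 code units

// Naive: escape the full code point, producing the JSON text "\u1f600"
const naive = JSON.parse('"\\u' + emoji.codePointAt(0).toString(16) + '"')
console.log(naive === emoji) // false: parsed as U+1F60 followed by "0"

// Correct: escape each UTF-16 code unit, producing "\ud83d\ude00"
const units = [emoji.charCodeAt(0), emoji.charCodeAt(1)]
const paired = JSON.parse('"' + units.map((u) => "\\u" + u.toString(16).padStart(4, "0")).join("") + '"')
console.log(paired === emoji) // true
```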
Posted by Henning Koch to makandra dev (2023-03-05 11:54)