JavaScript: How to match text by Unicode properties


The linked MDN article describes a neat feature supported by all major browsers: the Unicode character class escape, written \p{...}.

You can use it to write regular expressions that work on the full Unicode range, not just Latin/ASCII. For example, a password policy matcher might include regular expressions like [A-Za-z] or [0-9], but those do not match e.g. German umlauts or Eastern Arabic numerals. Those examples can easily be replaced with /\p{Letter}/u and /\p{Number}/u. Note that \p{...} only works when the regular expression has the u flag (or the newer v flag). The \p escape supports various property names and shorthands, e.g. \p{L} for \p{Letter} or \p{Lu} for \p{Uppercase_Letter}.
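To illustrate the difference, here is a minimal sketch comparing ASCII-only classes with their Unicode counterparts. The variable names are made up for this example. The last two entries show two easy mistakes: the range [A-z] also covers the punctuation between "Z" and "a", and \p without the u flag is not treated as a property escape.

```javascript
// ASCII-only character classes, as a naive policy might use them
const asciiLetter = /[A-Za-z]/
const asciiDigit = /[0-9]/

// Unicode property escapes; they need the u flag
const unicodeLetter = /\p{Letter}/u // shorthand: \p{L}
const unicodeNumber = /\p{Number}/u // shorthand: \p{N}

const results = {
  asciiUmlaut: asciiLetter.test('ä'), // false
  unicodeUmlaut: unicodeLetter.test('ä'), // true
  asciiArabicDigit: asciiDigit.test('١'), // false
  unicodeArabicDigit: unicodeNumber.test('١'), // true
  // [A-z] also spans the punctuation between "Z" and "a"
  sloppyRange: /[A-z]/.test('_'), // true
  // Without the u flag, \p{Letter} is just the literal text "p{Letter}"
  missingFlag: /\p{Letter}/.test('ä'), // false
}

console.log(results)
```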

Example password policy checker with Unicode character class escape

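const password = 'Äö١!'

// \p{...} is only recognized with the u flag
const upper = /\p{Lu}/u.test(password) // Uppercase_Letter, matches "Ä"
const lower = /\p{Ll}/u.test(password) // Lowercase_Letter, matches "ö"
const digit = /\p{N}/u.test(password) // Number, matches "١"
// Anything that is not an uppercase letter, lowercase letter or number, matches "!"
const symbol = /[^\p{Lu}\p{Ll}\p{N}]/u.test(password)

const matchedCategories = [upper, lower, digit, symbol].filter(Boolean).length // 4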
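If you need the check in more than one place, it could be wrapped in a small function. The following is only a sketch: countCharacterCategories is a hypothetical helper name, and it deliberately deviates from the example above by excluding all letters (\p{L}) from the symbol class. That way, caseless letters such as Chinese characters are not counted as symbols. Whether that is desirable depends on your policy.

```javascript
// Hypothetical helper, not part of any library
function countCharacterCategories(password) {
  const checks = [
    /\p{Lu}/u, // uppercase letter
    /\p{Ll}/u, // lowercase letter
    /\p{N}/u, // number
    /[^\p{L}\p{N}]/u, // assumption: a "symbol" is anything that is neither letter nor number
  ]

  // Count how many of the patterns match at least once
  return checks.filter((pattern) => pattern.test(password)).length
}

console.log(countCharacterCategories('Äö١!')) // 4
console.log(countCharacterCategories('passwort')) // 1
console.log(countCharacterCategories('密码')) // 0, caseless letters match none of the classes
```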

License
Source code in this card is licensed under the MIT License.
Posted by Michael Leimstädtner to makandra dev (2025-08-28 08:27)