Of all the invisible characters, U+202F NARROW NO-BREAK SPACE might be the most successful impostor. It is not zero-width; it renders as a slightly slimmer gap than a regular space. Side by side in a paragraph, you will never tell them apart. To software, they could not be more different: 0x20 versus a three-byte UTF-8 sequence, e2 80 af.
On screen: 249 USD == 249 USD
In the bytes: 249 0x20 USD != 249 U+202F USD
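The same comparison can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are illustrative, and `bytes.hex` with a separator assumes Python 3.8 or later.

```python
plain = "249 USD"        # U+0020 SPACE between the number and the unit
narrow = "249\u202fUSD"  # U+202F NARROW NO-BREAK SPACE in the same position

# The two strings look alike when printed but are not equal.
print(plain == narrow)  # False

# Encoding to UTF-8 exposes the difference: one byte versus three.
print(plain.encode("utf-8").hex(" "))   # 32 34 39 20 55 53 44
print(narrow.encode("utf-8").hex(" "))  # 32 34 39 e2 80 af 55 53 44
```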
A legitimate character with a real job
U+202F is correct, professional typography in several languages. French orthography calls for a narrow non-breaking space before tall punctuation: Vous venez ? is properly set with one before the question mark, so the mark never wraps to its own line. German number formats, Russian typography and scientific notation (a thin gap between a number and its unit) all use it. Any text that passed through serious typesetting - academic publishing, quality newspapers - is full of them, on purpose.
This is exactly why AI models picked the character up. Their training data included mountains of professionally typeset text, and in 2025, users started finding U+202F scattered through ChatGPT output where plain spaces belonged. That fueled a watermarking theory we examined in "Does ChatGPT watermark its text?". The mundane explanation held up: models reproduce the typography of their sources. The character even made headlines again when a GPT-5 variant emitted so many of them that macOS apps hit text-rendering glitches, and developers filed it as a bug.
What it breaks
- Everything that compares strings byte-for-byte: "249 USD" with a narrow no-break space does not equal "249 USD" with a plain one. Spreadsheet lookups, database joins and dedupe passes fail invisibly.
- Search fails: Ctrl+F for 9:41 AM will not find 9:41 AM typed with U+202F before the AM.
- Code and configs: a U+202F inside a shell command or YAML file is a syntax error dressed as a space. The error message will point at a line that looks perfect.
- CSV parsing, regex \s assumptions (some engines match it, some contexts don't), fixed-width formats: all of it wobbles.
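A short Python sketch of these failure modes, under the assumption that the text arrives as an ordinary `str`. It shows Python's `re` module only; other regex engines may treat `\s` differently.

```python
import re

plain = "9:41 AM"
narrow = "9:41\u202fAM"  # U+202F before the AM

# Equality and substring search both miss.
print(plain == narrow)               # False
print(plain in f"Meet at {narrow}")  # False

# Splitting on a literal space finds nothing to split on.
print(len(narrow.split(" ")))        # 1

# \s depends on context: Unicode-aware str patterns match U+202F,
# ASCII-only and bytes patterns do not.
pattern = r"\d\sAM"
print(bool(re.search(pattern, narrow)))                       # True
print(bool(re.search(pattern, narrow, re.ASCII)))             # False
print(bool(re.search(rb"\d\sAM", narrow.encode("utf-8"))))    # False
```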
Its cousin U+00A0, the ordinary non-breaking space, causes the same breakage and is even more common: every &nbsp; on the web becomes one when you copy.
The right way to handle it
Blanket destruction is wrong: a French user's punctuation spacing is not junk, and a cleaner that flattens it corrupts correct writing. The right policy is locale-aware normalization: replace space impostors with plain spaces where they are noise, keep them where the language genuinely uses them.
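One way to picture such a policy is a small lookup of characters to flatten plus per-language exceptions. The sketch below is only an illustration: the names `SPACE_IMPOSTORS`, `KEEP_BY_LANG` and `normalize_spaces` are hypothetical, the character set is not exhaustive, and the language code is assumed to be supplied by the caller rather than detected.

```python
# Space look-alikes to flatten to U+0020 (illustrative, not exhaustive).
SPACE_IMPOSTORS = {
    "\u00a0",  # NO-BREAK SPACE
    "\u2009",  # THIN SPACE
    "\u202f",  # NARROW NO-BREAK SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
}

# Hypothetical per-language exceptions: characters left untouched.
KEEP_BY_LANG = {
    "fr": {"\u202f"},
    "de": {"\u202f"},
    "ru": {"\u202f"},
    "ja": {"\u3000"},
}

def normalize_spaces(text: str, lang: str = "en") -> str:
    """Replace space impostors with U+0020 unless the language keeps them."""
    keep = KEEP_BY_LANG.get(lang, set())
    return "".join(
        " " if ch in SPACE_IMPOSTORS and ch not in keep else ch
        for ch in text
    )

print(normalize_spaces("249\u202fUSD") == "249 USD")                       # True
print(normalize_spaces("Vous venez\u202f?", "fr") == "Vous venez\u202f?")  # True
```

A real cleaner would also need a way to decide which language a given piece of text is in, which this sketch leaves out.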
That policy is built into CopyClean: space variants are normalized to a regular space on copy, with per-language preservation rules (French, German, Russian and Mongolian keep their narrow no-break spaces, CJK keeps the ideographic space, and so on). If you just need to check one suspicious string right now, here are five free ways to see hidden characters - U+202F shows up as e2 80 af in a hex dump.
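For a quick check without any tool, a few lines of Python can list every space-like character that is not a plain space. This is a minimal sketch using the standard `unicodedata` module; the function name `find_space_impostors` is made up for the example, and it only looks at the Unicode "Zs" (space separator) category, so zero-width characters are out of scope.

```python
import unicodedata

def find_space_impostors(text: str):
    """Return (index, code point, name) for each space separator other than U+0020."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Zs" and ch != " "
    ]

print(find_space_impostors("9:41\u202fAM"))
# [(4, 'U+202F', 'NARROW NO-BREAK SPACE')]
```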
A space should be a space. When it isn't, you deserve to know, and your clipboard can handle it for you.