When companies operate at huge scale (billions of users, text in hundreds of languages and scripts), they can't afford Unicode bugs like the one Spotify had.
So they follow a simple but powerful principle:
"Normalize early, normalize once."
Whenever any user input comes into the system (typing, uploading, submitting forms), they do two things immediately:
- Normalize
    import unicodedata
    normalized_input = unicodedata.normalize('NFC', user_input)
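Whether the user typed on an old keyboard, copy-pasted text from somewhere else, or entered an accent as a separate combining character, canonically equivalent strings now end up as the same sequence of code points. Note that NFC does not fold case or compatibility characters (such as full-width letters), so identifiers like usernames usually need NFKC plus case folding on top.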
- Validate
They then validate the input against rules:
  - Length checks
  - Allowed characters
  - Forbidden characters (like control characters)
  - Emoji handling (optional)
Only after that, they let it into the database, storage, or API.
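
The two steps above can be sketched as one small entry-point function. This is a minimal illustration, not how any particular company does it: the function name `normalize_and_validate`, the `MAX_LENGTH` limit, and the set of forbidden Unicode categories are all assumptions chosen for the example. An allowlist and emoji rules are left out and would depend on the field being validated.

```python
import unicodedata

# Hypothetical limit, counted in code points after normalization.
MAX_LENGTH = 64

# Hypothetical denylist of Unicode general categories:
# Cc = control, Cs = surrogate, Co = private use, Cn = unassigned.
# Cf (format) is deliberately not listed, because emoji sequences
# rely on the zero-width joiner, which is a Cf character.
FORBIDDEN_CATEGORIES = {"Cc", "Cs", "Co", "Cn"}


def normalize_and_validate(user_input: str) -> str:
    """Normalize to NFC once, then validate; raise ValueError on bad input."""
    normalized = unicodedata.normalize("NFC", user_input)

    # Length check on the normalized form, so equivalent inputs agree.
    if not 1 <= len(normalized) <= MAX_LENGTH:
        raise ValueError("length out of range")

    # Forbidden-character check, one code point at a time.
    for ch in normalized:
        if unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
            raise ValueError(f"forbidden character U+{ord(ch):04X}")

    return normalized


# Two spellings of the same word: precomposed vs. combining accent.
composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
assert composed != decomposed
assert normalize_and_validate(composed) == normalize_and_validate(decomposed)
```

Because the function returns the normalized string, callers store and compare only that value, which is what "normalize once" amounts to in practice.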