How Big Companies Handle Unicode (like Google, Apple, etc.)

Apr 27, 2025

When companies operate at huge scale (billions of users, thousands of languages), they can't afford Unicode bugs like the one Spotify had.
So they follow a simple but powerful principle:

"Normalize early, normalize once."

Whenever any user input comes into the system (typing, uploading, submitting forms), they do two things immediately:

  1. Normalize
import unicodedata
normalized_input = unicodedata.normalize('NFC', user_input)

Whether the user typed on an old keyboard, pasted text from somewhere else, or entered accents as separate combining marks, canonically equivalent text ends up as the same sequence of code points right away.
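Here is a small sketch of what that looks like in practice, using only Python's standard unicodedata module. The sample strings are made up for illustration and are not taken from any real system.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

# Both strings look the same on screen, but the raw code points differ.
print(composed == decomposed)        # False

# After NFC normalization, both become the same composed form.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                # True
print(len(decomposed), len(nfc_b))   # 5 4
```

Without this step, two visually identical usernames could be stored as two different strings.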

  2. Validate

They then validate the input against rules:

  • Length checks
  • Allowed characters
  • Forbidden characters (like control codes)
  • Emoji handling (optional)
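The checks above could be sketched roughly like this. Everything here is an assumption made for illustration: the function name clean_input, the MAX_LENGTH limit, the allowed punctuation, and the use of the "So" category as a rough stand-in for emoji. It is not how any particular company does it.

```python
import unicodedata

MAX_LENGTH = 64  # hypothetical limit, chosen only for this example


def clean_input(user_input: str, allow_emoji: bool = False) -> str:
    """Normalize first, then validate. Raises ValueError on rejection."""
    text = unicodedata.normalize("NFC", user_input)

    # Length check, counted in code points after normalization.
    if not 1 <= len(text) <= MAX_LENGTH:
        raise ValueError("length out of range")

    for ch in text:
        category = unicodedata.category(ch)
        # Forbidden: control codes and other "C" categories
        # (format, surrogate, private use, unassigned).
        if category.startswith("C"):
            raise ValueError(f"forbidden character U+{ord(ch):04X}")
        # Allowed: letters, marks, numbers, plus a small punctuation set.
        if category[0] in "LMN" or ch in " _-.":
            continue
        # Emoji handling (optional): "So" covers many emoji, but also
        # other symbols, so this is only a rough approximation.
        if allow_emoji and category == "So":
            continue
        raise ValueError(f"character not allowed: U+{ord(ch):04X}")

    return text


print(clean_input("cafe\u0301") == "caf\u00e9")  # True
```

Note that this simple version rejects emoji sequences joined with a zero-width joiner, because that character falls in a "C" category. A real system would need a more careful emoji rule.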

Only after that, they let it into the database, storage, or API.

Meet Kavathiya