/ˈjuːnɪˌkoʊd/

noun — "a universal standard for encoding, representing, and handling text."

Unicode is a computing industry standard designed to provide a consistent and unambiguous way to encode, represent, and manipulate text from virtually all writing systems in use today. It assigns a unique code point — a numeric value — to every character, symbol, emoji, or diacritical mark, enabling computers and software to interchange text across different platforms, languages, and devices without loss of meaning or corruption.

Technically, a Unicode code point is expressed as a number in the range U+0000 to U+10FFFF, a space of exactly 1,114,112 possible code points. Each character may be stored in one of several encoding forms, such as UTF-8, UTF-16, or UTF-32, which define how code points are translated into sequences of bytes for storage or transmission. UTF-8, for example, is variable-length and backward-compatible with ASCII, which has made it the dominant encoding on the web.


# example: representing "Hello" in Unicode UTF-8
# Python illustration
text = "Hello"
utf8_bytes = text.encode('utf-8')
# utf8_bytes == b'Hello' (the bytes 0x48 0x65 0x6C 0x6C 0x6F)
# each character maps to a Unicode code point
# H = U+0048, e = U+0065, l = U+006C, o = U+006F
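
Because UTF-8 is variable-length, code points outside the ASCII range occupy more than one byte, and the three encoding forms produce byte sequences of different lengths for the same character. A minimal Python sketch (the sample characters are illustrative):

# example: one code point, three encoding forms
ch = "é"                           # U+00E9
print(ch.encode('utf-8'))          # b'\xc3\xa9' -> 2 bytes
print(ch.encode('utf-16-be'))      # b'\x00\xe9' -> 2 bytes
print(ch.encode('utf-32-be'))      # b'\x00\x00\x00\xe9' -> 4 bytes
emoji = "😀"                       # U+1F600, outside the Basic Multilingual Plane
print(emoji.encode('utf-8'))       # 4 bytes: f0 9f 98 80
print(emoji.encode('utf-16-be'))   # 4 bytes: d8 3d de 00 (a surrogate pair)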

Unicode solves a historical problem: before its adoption, multiple incompatible character encodings existed for different languages and regions. This caused text corruption when moving data between systems that used different standards. Unicode provides a single, unified framework to avoid these conflicts, enabling multilingual computing and internationalization.
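
The failure mode is easy to reproduce: interpreting bytes written in one encoding as if they were another silently substitutes characters. A short Python sketch of this "mojibake" effect:

# example: reproducing pre-Unicode text corruption
data = "café".encode('utf-8')      # b'caf\xc3\xa9'
print(data.decode('latin-1'))      # 'cafÃ©' -- mojibake
print(data.decode('utf-8'))        # 'café'  -- correct round trip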

Beyond basic letters and numbers, Unicode includes (see the sketch after this list):

  • Diacritical marks and accents for precise linguistic representation.
  • Symbols and punctuation used in mathematics, currency, and technical writing.
  • Emoji and graphic symbols widely used in modern digital communication.
  • Control characters for formatting, directionality, and specialized operations.


Conceptually, Unicode acts as a global map for text in computing, where each character has a unique, platform-independent location. Software, operating systems, and protocols reference these code points to ensure consistent rendering, searching, sorting, and data exchange. Its design supports not only contemporary languages but also historical scripts and even symbolic or artistic systems.
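
In Python, the built-ins ord() and chr() convert between a character and its code point, and unicodedata.name() retrieves the standard's character name; a small sketch of this "map lookup":

# example: looking characters up on the Unicode map
import unicodedata
print(f"U+{ord('A'):04X}")         # U+0041
print(chr(0x2603))                 # '☃' (U+2603 SNOWMAN)
print(unicodedata.name("ñ"))       # LATIN SMALL LETTER N WITH TILDE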

In practice, Unicode enables interoperability between applications, databases, web pages, and communication protocols. Without Unicode, sending text across different regions or software systems could result in unreadable or corrupted data. Its adoption underpins modern computing, from file systems such as NTFS (which stores filenames in UTF-16) to web technologies, programming languages, and mobile devices.
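
In code, this means declaring the encoding explicitly wherever characters become bytes. A minimal Python sketch (the filename greeting.txt is illustrative):

# example: explicit encoding at a system boundary
with open("greeting.txt", "w", encoding="utf-8") as f:
    f.write("こんにちは")          # Japanese 'hello'
with open("greeting.txt", encoding="utf-8") as f:
    print(f.read())                # round-trips intact across platforms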

See ASCII, UTF-8, UTF-16, Code Point, Character Encoding.