ASCII Code vs Unicode
Welcome friends!
This blog is about ASCII Code and Unicode. I will explain everything in easy words with examples, so you can understand without any confusion.
1. ASCII Code
FULL FORM: American Standard Code for Information Interchange.
Size: 7-bit (0-127)
Character set: A-Z, a-z, 0-9, special symbols
Example:
A = 65 = binary 01000001
a = 97 = binary 01100001
0 = 48 = binary 00110000
Limitation = works only for English. It cannot show Hindi, Chinese, emoji, etc.
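If you want to check these numbers yourself, here is a tiny Python sketch (Python is just my choice for the demo, any language works):

# Check ASCII values with Python's built-in ord() and format()
for ch in ["A", "a", "0"]:
    code = ord(ch)                        # character -> number
    print(ch, "=", code, "= binary", format(code, "08b"))

# Output:
# A = 65 = binary 01000001
# a = 97 = binary 01100001
# 0 = 48 = binary 00110000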
2. Extended ASCII
8-bit = 256 characters (0-255)
Added some extra symbols like ñ, ç
Still not enough for all world languages.
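A quick Python sketch of this limit (Latin-1 is one common "extended ASCII" table, used here only as an illustration):

# Latin-1 style "extended ASCII": one byte per character, only 256 slots
print("ñ".encode("latin-1"))      # b'\xf1' -> one byte, value 241
print("ç".encode("latin-1"))      # b'\xe7' -> one byte, value 231
# "अ".encode("latin-1")           # -> UnicodeEncodeError, no slot for Hindi or emoji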
3. Unicode
FULL FORM: Universal Character Encoding Standard
A universal standard for all characters of all languages + emojis
Range: U+0000 to U+10FFFF (~1.1 million possible characters)
Every character has a unique code point.
Example:
A = U+0041 (Decimal 65)
अ = U+0905 (Decimal 2309)
一 = U+4E00 (Chinese "one")
🙂 = U+1F642 (Emoji)
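You can see these code points yourself with a small Python sketch (again, Python is only the demo language):

# ord() gives the Unicode code point of any character
for ch in ["A", "अ", "一", "🙂"]:
    cp = ord(ch)
    print(ch, "= U+%04X" % cp, "(Decimal %d)" % cp)

# Output:
# A = U+0041 (Decimal 65)
# अ = U+0905 (Decimal 2309)
# 一 = U+4E00 (Decimal 19968)
# 🙂 = U+1F642 (Decimal 128578)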
4. Unicode Encodings (How code points are stored in memory)
UTF-8 (Most common, internet standard)
Variable length: 1-4 bytes
Backward compatible with ASCII
Small for English, bigger for complex scripts
Example:
A = 41 = 01000001 (1 byte)
अ = E0 A4 85 = 11100000 10100100 10000101 (3 bytes)
🙂 = F0 9F 99 82 = 11110000 10011111 10011001 10000010 (4 bytes)
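A small Python sketch to see these UTF-8 bytes (just a demo; bytes.hex with a separator needs Python 3.8 or newer):

# UTF-8 is variable length: 1 byte for ASCII, up to 4 bytes for emoji
for ch in ["A", "अ", "🙂"]:
    data = ch.encode("utf-8")
    print(ch, "=", data.hex(" ").upper(), "->", len(data), "byte(s)")

# Output:
# A = 41 -> 1 byte(s)
# अ = E0 A4 85 -> 3 byte(s)
# 🙂 = F0 9F 99 82 -> 4 byte(s)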
UTF-16 (Windows and Java's favorite)
Variable length: 2 or 4 bytes
Many characters use 2 bytes, emojis need 4 bytes
Example:
A = 0041 = 00000000 01000001 (2 bytes)
अ = 0905 = 00001001 00000101 (2 bytes)
🙂 = D83D DE42 = 4 bytes (surrogate pair)
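Same idea for UTF-16 (a Python sketch; I use the BOM-less big-endian variant so the bytes are easy to read):

# UTF-16: one 2-byte unit for most characters, a 4-byte surrogate pair for emoji
for ch in ["A", "अ", "🙂"]:
    data = ch.encode("utf-16-be")
    print(ch, "=", data.hex(" ").upper(), "->", len(data), "bytes")

# Output:
# A = 00 41 -> 2 bytes
# अ = 09 05 -> 2 bytes
# 🙂 = D8 3D DE 42 -> 4 bytes
# The last line is the surrogate pair D83D + DE42 from the example above.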
UTF-32 (Simple but heavy)
Fixed length: 4 bytes for every character
Easy to calculate positions, but wastes memory
Example:
A = 00000041 = 4 bytes
अ = 00000905 = 4 bytes
🙂 = 0001F642 = 4 bytes
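And the same Python sketch for UTF-32:

# UTF-32: always exactly 4 bytes per character
for ch in ["A", "अ", "🙂"]:
    data = ch.encode("utf-32-be")        # big-endian, no BOM
    print(ch, "=", data.hex(" ").upper(), "->", len(data), "bytes")

# Output:
# A = 00 00 00 41 -> 4 bytes
# अ = 00 00 09 05 -> 4 bytes
# 🙂 = 00 01 F6 42 -> 4 bytes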
How a Computer Understands Characters (Flow)
1. You type on the keyboard = 🙂
2. OS finds Unicode code point = U+1F642
3. Encoding converts it into bytes = UTF-8: F0 9F 99 82
4. Binary stored in memory = 11110000 10011111 10011001 10000010
5. Font file (TTF/OTF) says: "This is the shape of 🙂"
(I will explain to you in simple language what TTF and OTF are.)
6. Display system (GPU) draws pixels = Emoji appears on screen
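Steps 2 to 4 of this flow are easy to replay in Python (the font and GPU steps happen outside our code, so this sketch stops at the bytes):

# From character to code point to UTF-8 bytes to binary
ch = "🙂"
code_point = ord(ch)                         # step 2: find the code point
utf8_bytes = ch.encode("utf-8")              # step 3: encode to bytes
binary = " ".join(format(b, "08b") for b in utf8_bytes)   # step 4: raw bits

print("U+%04X" % code_point)                 # U+1F642
print(utf8_bytes.hex(" ").upper())           # F0 9F 99 82
print(binary)                                # 11110000 10011111 10011001 10000010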
Simple understanding of ASCII Code and Unicode
ASCII = small house (only for the English language)
Unicode = a big shopping mall (it supports multiple languages + emojis + characters)
UTF-8 = INTERNET KING, because it supports all languages, emojis, characters, and symbols worldwide
UTF-16 = Windows and Java's favorite
Why Windows loves UTF-16
In the 1990s, people thought: “We only need 65,536 characters (16 bits). That’s enough for all languages!”
So Windows NT (1993) made UTF-16 its native encoding.
Later, Unicode grew bigger (emoji, rare scripts). Now UTF-16 sometimes needs 2 code units (4 bytes) for one character.
But by then, all Windows APIs were built on UTF-16 = they can’t change without breaking old software.
Why Java loves UTF-16
Java started in 1995. Same thinking: 16 bits is enough.
They made the char type = 16-bit (UTF-16 code unit).
Unicode grew, so now sometimes one char = half of a real character (needs a surrogate pair).
But the whole Java ecosystem already depends on UTF-16 = too late to switch.
Why not UTF-8 back then?
In the 90s, UTF-8 wasn’t popular.
People thought UTF-16 was more efficient for Asian scripts (Hindi, Chinese, Japanese), because each character fit directly in 2 bytes.
So Windows + Java went with UTF-16.
Today’s reality
Most of the world (Linux, web, Python, Rust, Go) = UTF-8
Windows + Java = still UTF-16 (because of old code and backward compatibility).
Basically: UTF-16 is their old love, and they can't leave it now.
So:
UTF-32 = direct but wasteful
UTF-16 = Windows & Java’s old choice
UTF-8 = modern king of the world
UTF-32 = Direct but wasteful
Advantages of UTF-32
1. Super simple: one code point = one 32-bit number. No tricky rules.
2. Easy for programmers: random access is fast (indexing characters).
Example: string[5] = just go to (5 × 4 bytes).
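Here is a small Python sketch of that "just multiply by 4" idea (only an illustration of the principle, not how Python strings work internally):

# In UTF-32, character number i always starts at byte offset i * 4
text = "Hello🙂"
data = text.encode("utf-32-be")              # fixed 4 bytes per character

i = 5                                        # we want the 6th character
chunk = data[i * 4:(i + 1) * 4]              # jump straight to its 4 bytes
print(chunk.hex().upper())                   # 0001F642
print(chunk.decode("utf-32-be"))             # 🙂
# In UTF-8 you cannot jump like this: you must walk from the start,
# because each character can be 1, 2, 3 or 4 bytes long.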
Disadvantages of UTF-32
1. Huge memory waste
English text: "Hello" in UTF-32 = 20 bytes
In UTF-8 = only 5 bytes
2. Most real-world text is English/ASCII-heavy, so in UTF-32 three out of every four bytes are just zeros (4× the size of UTF-8).
3. More storage = more RAM + slower for network transfer.
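You can check the "Hello" numbers with one line of Python each (BOM-less encoding so nothing extra is added):

# Same word, very different sizes
word = "Hello"
print(len(word.encode("utf-8")))       # 5 bytes
print(len(word.encode("utf-32-be")))   # 20 bytes (4x bigger for plain English)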
Where is UTF-32 used?
Rare in normal files or web.
Used internally in some programming languages or libraries where simplicity matters more than memory.
Example: some C libraries that store text as 32-bit wide characters (wchar_t) for easy indexing.
Comparison Example:
UTF-32 is always 4 bytes per character, no matter what.
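Here is a quick byte-count comparison in Python for the three characters we have been using (BOM-less variants, just for the demo):

# Bytes needed for the same character in each encoding
for ch in ["A", "अ", "🙂"]:
    print(ch,
          "UTF-8:", len(ch.encode("utf-8")),
          "UTF-16:", len(ch.encode("utf-16-be")),
          "UTF-32:", len(ch.encode("utf-32-be")))

# Output:
# A UTF-8: 1 UTF-16: 2 UTF-32: 4
# अ UTF-8: 3 UTF-16: 2 UTF-32: 4
# 🙂 UTF-8: 4 UTF-16: 4 UTF-32: 4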
Summary:
UTF-8 = compact, flexible = best for storage/web.
UTF-16 = middle ground = Windows & Java legacy.
UTF-32 = super simple but memory-hungry = only special cases.
Thank you for reading!
I hope now you understand ASCII Code and Unicode better. Keep visiting for more simple tech blogs and keep learning!
— Writer Kishan