Data representation: ASCII, Unicode

1.1 Compare and contrast notational systems

📘CompTIA ITF+ (FC0-U61)


1. Introduction to Data Representation

Computers do not understand letters, words, or symbols the way humans do.
They only understand binary (0s and 1s).

So when you type text on a keyboard, your computer must convert every character (letters, digits, punctuation, symbols) into numbers.
These numbers are then stored and processed as binary.

To make this work globally, computers use character encoding standards.
For the ITF+ exam, you must know the two most important encoding systems:

  • ASCII
  • Unicode
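The character-to-number conversion described above can be seen directly in Python, whose built-in `ord()` returns the number assigned to a character:

```python
# Demo: every character maps to a number, which is stored as binary.
for ch in "Hi!":
    code = ord(ch)                        # character -> number
    print(ch, code, format(code, "08b")) # number -> binary (8 bits shown)
```

Running this shows, for example, that "H" is stored as the number 72, which is 01001000 in binary.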

2. ASCII (American Standard Code for Information Interchange)

What is ASCII?

ASCII is one of the earliest and simplest character encoding systems.
It assigns a unique number to each character used in English text.

Key points for the exam

  • ASCII uses 7 bits to represent a character.
  • It can represent 128 characters (0–127).
  • It includes:
    • Uppercase letters (A–Z)
    • Lowercase letters (a–z)
    • Digits (0–9)
    • Basic punctuation (!, ?, :, etc.)
    • Control characters (like newline, tab)
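The character ranges above can be verified in Python, since `ord()` returns the ASCII code of a character:

```python
# ASCII assigns numbers 0-127, so every code fits in 7 bits.
assert ord("A") == 65 and ord("Z") == 90    # uppercase letters
assert ord("a") == 97 and ord("z") == 122   # lowercase letters
assert ord("0") == 48 and ord("9") == 57    # digits
assert ord("\t") == 9 and ord("\n") == 10   # control characters (tab, newline)
assert max(ord(c) for c in "Hello!") < 128  # all within the 128-character range
print("All ASCII checks passed")
```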

Common IT usage

ASCII is used where only basic English text is needed. Examples include:

  • System logs that store simple English messages
  • Older devices and network equipment that support only basic characters
  • Programming languages that rely on ASCII codes for character handling (for example, converting between letters and their numeric values)

Limitations

  • Cannot represent non-English characters (e.g., accented letters, Asian languages)
  • Cannot represent modern emoji or many symbols used today

Because the world uses many languages, ASCII became too limited.
This led to the creation of Unicode.
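The limitation is easy to demonstrate: Python raises an error when asked to encode a non-English character as ASCII, while UTF-8 handles it without trouble.

```python
text = "café"  # contains an accented letter outside ASCII's 128 characters
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode this text:", err)

# Unicode (here via UTF-8) encodes it without any problem:
print(text.encode("utf-8"))  # b'caf\xc3\xa9'
```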


3. Unicode

What is Unicode?

Unicode is a universal character encoding system designed to represent every character used worldwide.

It solves the limitations of ASCII.

Key points for the exam

  • Unicode defines over 140,000 characters, and the standard continues to grow.
  • It can represent:
    • All major world languages (English, Arabic, Chinese, Bengali, Hindi, etc.)
    • Mathematical symbols
    • Technical symbols used in IT
    • Emoji
    • Special characters needed for global software and databases

Unicode Encodings

Unicode can be stored in different encoding formats, mainly:

  • UTF-8
  • UTF-16
  • UTF-32

The exam primarily expects you to know:

UTF-8

  • The most widely used Unicode encoding
  • Backward-compatible with ASCII
  • Uses 1 byte (8 bits) for common ASCII-range characters and 2–4 bytes for others
  • Efficient for web pages, applications, and databases
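Both the backward compatibility and the variable width of UTF-8 can be checked in Python:

```python
# ASCII text encodes to the exact same bytes in UTF-8 (backward compatibility).
assert "Hello".encode("utf-8") == "Hello".encode("ascii")

# Other characters take 2-4 bytes each (variable-width encoding).
print(len("A".encode("utf-8")))   # 1 byte  (ASCII range)
print(len("é".encode("utf-8")))   # 2 bytes
print(len("中".encode("utf-8")))  # 3 bytes
print(len("😀".encode("utf-8")))  # 4 bytes
```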

UTF-16 / UTF-32

  • Used in systems that need to support many international characters
  • Use larger code units (16-bit or 32-bit), so they generally need more storage than UTF-8 for English text

4. How ASCII and Unicode Work in IT Environments

Software and Applications

Modern applications, such as web browsers, operating systems, databases, and messaging apps, use Unicode to ensure all text displays correctly regardless of language.

ASCII may still appear in:

  • Legacy systems
  • Scripts or logs that only use English text

Databases

  • ASCII fields store simple English text.
  • Unicode fields (UTF-8 or UTF-16) store multilingual text, usernames, comments, and emoji.

Networking

  • Device configurations and command-line interfaces often use ASCII to avoid complications with non-English characters.
  • Modern network management systems use Unicode to support global operations.

Websites

  • Almost all websites use UTF-8 as the character encoding.
  • It ensures that text from any language displays correctly.

5. Why ASCII and Unicode Matter for the ITF+ Exam

The exam expects you to:

✔ Understand what ASCII is

✔ Know that ASCII uses 7 bits (128 characters)

✔ Know that ASCII is limited to basic English characters

✔ Understand what Unicode is

✔ Know that Unicode supports many more characters

✔ Know that Unicode allows global language support

✔ Know that UTF-8 is the most common Unicode encoding

ASCII = small, older, limited
Unicode = modern, universal, flexible


6. Comparison Table (Exam-Friendly)

Feature | ASCII | Unicode
Purpose | Represent basic English characters | Represent all world languages + symbols
Bits used | 7 bits | Variable (8, 16, or 32 bits depending on UTF type)
Characters supported | 128 | Over 140,000
Supports emoji? | ❌ No | ✔ Yes
Supports international languages? | ❌ No | ✔ Yes
Common usage | Logs, legacy systems | Websites, apps, databases, OS
Compatibility | Base for UTF-8 | Includes ASCII as part of UTF-8

7. Summary for Students

  • Computers store characters as numbers.
  • ASCII came first but is limited to 128 basic English characters.
  • Unicode is the modern standard that supports nearly every language and symbol.
  • UTF-8 is the most used Unicode encoding, especially on the internet.
  • Unicode ensures that applications can handle global text without errors.