What is Character Encoding?

“There are 10 types of people in this world, those who understand binary and those who don’t.”

It's a nerd joke that should set the right tone for what's coming next: character encoding.

To understand what character encoding is, let's dive a little into history. Once upon a time, mankind realized the need to transmit messages in something other than written text (perhaps electrically, over a wire). Long story short, they invented Morse code.

Morse code is basically a sequence of "dots" and "dashes" produced by short and long signals. A combination of dots and dashes represents a character, and such characters can be put together to make words, messages and so on. The following figure depicts the International Morse Code standard for letters and numerals.

[Figure: International Morse Code chart of letters and numerals (ref: Wikipedia)]

Well, since we are talking about Morse code, let's learn one important message in Morse. Hopefully you will never need to use it, but just in case. It's "SOS", the most common distress signal, represented in Morse by three dots, three dashes and three dots. It can be sent by any improvised method: tapping, clicking, or short and long flashes from a flashlight or mobile phone.
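If seeing it as code helps, here's a minimal Python sketch of the same idea: a tiny lookup table (just a few letters, nowhere near the full International Morse alphabet) and a function that turns text into dots and dashes. The table and the function name are made up purely for illustration.

# A tiny, illustrative Morse lookup table (not the full alphabet).
MORSE = {
    "S": "...",
    "O": "---",
    "E": ".",
    "T": "-",
}

def to_morse(text):
    # Encode text one letter at a time, separating letters with spaces.
    return " ".join(MORSE[ch] for ch in text.upper())

print(to_morse("SOS"))  # prints: ... --- ...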

Well, that should pretty much sum up character encoding.

Okay?

Still not getting it? Alright, let's take the "dots" and "dashes" from Morse a little further, into the digital era: the era of computers, where "dots" and "dashes" become "0" and "1".

It's practically impossible to imagine a computer without a text editor or word processor application. These applications help us write things, store them, display them on monitors or print them on paper. Not only do they let us use different fonts, they also let us use different languages and the different characters of those languages.

Though computers are powerful, can they really store all this text as it is: all the different characters, symbols, etc. of every language in the world? Or maybe the right question is: would that even be efficient?

Well, the answer lies in the fact that a computer doesn't understand text. To a computer, it's all numerical data. So in a modern-day computer, individual characters (letters, numerals, symbols) are all stored and understood in the form of binary numeric codes.

So when we type the letter "A" on a keyboard, it triggers a binary code that the processor understands, and that code is later used to print the corresponding character on screen or on paper. It's like translating back and forth between the machine-understandable code and the human-readable letter "A". And the key to performing this translation is character encoding (and decoding).

Character encoding is basically a standard that maps machine-understandable binary codes to human-understandable characters.
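In Python terms (purely as an illustration of the idea, not as any official definition), the built-in ord() and chr() functions expose exactly this two-way mapping:

# Character -> number: the "encoding" direction.
print(ord("A"))       # 65 in decimal
print(hex(ord("A")))  # 0x41
print(bin(ord("A")))  # 0b1000001

# Number -> character: the "decoding" direction.
print(chr(65))        # A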

ASCII, EBCDIC and Unicode are some of the common character encodings. They are all binary underneath, each trying to cover a wider range of characters and serve as many letters, numerals, symbols and languages as possible.
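To see that these really are different mappings, here's a quick Python sketch ahead of the worked example below. It uses Python's built-in codecs; cp500 is one common EBCDIC code page, and I'm assuming it is available in your Python installation.

text = "A"

# The same letter "A" turns into different bytes under different encodings.
print(text.encode("ascii"))   # b'A'              -> 0x41 (65)
print(text.encode("cp500"))   # b'\xc1'           -> 0xC1 (193), an EBCDIC code page
print(text.encode("utf-8"))   # b'A'              -> UTF-8 keeps ASCII bytes as-is
print(text.encode("utf-16"))  # b'\xff\xfeA\x00'  -> byte-order mark + 2 bytes per character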

For Example:

For letter "A" 
ascii - 41
binary - 0100 0001.
For www.unpluggedmind.in 
ascii - 119 119 119 046 117 110 112 108 117 103 103 101 100 109 105 110 100 046 105 110
binary - 01110111 01110111 01110111 00101110 01110101 01101110 01110000 01101100 01110101 01100111 01100111 01100101 01100100 01101101 01101001 01101110 01100100 00101110 01101001 01101110
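If you want to reproduce numbers like these yourself, a short Python loop (again, just a sketch) prints each character's decimal code and its 8-bit binary form:

text = "www.unpluggedmind.in"

for ch in text:
    code = ord(ch)                        # decimal code, e.g. 119 for "w"
    print(ch, code, format(code, "08b"))  # 8-bit binary, e.g. 01110111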

 

Next Read: EBCDIC vs ASCII