Character Representation

Most computers represent characters using one of two binary codes: ASCII or Unicode.

ASCII Code

ASCII is one of the most widely used binary codes for character representation in computers.

It is a 7-bit code that is normally stored in an 8-bit byte: 7 bits represent the character, and the 8th bit (the most significant bit) can be used for error checking.

The ASCII character set is shown below. The row number and the column number are concatenated, row first, to produce the code in hexadecimal. E.g. the ASCII code for the letter A is 41h, for B is 42h, for the digit 0 is 30h, for 1 is 31h, etc.



The Character Set of the ASCII Code
    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL
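
The row/column rule can be checked with a short Python sketch. The helper name `table_position` is made up for this illustration; it splits a character's code into its upper bits (the row) and its lower 4 bits (the column).

```python
def table_position(ch):
    """Return (row, column) of an ASCII character in the table, as hex digits."""
    code = ord(ch)              # numeric code of the character
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    row = code >> 4             # upper 3 bits select the row (0-7)
    col = code & 0x0F           # lower 4 bits select the column (0-F)
    return format(row, "X"), format(col, "X")

for ch in "AB01":
    row, col = table_position(ch)
    # Concatenating row and column gives the hexadecimal code, e.g. 41h for A.
    print(ch, "-> row", row, "column", col, "code", row + col + "h")
```

Running the loop reproduces the codes quoted above for A, B, 0 and 1.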


The following is a more detailed description of the first 32 ASCII characters, often referred to as control codes.



 Detailed description of the ASCII control codes


Parity Bit

Data errors can occur during data transmission or storage/retrieval, so we need a way to detect them. As mentioned before, the 8th bit in the ASCII code is used for error checking. This bit is usually referred to as the parity bit. There are two parity schemes:

  1. Even Parity: The 8th bit is set such that the total number of 1s in the 8-bit code word is even. E.g. the letter A (ASCII code 41h) already has two 1s in its 7 data bits, so the parity bit is 0 and its representation using even parity is (the parity bit is marked with a P above it):

    P
    01000001



  2. Odd Parity: The 8th bit is set such that the total number of 1s in the 8-bit code word is odd. The letter A has two 1s in its 7 data bits, so the parity bit must be 1 and its representation using odd parity is:

    P
    11000001
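
The two schemes can be sketched in Python as follows. The function names `add_parity` and `parity_ok` are hypothetical helpers for this illustration, and the parity bit is assumed to sit in the most significant bit, as in the examples above.

```python
def add_parity(code7, even=True):
    """Return the 8-bit code word for a 7-bit ASCII code, parity bit in the MSB."""
    data = code7 & 0x7F
    ones = bin(data).count("1")      # number of 1s in the 7 data bits
    parity = ones % 2                # 1 if that count is odd (even-parity bit)
    if not even:
        parity ^= 1                  # odd parity uses the opposite bit
    return (parity << 7) | data

def parity_ok(word8, even=True):
    """Check whether an 8-bit code word has the expected parity."""
    ones = bin(word8 & 0xFF).count("1")
    return (ones % 2 == 0) if even else (ones % 2 == 1)

a = ord("A")                                       # 41h
print(format(add_parity(a, even=True), "08b"))     # 01000001
print(format(add_parity(a, even=False), "08b"))    # 11000001

corrupted = add_parity(a) ^ 0b00000100             # flip one bit "in transit"
print(parity_ok(corrupted))                        # False
```

A single parity bit detects any odd number of flipped bits; if two bits flip, the count of 1s keeps its parity and the error goes unnoticed.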



Unicode

ASCII can represent at most 256 characters, and only if the MSB is used for data rather than for error checking; with 7 data bits it is limited to 128 characters.

This is not enough to uniquely represent the characters of all languages, which is why Unicode was devised.

Unicode was originally designed as a 16-bit code that provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Later versions extended the code space beyond 16 bits, up to 10FFFFh.

Standard ASCII characters keep their codes in Unicode, with 00h prepended on the left. E.g. A is represented as 0041h.
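
As a quick check of this relationship, the following Python sketch prints code points in the same hexadecimal notation. The helper `code_point_hex` is a made-up name for this illustration; `ord` is the built-in that returns a character's Unicode code point.

```python
def code_point_hex(ch):
    """Format a character's Unicode code point as at least 4 hex digits plus 'h'."""
    return format(ord(ch), "04X") + "h"

for ch in "A0a":
    # For ASCII characters the Unicode value equals the ASCII code.
    assert ord(ch) == ch.encode("ascii")[0]
    print(ch, code_point_hex(ch))

# A non-ASCII example: the Arabic letter alef.
print(code_point_hex("\u0627"))
```

The ASCII characters come out with two leading zeros (0041h, 0030h, 0061h), while the Arabic letter needs the upper byte (0627h).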

The collapsible note below contains, as an example of Unicode, the Arabic character set.


 Unicode values for the Arabic character set