Most computers represent characters using one of two types of binary codes; ASCII code or Unicode.
ASCII Code
| The ASCII code is the most popular binary code for character representation in computers,
| | It is an 8-bit code, where 7-bits are used to represent the character and the 8th bit is used for error checking,
| | The ASCII character set is shown below. The row and column numbers are appended to produce the code in hexadecimal. E.g. the ASCII code for the letter A is 41, for B is 42, for the digit 0 is 30, for 1 is 31 …etc.
|
The Charcter set of the ASCII Code |
0 1 2 3 4 5 6 7 8 9 A B C D E F
0 NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2 SP ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ \ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ~ DEL
|
The following is a more detailed description of the first 32 ASCII characters, often referred to as control codes.
 Detailed description of the ASCII control codes |
NUL (null)
SOH (start of heading)
STX (start of text)
ETX (end of text)
EOT (end of transmission) - Not the same as ETB
ENQ (enquiry)
ACK (acknowledge)
BEL (bell) - Caused teletype machines to ring a bell. Causes a beep
in many common terminals and terminal emulation programs.
BS (backspace) - Moves the cursor (or print head) move backwards (left)
one space.
TAB (horizontal tab) - Moves the cursor (or print head) right to the next
tab stop. The spacing of tab stops is dependent
on the output device, but is often either 8 or 10.
LF (NL line feed, new line) - Moves the cursor (or print head) to a new
line. On Unix systems, moves to a new line
AND all the way to the left.
VT (vertical tab)
FF (form feed) - Advances paper to the top of the next page (if the
output device is a printer).
CR (carriage return) - Moves the cursor all the way to the left, but does
not advance to the next line.
SO (shift out) - Switches output device to alternate character set.
SI (shift in) - Switches output device back to default character set.
DLE (data link escape)
DC1 (device control 1)
DC2 (device control 2)
DC3 (device control 3)
DC4 (device control 4)
NAK (negative acknowledge)
SYN (synchronous idle)
ETB (end of transmission block) - Not the same as EOT
CAN (cancel)
EM (end of medium)
SUB (substitute)
ESC (escape)
FS (file separator)
GS (group separator)
RS (record separator)
US (unit separator)
|
Parity Bit
Data errors can occur during data transmission or storage/retrieval, hence we need a way to detect the occurrence of such errors. As mentioned before, the 8th bit in the ASCII code is used for error checking. This bit is usually referred to as the parity bit
A an extra bit that is added to the data to make the total number of 1s either even (even parity) or odd (odd parity). There are two ways for error checking:
- Even Parity: Where the 8th bit is set such that the total number of 1s in the 8-bit code word is even. E.g. the representation of the letter A (ASCII Code 41h) using even parity would be (the parity bit is marked with a P on top of it):
- Odd Parity: The 8th bit is set such that the total number of 1s in the 8-bit code word is odd. Now the representation of the letter A using odd parity would be:
Unicode
| ASCII code can represent up to 256 characters (if the MSB is not used for error checking),
| | This is not enough to uniquely represent all characters of all languages. That is why the Unicode was devised,
| | The Unicode is a 16-bit code that provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language,
| | Standard ASCII code (without the control characters) are represented in Unicode as they are with a (00)16 appended to the left. E.g. A is represented as 0041h,
| | The collapsible note below contains, as an example of Unicode, the Arabic character set.
|
 Unicode values for the Arabic character set |
Each line below contains four fields separated by a semicolon,
- The first field gives the code 4-digit hexadecimal form, of an Arabic character,
- The second field gives a short schematic name for that character, abbreviated from the normative Unicode character name,
- The third field defines the joining type: R right-joining, D dual-joining, U non-joining,
- The fourth field defines the joining group.
The Arabic characters codes:
0622; MADDA ON ALEF; R; ALEF
0623; HAMZA ON ALEF; R; ALEF
0624; HAMZA ON WAW; R; WAW
0625; HAMZA UNDER ALEF; R; ALEF
0626; HAMZA ON YEH; D; YEH
0627; ALEF; R; ALEF
0628; BEH; D; BEH
0629; TEH MARBUTA; R; TEH MARBUTA
062A; TEH; D; BEH
062B; THEH; D; BEH
062C; JEEM; D; HAH
062D; HAH; D; HAH
062E; KHAH; D; HAH
062F; DAL; R; DAL
0630; THAL; R; DAL
0631; REH; R; REH
0632; ZAIN; R; REH
0633; SEEN; D; SEEN
0634; SHEEN; D; SEEN
0635; SAD; D; SAD
0636; DAD; D; SAD
0637; TAH; D; TAH
0638; ZAH; D; TAH
0639; AIN; D; AIN
063A; GHAIN; D; AIN
0640; TATWEEL; C;
0641; FEH; D; FEH
0642; QAF; D; QAF
0643; KAF; D; KAF
0644; LAM; D; LAM
0645; MEEM; D; MEEM
0646; NOON; D; NOON
0647; HEH; D; HEH
0648; WAW; R; WAW
0649; ALEF MAKSURA; D; YEH
064A; YEH; D; YEH
0671; HAMZAT WASL ON ALEF; R; ALEF
0672; WAVY HAMZA ON ALEF; R; ALEF
0673; WAVY HAMZA UNDER ALEF; R; ALEF
0674; HIGH HAMZA; U;
0675; HIGH HAMZA ALEF; R; ALEF
0676; HIGH HAMZA WAW; R; WAW
0677; HIGH HAMZA WAW WITH DAMMA; R; WAW
0678; HIGH HAMZA YEH; D; YEH
0679; TEH WITH SMALL TAH; D; BEH
067A; TEH WITH 2 DOTS VERTICAL ABOVE; D; BEH
067B; BEH WITH 2 DOTS VERTICAL BELOW; D; BEH
067C; TEH WITH RING; D; BEH
067D; TEH WITH 3 DOTS ABOVE DOWNWARD; D; BEH
067E; TEH WITH 3 DOTS BELOW; D; BEH
067F; TEH WITH 4 DOTS ABOVE; D; BEH
0680; BEH WITH 4 DOTS BELOW; D; BEH
0681; HAMZA ON HAH; D; HAH
0682; HAH WITH 2 DOTS VERTICAL ABOVE; D; HAH
0683; HAH WITH MIDDLE 2 DOTS; D; HAH
0684; HAH WITH MIDDLE 2 DOTS VERTICAL; D; HAH
0685; HAH WITH 3 DOTS ABOVE; D; HAH
0686; HAH WITH MIDDLE 3 DOTS DOWNWARD; D; HAH
0687; HAH WITH MIDDLE 4 DOTS; D; HAH
0688; DAL WITH SMALL TAH; R; DAL
0689; DAL WITH RING; R; DAL
068A; DAL WITH DOT BELOW; R; DAL
068B; DAL WITH DOT BELOW AND SMALL TAH; R; DAL
068C; DAL WITH 2 DOTS ABOVE; R; DAL
068D; DAL WITH 2 DOTS BELOW; R; DAL
068E; DAL WITH 3 DOTS ABOVE; R; DAL
068F; DAL WITH 3 DOTS ABOVE DOWNWARD; R; DAL
0690; DAL WITH 4 DOTS ABOVE; R; DAL
0691; REH WITH SMALL TAH; R; REH
0692; REH WITH SMALL V; R; REH
0693; REH WITH RING; R; REH
0694; REH WITH DOT BELOW; R; REH
0695; REH WITH SMALL V BELOW; R; REH
0696; REH WITH DOT BELOW AND DOT ABOVE; R; REH
0697; REH WITH 2 DOTS ABOVE; R; REH
0698; REH WITH 3 DOTS ABOVE; R; REH
0699; REH WITH 4 DOTS ABOVE; R; REH
069A; SEEN WITH DOT BELOW AND DOT ABOVE; D; SEEN
069B; SEEN WITH 3 DOTS BELOW; D; SEEN
069C; SEEN WITH 3 DOTS BELOW AND 3 DOTS ABOVE; D; SEEN
069D; SAD WITH 2 DOTS BELOW; D; SAD
069E; SAD WITH 3 DOTS ABOVE; D; SAD
069F; TAH WITH 3 DOTS ABOVE; D; TAH
06A0; AIN WITH 3 DOTS ABOVE; D; AIN
06A1; DOTLESS FEH; D; FEH
06A2; FEH WITH DOT MOVED BELOW; D; FEH
06A3; FEH WITH DOT BELOW; D; FEH
06A4; FEH WITH 3 DOTS ABOVE; D; FEH
06A5; FEH WITH 3 DOTS BELOW; D; FEH
06A6; FEH WITH 4 DOTS ABOVE; D; FEH
06A7; QAF WITH DOT ABOVE; D; QAF
06A8; QAF WITH 3 DOTS ABOVE; D; QAF
06A9; OPEN KAF; D; GAF
06AA; SWASH KAF; D; SWASH KAF
06AB; KAF WITH RING; D; GAF
06AC; KAF WITH DOT ABOVE; D; KAF
06AD; KAF WITH 3 DOTS ABOVE; D; KAF
06AE; KAF WITH 3 DOTS BELOW; D; KAF
06AF; GAF; D; GAF
06B0; GAF WITH RING; D; GAF
06B1; GAF WITH 2 DOTS ABOVE; D; GAF
06B2; GAF WITH 2 DOTS BELOW; D; GAF
06B3; GAF WITH 2 DOTS VERTICAL BELOW; D; GAF
06B4; GAF WITH 3 DOTS ABOVE; D; GAF
06B5; LAM WITH SMALL V; D; LAM
06B6; LAM WITH DOT ABOVE; D; LAM
06B7; LAM WITH 3 DOTS ABOVE; D; LAM
06B8; LAM WITH 3 DOTS BELOW; D; LAM
06B9; NOON WITH DOT BELOW; D; NOON
06BA; DOTLESS NOON; D; NOON
06BB; DOTLESS NOON WITH SMALL TAH; D; NOON
06BC; NOON WITH RING; D; NOON
06BD; NOON WITH 3 DOTS ABOVE; D; NOON
06BE; KNOTTED HEH; D; KNOTTED HEH
06BF; HAH WITH MIDDLE 3 DOTS DOWNWARD AND DOT ABOVE; D; HAH
06C0; HAMZA ON HEH; R; TEH MARBUTA
06C1; HEH GOAL; D; HEH GOAL
06C2; HAMZA ON HEH GOAL; R; HAMZA ON HEH GOAL
06C3; TEH MARBUTA GOAL; R; HAMZA ON HEH GOAL
06C4; WAW WITH RING; R; WAW
06C5; WAW WITH BAR; R; WAW
06C6; WAW WITH SMALL V; R; WAW
06C7; WAW WITH DAMMA; R; WAW
06C8; WAW WITH ALEF ABOVE; R; WAW
06C9; WAW WITH INVERTED SMALL V; R; WAW
06CA; WAW WITH 2 DOTS ABOVE; R; WAW
06CB; WAW WITH 3 DOTS ABOVE; R; WAW
06CC; DOTLESS YEH; D; YEH
06CD; YEH WITH TAIL; R; YEH WITH TAIL
06CE; YEH WITH SMALL V; D; YEH
06CF; WAW WITH DOT ABOVE; R; WAW
06D0; YEH WITH 2 DOTS VERTICAL BELOW; D; YEH
06D1; YEH WITH 3 DOTS BELOW; D; YEH
06D2; YEH BARREE; R; YEH BARREE
06D3; HAMZA ON YEH BARREE; R; YEH BARREE
06D5; AE; U;
06FA; SEEN WITH DOT BELOW AND 3 DOTS ABOVE; D; SEEN
06FB; DAD WITH DOT BELOW; D; SAD
06FC; GHAIN WITH DOT BELOW; D; AIN
|
|