
ASCII Code, Extended ASCII characters (8-bit system) and ANSI Code.



Related Links: 

1. VBA Chr & Asc functions explained; corresponding Excel CHAR and CODE functions.

2. Excel Text and String Functions: TRIM & CLEAN.



ASCII Code: Computers use the binary system internally for processing, i.e. they work with binary codes of 0s and 1s (zeroes and ones). The computer converts letters and characters into numbers, and then converts those numbers into binary. The system used to encode letters and characters as numbers is called the ASCII code (American Standard Code for Information Interchange). It is a set of 128 characters, each given its own number reference. As with most things computer-related, the characters are counted from zero (not one), so they cover a numerical range of 0-127. The first 32 (character codes 0-31) are control codes (used to control peripherals such as printers) and spacing characters, and are unprintable (refer Table 1). Character codes 32-126 are the 95 printable characters (the space, the digits 0-9, the uppercase and lowercase letters of the English alphabet, punctuation marks, and symbols), which you will find on your keyboard (refer Table 2). Character 127 represents the command DEL (delete), which is also a control code and is unprintable. For example, the ASCII code for the capital letter "A" is the number 65, which is representable in binary using 0s and 1s (65 converts to the binary number 1000001).
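The letter-to-number-to-binary conversion described above can be sketched in a few lines of Python. This is only an illustration: the helper name to_ascii_bits is made up for this example, while ord, chr and format are Python built-ins.

```python
def to_ascii_bits(ch):
    """Return the ASCII code of ch and its 7-bit binary representation."""
    code = ord(ch)                 # character -> number
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return code, format(code, "07b")   # number -> 7 binary digits

code, bits = to_ascii_bits("A")
print(code, bits)                  # 65 1000001

# Going the other way: binary digits -> number -> character
print(chr(int(bits, 2)))           # A
```

The same round trip works for any of the 128 ASCII codes; anything above 127 is rejected because it falls outside the 7-bit range.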



ASCII to Extended ASCII characters (8-bit system) and ANSI Code: In the 1960s, a need for standardization led to ASCII, which is a 7-bit system. A bit is a binary digit which can have one of two values, on or off. Seven bits can have 2^7 or 128 possible unique values, so with 7 bits, 128 numbers (0-127 in decimal notation) are available to code characters. Today, however, computers store data in 8-bit bytes, and ASCII was expanded to 8-bit systems that have 256 code points, 0-255 (8 bits correspond to 2^8 i.e. 256 possibilities). There are many variants of Extended ASCII characters (8-bit system) to cover regional characters and symbols. One example is the extended ASCII character set which includes various letters needed for writing the languages of Western Europe and certain special characters. This encoding is called ISO Latin-1 or ISO 8859-1 (ISO refers to the International Organization for Standardization), and it was historically the default character set in many browsers. The ISO 8859-1 character set includes the original ASCII character set (values 0 to 127), plus an extended character set (codes 160-255) which contains the characters used in Western European countries and some commonly used special characters. Many Windows systems use another related 8-bit encoding, and this Microsoft-specific encoding is referred to as ANSI, or Windows-1252. It is similar to ISO 8859-1 except that character codes 128-159 in ISO 8859-1 are reserved for controls whereas ANSI uses most of them for printable characters. ANSI stands for American National Standards Institute, although the name is a historical misnomer because Windows-1252 was never formally standardized by ANSI. The ANSI character set includes the standard ASCII character set (values 0 to 127), plus an extended character set (values 128 to 255; refer Table 3).
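A minimal Python sketch of the points above: the number of values available with 7 and 8 bits, and how the same byte is treated by ISO 8859-1 and by Windows-1252 (ANSI). The helper decode_byte is a name made up for this illustration; "latin-1" and "cp1252" are the names Python's standard codecs use for the two encodings.

```python
# Number of distinct values available with 7 and 8 bits
seven_bit = 2 ** 7     # 128 code points, 0-127
eight_bit = 2 ** 8     # 256 code points, 0-255

def decode_byte(value, encoding):
    """Decode a single byte value (0-255) using the named codec."""
    return bytes([value]).decode(encoding)

# Code 128 (hex 80) lies in the 128-159 range where the two encodings differ
code_128_ansi = decode_byte(0x80, "cp1252")    # the euro sign, a printable character
code_128_iso = decode_byte(0x80, "latin-1")    # an unprintable control character

# Code 233 (hex E9) lies in the 160-255 range shared by both encodings
code_233_ansi = decode_byte(0xE9, "cp1252")    # e with acute accent
code_233_iso = decode_byte(0xE9, "latin-1")    # e with acute accent
```

Codes 0-127 decode identically under both codecs as well, since each includes the standard ASCII set.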



Extended ASCII characters (8-bit) and UNICODE: In addition to the ISO 8859-1 (Latin-1, West European languages) encoding, the ISO 8859 standard includes several other 8-bit extensions to the ASCII character set, viz. ISO 8859-2 (Latin-2, Central and East European languages); ISO 8859-3 (Latin-3, South European and miscellaneous languages); ISO 8859-4 (Latin-4, Scandinavian/Baltic languages); ISO 8859-5 (Latin/Cyrillic); and so on. In these 8-bit extensions, the lower 128 characters (0 to 127) are the same ASCII characters, while the upper 128 characters (128 to 255) are used for the appropriate language and symbols. However, the significant drawback of the 256-character limit remained, because languages such as Japanese and Chinese use thousands of characters. Incompatibility is also a problem: if a user saves a file as Latin-1 while a co-user opens it on a Latin-7 system, the extended characters (128 to 255) may display wrongly. This led to Unicode, a language-independent code which assigns every character its own number (code point). Unicode was originally designed as a 16-bit code, allowing up to 2^16 i.e. 65,536 unique characters, and has since been extended to a range of 1,114,112 code points (0 to 10FFFF in hexadecimal). Code points are stored using one of the encoding forms UTF-8, UTF-16 or UTF-32, which are built from 8-bit, 16-bit and 32-bit units respectively. However, the 7-bit ASCII code remains widely used as a common baseline that virtually all computers understand: the first 128 Unicode code points are identical to ASCII, and the syntax of email headers and HTML markup (used for web browsing) is written in ASCII characters.
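The incompatibility between 8-bit variants, and the way Unicode encodings store characters, can be illustrated with a short Python sketch. The choice of Latin-1 and Latin-2 as the two mismatched encodings, the sample word, and the helper name byte_lengths are all assumptions made for this example; the codec names are the ones Python's standard library uses.

```python
# One user saves the word "creme" (with e-grave, code 232) as Latin-1 bytes
raw = "cr\u00e8me".encode("latin-1")           # b'cr\xe8me'

# A co-user opens the same bytes assuming Latin-2 (ISO 8859-2):
# byte 232 now maps to c-with-caron, so the extended character displays wrongly
seen_as_latin2 = raw.decode("iso8859_2")

def byte_lengths(ch):
    """Return how many bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(enc)) for enc in encodings)

letter_a = byte_lengths("A")         # (1, 2, 4): an ASCII character is one byte in UTF-8
euro_sign = byte_lengths("\u20ac")   # (3, 2, 4): the euro sign needs three bytes in UTF-8

# ASCII text is byte-for-byte identical when encoded as UTF-8
same_bytes = "Excel".encode("ascii") == "Excel".encode("utf-8")
```

Because each Unicode code point identifies one character regardless of language, decoding with the correct Unicode encoding avoids the mismatch shown in the first half of the sketch.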













© 2014 GlobaliConnect.com. All rights reserved.