Miloslav Ciz 2025-03-17 16:42:36 +01:00
parent 6f0a813940
commit f69e3a3e4b
16 changed files with 2006 additions and 1999 deletions

@@ -4,9 +4,9 @@ ASCII ([American](usa.md) standard code for [information](information.md) interc
The ASCII standard assigns a 7 [bit](bit.md) code to each basic text character, which gives room for 128 characters -- these include the lowercase and uppercase [English](english.md) alphabet, decimal digits, other symbols such as a question mark, comma or brackets, plus a few special control characters that represent instructions such as carriage return, many of which are however obsolete nowadays. Due to most computers working with 8 bit bytes, most platforms store ASCII text with 1 byte per character; the extra bit creates room for **extending** ASCII by another 128 characters (or creating a variable width encoding such as [UTF-8](utf8.md)). These extensions include unofficial ones such as VISCII (ASCII with additional Vietnamese characters) and more official ones, most notably [ISO 8859](iso_8859.md): a group of standards by [ISO](iso.md) for various languages, e.g. ISO 8859-1 for western European languages, ISO 8859-5 for Cyrillic languages etc. Also [IBM Code Page 437](cp437.md) is a famous unofficial extension of ASCII.
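As a small illustration of the 7 bit property described above, the following sketch checks whether a buffer holds only plain ASCII by testing the highest bit of each byte. The function name `is_plain_ascii` is made up for this example and is not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of buf fits in 7 bits
   (i.e. is plain ASCII), otherwise 0. */
int is_plain_ascii(const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; ++i)
    if (buf[i] & 0x80) /* highest bit set: extended or multi-byte character */
      return 0;

  return 1;
}
```

A byte with the highest bit set may belong to an 8 bit extension such as ISO 8859 or to a UTF-8 sequence; the check alone can't tell which one it is.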
The ordering of characters has been kind of cleverly designed to make working with the encoding easier, for example digits start with 011 and the rest of the bits correspond to the digit itself (0000 is 0, 0001 is 1 etc.). Corresponding upper and lower case letters only differ in the 6th bit, so you can easily convert between upper and lower case by negating it as `letter ^ 0x20`. { I think there is a few missed opportunities though, e.g. in not putting digits right before letters. That way it would be very easy to print hexadecimal (and all bases up to a lot) simply as `putchar('0' + x)`. UPDATE: seen someone ask this on some stack exchange, the answer said ASCII preferred easy masking or something, seems like there was some reason. ~drummyfish }
The ordering of characters has been kind of cleverly designed in order to facilitate certain operations with the characters, for example the codes of digits always start with the bits 011 and the remaining four bits correspond to the [binary](binary.md) value of the digit (0000 is 0, 0001 is 1 etc.). Corresponding upper and lower case letters only differ in the 6th bit (the one with value 0x20), so conversion of case is achieved simply by flipping that bit as `letter ^ 0x20`. { I think there are a few missed opportunities though, e.g. in not putting digits right before letters. That way it would be very easy to print hexadecimal (and all bases up to a lot) simply as `putchar('0' + x)`. UPDATE: seen someone ask this on some stack exchange, the answer said ASCII preferred easy masking or something, seems like there was some reason. ~drummyfish }
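The bit tricks mentioned above can be sketched in C as follows. This is only a minimal illustration assuming an ASCII compatible character set; all function names here are made up for the example.

```c
/* Value of an ASCII decimal digit: the low 4 bits hold the number itself.
   Only meaningful for characters '0' to '9'. */
int digit_value(char c)
{
  return c & 0x0f;
}

/* ASCII digit for a value from 0 to 9: prefix bits 011 (0x30) plus the value. */
char digit_char(int value)
{
  return (char) (0x30 | value);
}

/* Switches between upper and lower case by flipping the bit with value 0x20.
   Only meaningful for letters, other characters would get garbled. */
char flip_case(char letter)
{
  return (char) (letter ^ 0x20);
}

/* Hexadecimal digit for a value from 0 to 15: because letters don't directly
   follow digits in ASCII, a plain '0' + x is not enough and a branch is needed. */
char hex_digit(int x)
{
  return (char) (x < 10 ? '0' + x : 'a' + (x - 10));
}
```

For example `flip_case('a')` gives `'A'`, `digit_value('7')` gives 7 and `hex_digit(11)` gives `'b'`.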
ASCII was approved as an [ANSI](ansi.md) standard in 1963 and since then underwent many revisions every few years. The current one is summed up by the following table:
ASCII was approved as an [ANSI](ansi.md) standard in 1963 and thereafter underwent many revisions every few years. The current one is summed up by the following table:
| dec | hex | oct | bin | other | symbol |
| ---- | ---- | ---- | ------- | --------- | --------------------- |