# ASCII
ASCII ([American](usa.md) Standard Code for Information Interchange) is a relatively simple standard for digital encoding of [text](text.md) that's one of the most basic and probably the most common formats used for this purpose. Because of its simplicity and inability to represent characters of less common alphabets it is nowadays quite often replaced with more complex encodings such as [UTF-8](utf8.md), which are however almost always backwards compatible with ASCII (interpreting UTF-8 as ASCII will give somewhat workable results), and ASCII itself is also normally supported everywhere. ASCII is the [suckless](suckless.md)/[LRS](lrs.md)/[KISS](kiss.md) character encoding, recommended and [good enough](good_enough.md) for most programs.

The ASCII standard assigns a 7 [bit](bit.md) code to each basic text character, which gives it room for 128 characters -- these include the lowercase and uppercase [English](english.md) alphabet, decimal digits, other symbols such as a question mark, comma or brackets, plus a few special control characters that represent instructions such as carriage return, many of which are however obsolete nowadays. Because most computers work with 8 bit bytes, most platforms store ASCII text with 1 byte per character; the extra bit creates room for **extending** ASCII by another 128 characters (or for creating a variable width encoding such as [UTF-8](utf8.md)). These extensions include unofficial ones such as VISCII (ASCII with additional Vietnamese characters) and more official ones, most notably [ISO 8859](iso_8859.md): a group of standards by [ISO](iso.md) for various languages, e.g. ISO 8859-1 for western European languages, ISO 8859-5 for Cyrillic languages etc. [IBM Code Page 437](cp437.md) is also a famous unofficial extension of ASCII.
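Since plain ASCII only ever uses the lower 7 bits, a byte with the highest bit set can't be a plain ASCII character. The following is a minimal sketch of a check based on this; the function name `isPureASCII` is made up for the example and isn't part of any standard library:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all len bytes of data are 7 bit ASCII codes (0 to 127),
   otherwise 0. The name isPureASCII is only an illustration. */
int isPureASCII(const unsigned char *data, size_t len)
{
  assert(data != NULL || len == 0);

  for (size_t i = 0; i < len; ++i)
    if (data[i] & 0x80) /* highest bit set: not plain ASCII */
      return 0;

  return 1;
}
```

For example `isPureASCII((const unsigned char *) "hello",5)` gives 1, while a buffer containing the byte `0xe9` (the letter é in ISO 8859-1) gives 0.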

The ordering of characters has been quite cleverly designed to make working with the encoding easier, for example the codes of digits start with the bits 011 and the remaining four bits correspond to the digit itself (0000 is 0, 0001 is 1 etc.). Corresponding upper and lower case letters only differ in the 6th bit (counting from the lowest one, i.e. the bit with value 0x20), so you can easily convert between upper and lower case by flipping it as `letter ^ 0x20`. { I think there are a few missed opportunities though, e.g. in not putting digits right before letters. That way it would be very easy to print hexadecimal (and all bases up to a lot) simply as `putchar('0' + x)`. UPDATE: I've seen someone ask this on some stack exchange, the answer said ASCII preferred easy masking or something, so it seems there was some reason. ~drummyfish }
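The bit patterns described above can be exploited for example like this. It's only a minimal sketch: the function names are made up for the example, the code assumes the platform uses ASCII for its character constants, and the functions expect a valid digit or English letter (nothing beyond the asserts is checked):

```c
#include <assert.h>

/* Numeric value of an ASCII digit: a digit's code is 011 followed by
   the 4 bit value, so masking off the upper bits leaves the value. */
int digitValue(char c)
{
  assert(c >= '0' && c <= '9');
  return c & 0x0f;
}

/* ASCII digit for a value 0 to 9: the inverse operation, it sets the
   011 prefix (0x30) above the 4 bit value. */
char digitChar(int value)
{
  assert(value >= 0 && value <= 9);
  return (char) (0x30 | value);
}

/* Switches the case of an English letter by flipping the bit with
   value 0x20. */
char toggleCase(char c)
{
  assert((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
  return (char) (c ^ 0x20);
}
```

Here `digitValue('7')` gives 7, `digitChar(3)` gives `'3'` and `toggleCase('a')` gives `'A'` (and the other way around).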

ASCII was approved as an [ANSI](ansi.md) standard in 1963 and has since undergone several revisions. The current one is summed up by the following table:

| dec  | hex  | oct  | bin     | C escape, caret | symbol                |
| ---- | ---- | ---- | ------- | --------- | --------------------- |
| 000 | 00 | 000 | 0000000 | \\000 ^@ | NUL: null |
| 001 | 01 | 001 | 0000001 | \\001 ^A | SOH: start of heading |
| 002 | 02 | 002 | 0000010 | \\002 ^B | STX: start of text |
| 003 | 03 | 003 | 0000011 | \\003 ^C | ETX: end of text |
| 004 | 04 | 004 | 0000100 | \\004 ^D | EOT: end of transmission |
| 005 | 05 | 005 | 0000101 | \\005 ^E | ENQ: enquiry |
| 006 | 06 | 006 | 0000110 | \\006 ^F | ACK: acknowledge |
| 007 | 07 | 007 | 0000111 | \\a ^G | BEL: bell |
| 008 | 08 | 010 | 0001000 | \\b ^H | BS: backspace |
| 009 | 09 | 011 | 0001001 | \\t ^I | TAB: tab (horizontal) |
| 010 | 0a | 012 | 0001010 | \\n ^J | LF: new line |
| 011 | 0b | 013 | 0001011 | \\v ^K | VT: tab (vertical) |
| 012 | 0c | 014 | 0001100 | \\f ^L | FF: new page |
| 013 | 0d | 015 | 0001101 | \\r ^M | CR: carriage return |
| 014 | 0e | 016 | 0001110 | \\016 ^N | SO: shift out |
| 015 | 0f | 017 | 0001111 | \\017 ^O | SI: shift in |
| 016 | 10 | 020 | 0010000 | \\020 ^P | DLE: data link escape |
| 017 | 11 | 021 | 0010001 | \\021 ^Q | DC1: device control 1 |
| 018 | 12 | 022 | 0010010 | \\022 ^R | DC2: device control 2 |
| 019 | 13 | 023 | 0010011 | \\023 ^S | DC3: device control 3 |
| 020 | 14 | 024 | 0010100 | \\024 ^T | DC4: device control 4 |
| 021 | 15 | 025 | 0010101 | \\025 ^U | NAK: not acknowledge |
| 022 | 16 | 026 | 0010110 | \\026 ^V | SYN: sync idle |
| 023 | 17 | 027 | 0010111 | \\027 ^W | ETB: end of block |
| 024 | 18 | 030 | 0011000 | \\030 ^X | CAN: cancel |
| 025 | 19 | 031 | 0011001 | \\031 ^Y | EM: end of medium |
| 026 | 1a | 032 | 0011010 | \\032 ^Z | SUB: substitute |
| 027 | 1b | 033 | 0011011 | \\e ^[ | ESC: escape |
| 028 | 1c | 034 | 0011100 | \\034 ^\\ | FS: file separator |
| 029 | 1d | 035 | 0011101 | \\035 ^] | GS: group separator |
| 030 | 1e | 036 | 0011110 | \\036 ^^ | RS: record separator |
| 031 | 1f | 037 | 0011111 | \\037 ^_ | US: unit separator |
| 032 | 20 | 040 | 0100000 | | ` `: space |
| 033 | 21 | 041 | 0100001 | | `!` |
| 034 | 22 | 042 | 0100010 | \\" | `"` |
| 035 | 23 | 043 | 0100011 | | `#` |
| 036 | 24 | 044 | 0100100 | | `$` |
| 037 | 25 | 045 | 0100101 | | `%` |
| 038 | 26 | 046 | 0100110 | | `&` |
| 039 | 27 | 047 | 0100111 | \\' | `'` |
| 040 | 28 | 050 | 0101000 | | `(` |
| 041 | 29 | 051 | 0101001 | | `)` |
| 042 | 2a | 052 | 0101010 | | `*` |
| 043 | 2b | 053 | 0101011 | | `+` |
| 044 | 2c | 054 | 0101100 | | `,` |
| 045 | 2d | 055 | 0101101 | | `-` |
| 046 | 2e | 056 | 0101110 | | `.` |
| 047 | 2f | 057 | 0101111 | | `/` |
| 048 | 30 | 060 | 0110000 | | `0` |
| 049 | 31 | 061 | 0110001 | | `1` |
| 050 | 32 | 062 | 0110010 | | `2` |
| 051 | 33 | 063 | 0110011 | | `3` |
| 052 | 34 | 064 | 0110100 | | `4` |
| 053 | 35 | 065 | 0110101 | | `5` |
| 054 | 36 | 066 | 0110110 | | `6` |
| 055 | 37 | 067 | 0110111 | | `7` |
| 056 | 38 | 070 | 0111000 | | `8` |
| 057 | 39 | 071 | 0111001 | | `9` |
| 058 | 3a | 072 | 0111010 | | `:` |
| 059 | 3b | 073 | 0111011 | | `;` |
| 060 | 3c | 074 | 0111100 | | `<` |
| 061 | 3d | 075 | 0111101 | | `=` |
| 062 | 3e | 076 | 0111110 | | `>` |
| 063 | 3f | 077 | 0111111 | \\? | `?` |
| 064 | 40 | 100 | 1000000 | | `@` |
| 065 | 41 | 101 | 1000001 | | `A` |
| 066 | 42 | 102 | 1000010 | | `B` |
| 067 | 43 | 103 | 1000011 | | `C` |
| 068 | 44 | 104 | 1000100 | | `D` |
| 069 | 45 | 105 | 1000101 | | `E` |
| 070 | 46 | 106 | 1000110 | | `F` |
| 071 | 47 | 107 | 1000111 | | `G` |
| 072 | 48 | 110 | 1001000 | | `H` |
| 073 | 49 | 111 | 1001001 | | `I` |
| 074 | 4a | 112 | 1001010 | | `J` |
| 075 | 4b | 113 | 1001011 | | `K` |
| 076 | 4c | 114 | 1001100 | | `L` |
| 077 | 4d | 115 | 1001101 | | `M` |
| 078 | 4e | 116 | 1001110 | | `N` |
| 079 | 4f | 117 | 1001111 | | `O` |
| 080 | 50 | 120 | 1010000 | | `P` |
| 081 | 51 | 121 | 1010001 | | `Q` |
| 082 | 52 | 122 | 1010010 | | `R` |
| 083 | 53 | 123 | 1010011 | | `S` |
| 084 | 54 | 124 | 1010100 | | `T` |
| 085 | 55 | 125 | 1010101 | | `U` |
| 086 | 56 | 126 | 1010110 | | `V` |
| 087 | 57 | 127 | 1010111 | | `W` |
| 088 | 58 | 130 | 1011000 | | `X` |
| 089 | 59 | 131 | 1011001 | | `Y` |
| 090 | 5a | 132 | 1011010 | | `Z` |
| 091 | 5b | 133 | 1011011 | | `[` |
| 092 | 5c | 134 | 1011100 | \\\\ | `\` |
| 093 | 5d | 135 | 1011101 | | `]` |
| 094 | 5e | 136 | 1011110 | | `^` |
| 095 | 5f | 137 | 1011111 | | `_` |
| 096 | 60 | 140 | 1100000 | | `` ` ``: backtick |
| 097 | 61 | 141 | 1100001 | | `a` |
| 098 | 62 | 142 | 1100010 | | `b` |
| 099 | 63 | 143 | 1100011 | | `c` |
| 100 | 64 | 144 | 1100100 | | `d` |
| 101 | 65 | 145 | 1100101 | | `e` |
| 102 | 66 | 146 | 1100110 | | `f` |
| 103 | 67 | 147 | 1100111 | | `g` |
| 104 | 68 | 150 | 1101000 | | `h` |
| 105 | 69 | 151 | 1101001 | | `i` |
| 106 | 6a | 152 | 1101010 | | `j` |
| 107 | 6b | 153 | 1101011 | | `k` |
| 108 | 6c | 154 | 1101100 | | `l` |
| 109 | 6d | 155 | 1101101 | | `m` |
| 110 | 6e | 156 | 1101110 | | `n` |
| 111 | 6f | 157 | 1101111 | | `o` |
| 112 | 70 | 160 | 1110000 | | `p` |
| 113 | 71 | 161 | 1110001 | | `q` |
| 114 | 72 | 162 | 1110010 | | `r` |
| 115 | 73 | 163 | 1110011 | | `s` |
| 116 | 74 | 164 | 1110100 | | `t` |
| 117 | 75 | 165 | 1110101 | | `u` |
| 118 | 76 | 166 | 1110110 | | `v` |
| 119 | 77 | 167 | 1110111 | | `w` |
| 120 | 78 | 170 | 1111000 | | `x` |
| 121 | 79 | 171 | 1111001 | | `y` |
| 122 | 7a | 172 | 1111010 | | `z` |
| 123 | 7b | 173 | 1111011 | | `{` |
| 124 | 7c | 174 | 1111100 | | `\|` |
| 125 | 7d | 175 | 1111101 | | `}` |
| 126 | 7e | 176 | 1111110 | | `~` |
| 127 | 7f | 177 | 1111111 | \\177 ^? | DEL: delete |
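
The caret notation seen in the table (`^@`, `^A`, ..., `^?`) follows a simple pattern: the character written after `^` has the code of the control character with the bit of value 0x40 flipped. A minimal sketch of this follows; the function names are made up for the example and the code again assumes an ASCII platform:

```c
#include <assert.h>

/* Returns 1 if code is one of the ASCII control characters (0 to 31
   or 127), otherwise 0. */
int isControlCode(int code)
{
  return (code >= 0 && code < 32) || code == 127;
}

/* For a control character returns the character written after ^ in
   the caret notation, e.g. 'C' for code 3 (^C). */
char caretSymbol(int code)
{
  assert(isControlCode(code));
  return (char) (code ^ 0x40);
}
```

For example `caretSymbol(27)` gives `'['` (ESC is written as `^[`) and `caretSymbol(127)` gives `'?'` (DEL is written as `^?`).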
## See Also
- [Unicode](unicode.md)
- [PETSCII](petscii.md)
- [ATASCII](atascii.md)
- [ASCII art](ascii_art.md)
- [base64](base64.md)
- [Morse code](morse_code.md)