less_retarded_wiki/ascii.md
2024-02-13 21:57:06 +01:00

9.3 KiB

ASCII

ASCII (American standard code for information interchange) is a relatively simple standard for digital encoding of text that's one of the most basic and probably the most common format used for this purpose. For its simplicity and inability to represent characters of less common alphabets it is nowadays quite often replaced with more complex encodings such as UTF-8 who are however almost always backwards compatible with ASCII (interpreting UTF-8 as ASCII will give somewhat workable results), and ASCII itself is also normally supported everywhere. ASCII is the suckless/LRS/KISS character encoding, recommended and good enough for most programs.

The ASCII standard assigns a 7 bit code to each basic text character which gives it a room for 128 characters -- these include lowercase and uppercase English alphabet, decimal digits, other symbols such as a question mark, comma or brackets, plus a few special control characters that represent instructions such as carriage return which are however often obsolete nowadays. Due to most computers working with 8 bit bytes, most platforms store ASCII text with 1 byte per character; the extra bit creates a room for extending ASCII by another 128 characters (or creating a variable width encoding such as UTF-8). These extensions include unofficial ones such as VISCII (ASCII with additional Vietnamese characters) and more official ones, most notably ISO 8859: a group of standards by ISO for various languages, e.g. ISO 88592-1 for western European languages, ISO 8859-5 for Cyrillic languages etc.

The ordering of characters has been kind of cleverly designed to make working with the encoding easier, for example digits start with 011 and the rest of the bits correspond to the digit itself (0000 is 0, 0001 is 1 etc.). Corresponding upper and lower case letters only differ in the 6th bit, so you can easily convert between upper and lower case by negating it as letter ^ 0x20. { I think there is a few missed opportunities though, e.g. in not putting digits right before letters. That way it would be very easy to print hexadecimal (and all bases up to a lot) simply as putchar('0' + x). UPDATE: seen someone ask this on some stack exchange, the answer said ASCII preferred easy masking or something, seems like there was some reason. ~drummyfish }

ASCII was approved as an ANSI standard in 1963 and since then underwent many revisions every few years. The current one is summed up by the following table:

dec hex oct bin symbol
000 00 000 0000000 NUL: null
001 01 001 0000001 SOH: start of heading
002 02 002 0000010 STX: start of text
003 03 003 0000011 ETX: end of text
004 04 004 0000100 EOT: end of stream
005 05 005 0000101 ENQ: enquiry
006 06 006 0000110 ACK: acknowledge
007 07 007 0000111 BEL: bell
008 08 010 0001000 BS: backspace
009 09 011 0001001 TAB: tab (horizontal)
010 0a 012 0001010 LF: new line
011 0b 013 0001011 VT: tab (vertical)
012 0c 014 0001100 FF: new page
013 0d 015 0001101 CR: carriage return
014 0e 016 0001110 SO: shift out
015 0f 017 0001111 SI: shift in
016 10 020 0010000 DLE: data link escape
017 11 021 0010001 DC1: device control 1
018 12 022 0010010 DC2: device control 2
019 13 023 0010011 DC3: device control 3
020 14 024 0010100 DC4: device control 4
021 15 025 0010101 NAK: not acknowledge
022 16 026 0010110 SYN: sync idle
023 17 027 0010111 ETB: end of block
024 18 030 0011000 CAN: cancel
025 19 031 0011001 EM: end of medium
026 1a 032 0011010 SUB: substitute
027 1b 033 0011011 ESC: escape
028 1c 034 0011100 FS: file separator
029 1d 035 0011101 GS: group separator
030 1e 036 0011110 RS: record separator
031 1f 037 0011111 US: unit separator
032 20 040 0100000 : space
033 21 041 0100001 !
034 22 042 0100010 "
035 23 043 0100011 #
036 24 044 0100100 $
037 25 045 0100101 %
038 26 046 0100110 &
039 27 047 0100111 '
040 28 050 0101000 (
041 29 051 0101001 )
042 2a 052 0101010 *
043 2b 053 0101011 +
044 2c 054 0101100 ,
045 2d 055 0101101 -
046 2e 056 0101110 .
047 2f 057 0101111 /
048 30 060 0110000 0
049 31 061 0110001 1
050 32 062 0110010 2
051 33 063 0110011 3
052 34 064 0110100 4
053 35 065 0110101 5
054 36 066 0110110 6
055 37 067 0110111 7
056 38 070 0111000 8
057 39 071 0111001 9
058 3a 072 0111010 :
059 3b 073 0111011 ;
060 3c 074 0111100 <
061 3d 075 0111101 =
062 3e 076 0111110 >
063 3f 077 0111111 ?
064 40 100 1000000 @
065 41 101 1000001 A
066 42 102 1000010 B
067 43 103 1000011 C
068 44 104 1000100 D
069 45 105 1000101 E
070 46 106 1000110 F
071 47 107 1000111 G
072 48 110 1001000 H
073 49 111 1001001 I
074 4a 112 1001010 J
075 4b 113 1001011 K
076 4c 114 1001100 L
077 4d 115 1001101 M
078 4e 116 1001110 N
079 4f 117 1001111 O
080 50 120 1010000 P
081 51 121 1010001 Q
082 52 122 1010010 R
083 53 123 1010011 S
084 54 124 1010100 T
085 55 125 1010101 U
086 56 126 1010110 V
087 57 127 1010111 W
088 58 130 1011000 X
089 59 131 1011001 Y
090 5a 132 1011010 Z
091 5b 133 1011011 [
092 5c 134 1011100 \
093 5d 135 1011101 ]
094 5e 136 1011110 ^
095 5f 137 1011111 _
096 60 140 1100000 `: backtick
097 61 141 1100001 a
098 62 142 1100010 b
099 63 143 1100011 c
100 64 144 1100100 d
101 65 145 1100101 e
102 66 146 1100110 f
103 67 147 1100111 g
104 68 150 1101000 h
105 69 151 1101001 i
106 6a 152 1101010 j
107 6b 153 1101011 k
108 6c 154 1101100 l
109 6d 155 1101101 m
110 6e 156 1101110 n
111 6f 157 1101111 o
112 70 160 1110000 p
113 71 161 1110001 q
114 72 162 1110010 r
115 73 163 1110011 s
116 74 164 1110100 t
117 75 165 1110101 u
118 76 166 1110110 v
119 77 167 1110111 w
120 78 170 1111000 x
121 79 171 1111001 y
122 7a 172 1111010 z
123 7b 173 1111011 {
124 7c 174 1111100 `
125 7d 175 1111101 }
126 7e 176 1111110 ~
127 7f 177 1111111 DEL

See Also