Update

2024-10-08 20:06:43 +02:00
commit 695e83f707
parent 3034949bc8
16 changed files with 1839 additions and 1823 deletions
--- a/unicode.md
+++ b/unicode.md
@@ -2,7 +2,7 @@

 *".̶̧̦̼̱͙̥̲̝̥̭͍̪̈́̍͌̋͑͒̒̅͂͒̀̊̓̕͠.̴̛̰̯͚͚̳͍̞̯̯͊̒͂̌̃̎̒̐̅͗́̂͠͝.̸̜̀͊̀̒̾̐̔͑̚̕͠a̲̬̪͙̖̬̖ͭͫͦ̀̄̆̍ͦͨͦ͗̅͋ͦͤͯͫ̔̚l̫̹̺̭̳͙̠̦͍̫̝͓͙̟̺͗̊̅ͬ̉͒̏͆͗͒̋ͤ̆̆ͥg̥̳̗͕̫ͪ͛̓̂ͫͮ̔͌̃̈͒̔̏ͭ͋͋  ⃝꙰⃝꙰⃝꙰⃝꙰⃝꙰⃝꙰⃝꙰⃝꙰⃝꙰⃝á́́́́́́́́́́́́́́́́́́́́́́́́́́́́́.̶̢̙̺̖͔̱͎̳̭̖̗̲̻̪̻͑̌͒̊̃̈̾̿̓̅̐́̀̋̔̏.̴̺͖͎͚̠̱̤͂̈́͜.̵̡̡͖̪̱̼͕̘̣̠̮̫͓̯͖̜̚͝͝͝.̷̧̨̥̦̥̱͉̼̗̰̪͍̱͎̑̾Z̳͍̩̻̙̪̲̞̭̦̙ͯ͛̂ͥͣͪͅͅͅl̷̢̛̩̰̹͔̣͗̅̇̍̏͑͐̇̋̑͜ͅǫ̶̢̫̟͙̖̩̽̀͆̽͌͘l̶̩̞̖̹͈͒͊̔̑̆<CCB9≯͎̺̳̄͂̊̒<CCBA>̶̸̵̶̴̸̸̴̶̸̷̶̴̴̡̢̡̢̡̢̧̧̡̧̡̨̡̨̢̧̧̡̢̛̛̛̛̼̻̣̗͔͉̩̪̞͎̖̙͍͚͍̼̰͖̺̤̗̘͕̳̻̖̳̻̗̯̭̙̳̲͕̮͇͕̼͉̞̣̟̖̘̟͕̗̼̙̻͇̝̪̦͚̤̦̣̗̤̪̟̠͖͓̟̬̲͙͇͉̘͙͙͚̜̜̮͈̞͓̰̫͍̙͙͙̱͓͖̠͇̪̭̮̤̺̗̙̘̫̤̥̳͇͔̣̩͕͍̦͈̬̯̗̘͔̻̗̘͔̪̹̬̲͇͕̻͎̣̩̻̖͉̱̝̼̞̪̠̮̤͓̥͊̔̈́̀̋̄̄̇́̋̎͛̓́̔̇̂̒̅͊̎̉͗̓̀͑̋͒͑̍̏̅̋͆̑̈̾͗̽͑̏̉̀͌͋̉̒̋̑̊̂̈́̈́͑̀͂́̈́̆̄̃͆͆̈́̊̿̌̋̍̈̒͂̀̈́͌̽͌̈́͋̈́̃̅͂͆́̍͑̓̎͋̅͂̽̈́̈́͗̆̑̔̎̈́́̆͂̉̀̒͌̿̽͊̍̃̕̚͘̚̕̚͘̕̕̕͘͜͜͝͠͝͝͠͝͝͝͠͝͠͠͠ͅͅͅͅ.̸̷̷̷̴̸̶̵̴̶̵̸̴̴̷̸̷̵̷̵̴̴̷̧̨̢̨̡̨̧̡̨̧̧̨̡̧̢̧̢̧̨̛̛̛̛̛̛̛̤͈̯̤͙̻̫̼̱̦̮̙̤̝̖̗͉̘̫̟̗̹͉͇͖̘͙̻̫̫̫̰̝̭̤͈͓͔̱̭͙͔͔̼̖̬̰̳̗͖͖̯̮͔̝̞̬̳͇͈̥̘͙͇̺̪̞̞̙͈̮͔̞̭͎̩͎̦̞̝͎̗͚͈̖̣͖̹̜̞̤̺̱̱̰͔̼̭̮̰̖͔͔͈̥͎̜̭̪̺̲͔̲̻̰̳̲̖̤̳̙̥̼̩͈̥̗̟͙̥̗̳͍̥̝̫͚̘̱̱̹̺̣̝̳̣͇̹̫̝̫̟̯̺͇̞̳͖̫͔̲̗͔̟̩̦̳͎̳͖̎̓͂̀̀́̌͗̐̅̈́̓̿̓̌́̓́͋͊͛̄͊̂̒͌̀͗̔̀̑̔͒̐̀͌̋̍͗͛̂̆̈́͛͋͆̐̌̓̄͊̑̑̅̑̿̏̈́̀̊̆̈̔̃̽̀̎̐́̎̾͐̀̌̒̑́̇̑̊͑́̓̓̔̆͐́̅̓̔̃̅̂̐͗́̎͌́̊͌͒͒̓́̀͒̍̽̂́̀̉̀̑̉̑̓́͗̓́̍̏̉͆̑͂̔̅̀͊̈́̀͑͛́̿͆͑̀͐̃̋̐̋̈́̉͊̿̌̾͗͛̉́̓̓̏̈́͂̋͌͆̓̑͗͗̍̇̕̚̚͘̕͘̚̚̕͘͘͜͜͜͜͜͜͜͜͠͠͠͝͝͠͝͠͝͠͝ͅͅͅͅͅͅ.̸̷̸̴̸̸̶̶̵̵̸̵̴̡̡̡̡̧̢̢̧̧̧̧̡̢̡̛̛̛̛̬͇̜̘̗̗̲̟̗̤̤̜̹͎̣̹̺͉̯̼̭̟̮̖͕̻̰̬̼̮̮̬̪̥̤̘̣̺̥̪̠̥̳̰͇̫͔̜̫͚͖͔̩̙̪͖̥͍̗͍͉͙̣͔̠̭̞̩̱̠̻̹͎͔̯̻̘͖̦̘͕͉͈͈̞̖̬͔͈̗͓͖͚̤̬̤̘̠̱͆̍̍͆͗͋̇͗̓͐̉͋̈́̀̍̈̇̀̀̎͋̾̇̎͐̌̌̿̽̾̃̑͆̎̾̾̈́̆̐̂̅́̓̔̇̔̑̔͑̓̍͊͌͋̔̐̑͌̓̒̎̍̃͐̀͊̿̓͋̌͐̋̂̽̿̒̋̎́͒̋͘͘͘̕̕͘͝͠͝͝ͅͅa̲̬̪͙̖̬̖ͭͫͦ̀̄̆̍ͦͨͦ͗̅͋ͦͤͯͫ̔̚l̫̹̺̭̳͙̠̦͍̫̝͓͙̟̺͗̊̅ͬ̉͒̏͆͗͒̋ͤ̆̆ͥ𒈙.̴̢̟̩̗͊.̴̹͎̦̘͇͎̩̮̻̾͛̐ͅ𰻞.̷̧̫͙̤̗͇̔̂̀̄͗̍̈͋̈́̕.̷̨̛͈̤͈̲̥̱̹̲͖͗͛͆̓͊̅̈̕͠.̷̻̺͔͍̭͋̾̐̔͑̔̌̂͛͆̽͘͜͠͝͠.̷̧̨͉̝̳̲̫̙̻͎̬͚̒̀̄͒.̶̨͙̩̦̪͋̄͆͌̈́́͐̈̈́̕ͅ.̸̡̠̙̪͔͍̬̘̖̗̙̞̬͇̐͋͊͐̋̚ͅ.̷̢̮̮̖̹̟̖̩̗͙̝̺́̑̈̉͘͘͠ͅ.̴̨̡̧̤̳͖̰̼̺̮͉͖̲̫̳̜̹̄.̵̢̤̦̞͙̝̬͍̞̤͇̽̾̈́̔̋̋̓̌̋̐̓̅͜͝.̷͙͊.̵̠̜̞̭̘͉͓̞̤͍̝̈́̋̃́̈́͐̃̉͆̚͜.̴͉͈͓͈͉͎̺͍͕̥̦̙͙͕̈́̏̿́̏̔.̶͕̟̤͔͑̉̽̓̇̐́̃̿͜.̶̧̨̨̱̪̞̞̯̹̤̘̭̠͓̀̓̐̓́͑͂̉.̴̛̙̮͚̊͗̏̈́͗̅͆̑̂̌̐̃̊̂̓.̴̙͎̔͑̿͗̃̒́̏̏͑͘̕á́́́́́́́́́́́́́́́́́́́́́́́́́́́́́"* --creator of 🎮𝕌𝕟ι𝕔𝗼d̢̪̲̬̳̩̟̍ĕ̸͓̼͙͈͐🚀

-Unicode is a successful, constantly evolving standard aiming to organize symbols and characters (letters, digits, graphical symbols, [emoji](emoji.md), ...) of all the world's writing systems and to define and standardize ways of encoding them as [digital](digital.md) [data](data.md), i.e. it's a big [project](project.md) promising to unify the encoding of any possible [text](text.md) in [computers](computer.md). As of writing the lastest version is 16.0 from 2024, defining over 150000 characters. The effort dates back to 1980s and was started to do away with the mess and headaches induced by a plethora of existing incompatible text encoding systems -- in this it succeeded, Unicode is nowadays everywhere and it's the standard way of encoding text wherever you look, probably owing a lot to its backwards compatibility with plain [ASCII](ascii.md) encoding which was the most popular encoding of English back in the day (i.e. any old ASCII text is still a valid Unicode text, provided we use UTF-8 encoding). The standard is made by the Unicode Consortium whose members are basically all the big companies.
+Unicode is a successful, constantly evolving standard aiming to organize symbols and characters (letters, digits, graphical symbols, [emoji](emoji.md), ...) of all the world's writing systems and to define and standardize ways of encoding them as [digital](digital.md) [data](data.md), i.e. it's a big [project](project.md) promising to unify the encoding of any possible [text](text.md) in [computers](computer.md). As of writing this the latest version is 16.0 from 2024, defining over 150000 characters. The effort dates back to the 1980s and was started to do away with the mess and headaches induced by a plethora of existing incompatible text encoding systems -- in this it succeeded: Unicode is nowadays everywhere and it's the standard way of encoding text wherever you look, probably owing a lot to its backwards compatibility with plain [ASCII](ascii.md) encoding which was the most popular encoding of [English](english.md) back in the day (i.e. any old ASCII text is still a valid Unicode text, provided we use UTF-8 encoding). The standard is made by the Unicode Consortium whose members are basically all the big companies.
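
The ASCII compatibility mentioned above can be checked with a few lines of Python. This is only a minimal sketch: the sample string and variable names are made up for illustration, the only real API used is the built-in `str.encode`/`bytes.decode`.

```python
# Sketch: pure ASCII text gives the exact same bytes in ASCII and UTF-8.
text = "Hello, world!"  # illustrative sample, any pure ASCII string works

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Old ASCII data can therefore be read as UTF-8 without any conversion.
same_bytes = ascii_bytes == utf8_bytes
round_trip = ascii_bytes.decode("utf-8")

# A character outside ASCII takes more than one byte in UTF-8.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
e_acute_len = len(e_acute.encode("utf-8"))

print(same_bytes, round_trip == text, e_acute_len)
```

The reverse does not hold: UTF-8 data containing anything beyond the first 128 code points is not valid ASCII.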

 In Unicode every character is unique like a unicorn. It has all the diverse characters such as the penis (𓂸), ejaculating penis (𓂺), swastika (卐), hammer and sickle (☭), white power sign (👌), middle finger (🖕), pile of [shit](shit.md) (💩) etc. **Here is a lulzy part of Unicode**: it's possible to combine some characters together with so called *combining characters*, so purely IN THEORY one can for example combine the prohibition symbol (U+20Ex) with [LGBT](lgbt.md) propaganda characters and other [fascist](fascism.md) symbols to create interesting emojis likes so: 🏳️‍🌈⃠👨🏿⃠👩⃠. Of course this created some controversies :D { It now seems like some systems refuse to render combinations of characters that might go against current official world politics. See also: [1984](1984.md). ~drummyfish }

@@ -22,7 +22,7 @@ The Unicode [project](project.md) is indeed highly ambitious, it's extremely dif

 It's also crucial for Unicode to very clearly state its goals and philosophies so that all the issues and questions that come up may be answered and decided in accordance with them. For example part of the Unicode philosophy is to treat the symbols as abstract entities defined by their usage and meaning rather than their exact graphical representation (this is left to specific typesetting/rendering systems, [fonts](font.md) etc.).

-**Is Unicode [crap](shit.md) and [bloat](bloat.md)?** Yes, it inevitably has to be, there's a lot of obscurity and crap in Unicode and many systems infamously can't handle malicious (or even legit) Unicode text and will possibly even crash (see e.g. the infamous *black dot of death*). A lot of that mess previously caused by different encodings now just poured over to Unicode itself: for example there are sometimes multiple versions of the exact same character (e.g. those with accents -- one versions is a composed plain character plus accent character, the other one a single "precomposed" character) and so it's possible to encode exactly the same string in several ways and a non-trivial Unicode [normalization](normalization.md) is required to fix this. Unicode can be raped and abused in spectacular ways, for example using homoglyphs (characters that graphically look like other characters but are in fact different) one may create text that won't be detected by simple exact-comparison algorithms (for example you may be able to register a username that graphically looks like someone else's already registered username). There are also ways to combine characters in queer ways, e.g. make very tall text by creating chains of exponents or something (see the rabbithole around so called *composing characters*), which can just similarly nuke many naive programs. With Unicode things that were previously simple (such as counting string length or computing the size of rectangle into which a text will fit) now become hard (and slow) to do. Still it has to be said that **Unicode is designed relatively well** (of course minus the fascist political bias in its choice of characters) for what it's trying to do, it's just that the goal is ultimately an untameable beast, a bittersweet topic and a double edged sword -- for [LRS](lrs.md) it's important especially that we don't have to care much about it, we can just still keep using [ASCII](ascii.md) and we're good, i.e. 
we aren't forced to use the bloated part of Unicode and if we get Unicode text, we can quite easily filter out non-ASCII characters. Full Unicode compliance is always bloat and shouldn't be practiced, but it's possible to partially comply with only minimum added complexity. On one hand it [just werks](just_werks.md) -- back in the [90s](1990s.md) we still had to trial/error different encodings to correctly display non-English texts, nowadays everything just displays correctly, but comfort comes with a price tag. Unicode has, to some degree, fucked up many texts because soyboys and bloat fans now try to use the "correct" characters for everything, so they will for example use the correct "multiplication sign" instead of just *x* or * which won't display well in ASCII readers, but again, this can at least be automatically corrected. Terminal emulators now include ugly Unicode bullcrap and have to drag along huge fonts and a constantly updating Unicode library. Unicode is also controversial because [SJWs](sjw.md) push it too hard, claiming that ASCII is [racist](racism.md) to people who can only write in retarded languages like [Chinese](chinese.md) -- we say it's better for the Chinese to learn [English](english.md) than to fuck computers up. Other controversies revolve around emojis and other political symbols, SJWs push crap like images of pregnant men and want to [censor](censorship.md) "offensive" symbols. Unicode also allowed noobs to make what they call "[ASCII_art](ascii_art.md)" without having any actual skill at it.
+**Is Unicode [crap](shit.md) and [bloat](bloat.md)?** Yes, it inevitably has to be, there's a lot of obscurity and crap in Unicode and many systems infamously can't handle malicious (or even legit) Unicode text and will possibly even crash (see e.g. the infamous *black dot of death*). A lot of that mess previously caused by different encodings now just poured over to Unicode itself: for example there are sometimes multiple versions of the exact same character (e.g. those with accents -- one versions is a composed plain character plus accent character, the other one a single "precomposed" character) and so it's possible to encode exactly the same string in several ways and a non-trivial Unicode [normalization](normalization.md) is required to fix this. Unicode can be raped and abused in spectacular ways, for example using homoglyphs (characters that graphically look like other characters but are in fact different) one may create text that won't be detected by simple exact-comparison algorithms (for example you may be able to register a username that graphically looks like someone else's already registered username). There are also ways to combine characters in queer ways, e.g. make very tall text by creating chains of exponents or something (see the rabbithole around so called *composing characters*), which can just similarly nuke many naive programs. With Unicode things that were previously simple (such as counting string length or computing the size of rectangle into which a text will fit) now become hard (and slow) to do. Still it has to be said that **Unicode is designed relatively well** (of course minus the fascist political bias in its choice of characters) for what it's trying to do, it's just that the goal is ultimately an untameable beast, a bittersweet topic and a double edged sword -- for [LRS](lrs.md) it's important especially that we don't have to care much about it, we can just still keep using [ASCII](ascii.md) and we're good, i.e. 
we aren't forced to use the bloated part of Unicode and if we get Unicode text, we can quite easily filter out non-ASCII characters. Full Unicode compliance is always bloat and shouldn't be practiced, but it's possible to partially comply with only minimum added complexity. On one hand it [just werks](just_werks.md) -- back in the [90s](90s.md) we still had to trial/error different encodings to correctly display non-English texts, nowadays everything just displays correctly, but comfort comes with a price tag. Unicode has, to some degree, fucked up many texts because soyboys and bloat fans now try to use the "correct" characters for everything, so they will for example use the correct "multiplication sign" instead of just *x* or * which won't display well in ASCII readers, but again, this can at least be automatically corrected. Terminal emulators now include ugly Unicode bullcrap and have to drag along huge fonts and a constantly updating Unicode library. Unicode is also controversial because [SJWs](sjw.md) push it too hard, claiming that ASCII is [racist](racism.md) to people who can only write in retarded languages like [Chinese](chinese.md) -- we say it's better for the Chinese to learn [English](english.md) than to fuck computers up. Other controversies revolve around emojis and other political symbols, SJWs push crap like images of pregnant men and want to [censor](censorship.md) "offensive" symbols. Unicode also allowed noobs to make what they call "[ASCII_art](ascii_art.md)" without having any actual skill at it.
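
A few of the issues named in the paragraph above (several encodings of the same string, string length, homoglyphs, filtering down to ASCII) can be demonstrated with Python's standard `unicodedata` module. Take it as a rough sketch under assumptions: the helper `to_ascii` and the `REPLACEMENTS` table are hypothetical names invented for this example, not part of any library, and real filtering would likely need a bigger table.

```python
import unicodedata

# The same visible character encoded in two ways.
precomposed = "\u00e9"   # single code point: e with acute
decomposed = "e\u0301"   # plain 'e' followed by COMBINING ACUTE ACCENT

naive_equal = precomposed == decomposed          # exact comparison fails
normalized_equal = (unicodedata.normalize("NFC", precomposed)
                    == unicodedata.normalize("NFC", decomposed))

# "String length" depends on the encoding of the string, not on what is seen.
lengths = (len(precomposed), len(decomposed))

# Homoglyphs: Latin 'a' and Cyrillic 'a' (U+0430) look alike but differ,
# and normalization does not unify them.
looks_same_but_differs = "paypal" != "p\u0430ypal"

# Hypothetical replacement table (an assumption made for this example).
REPLACEMENTS = {"\u00d7": "x"}  # MULTIPLICATION SIGN -> plain x

def to_ascii(s: str) -> str:
    """Hypothetical helper: replace known symbols, strip accents, drop the rest."""
    s = s.translate(str.maketrans(REPLACEMENTS))
    stripped = unicodedata.normalize("NFKD", s)  # split accents from base letters
    return stripped.encode("ascii", "ignore").decode("ascii")

print(naive_equal, normalized_equal, lengths, to_ascii("caf\u00e9 \u00d7 2"))
```

Note that anything without an ASCII-looking decomposition and without an entry in the table simply disappears, which may or may not be acceptable depending on the text.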

 Here are some **examples** of Unicode characters: