World Wide Web (www or just *the web*) is (or was -- by 2023 mainstream web is dead) a [network](network.md) of interconnected documents on the [Internet](internet.md), which we call *websites* or *webpages*. Webpages are normally written in the [HTML](html.md) [language](language.md) and can refer to each other by [hyperlinks](hyperlink.md) ("clickable" links right in the text). The web itself works on top of the [HTTP](http.md) protocol which dictates how clients and servers communicate. Less knowledgeable people confuse the web with the Internet, but of course those people are retarded: web is just one of many services existing on the Internet (other ones being e.g. [email](email.md) or [torrents](torrent.md)). In order to browse the web you need an Internet connection and a [web browser](browser.md).
{ **How to browse the web in the [age of shit](21st_century.md)?** Currently my "workflow" is the following: I use the [badwolf](badwolf.md) browser (a super suckless, very fast from-scratch browser that allows turning JavaScript on/off, i.e. I mostly browse [small web](smol_internet.md) without JS but can still do banking etc.) with a **CUSTOM START PAGE** that I completely own and which only changes when I want it to -- this start page is just my own tiny HTML on my disk that has links to my favorite sites (which serves as my suckless "bookmark" system) AND a number of search bars for different search engines (Google, Duckduckgo, Yandex, wiby, Searx, marginalia, Right Dao, ...). This is important as nowadays you mustn't rely on Google or any other single search engine -- I just use whichever engine I deem best for my request at any given time. ~drummyfish }
An important part of the web is also searching its vast oceans of information with [search engines](search_engine.md) such as the infamous [Google](google.md) engine (as of 2024 still functioning technically but no longer practically). Websites have human readable [URL](url.md) addresses thanks to [DNS](dns.md).
Famous and big as it was, it's sad that mainstream web is now EXTREMELY [bloated](bloat.md) and **100% unusable**, beyond saving -- owing of course to [capitalism](capitalism.md). The murdering of web would probably be seen as one of the worst disasters in the history of technology, were it not for the fact that countless other disasters of similar magnitude are happening constantly in the [21st century](21st_century.md). The web is now like Chernobyl: a curious place to visit, however radioactive to such a high degree that you can't stay for too long else you acquire brain [cancer](cancer.md). For more [suckless](suckless.md) alternatives to web see [gopher](gopher.md). See also [smol web](smol_internet.md).
Prior to the tragedy of [mainstreamization](mainstream.md) the web used to be perhaps the greatest and most spectacular part of the whole Internet, the service that made Internet widespread, however it soon deteriorated due to [capitalist](capitalism.md) interests, commercialization and subsequent invasion of idiots from real world; by this date, in 2020s, it is one of the most illustrative, depressing and also hilarious examples of [capitalist](capitalist_software.md) [bloat](bloat.md). A good article about the issue, called *The Website Obesity Crisis*, can be found at https://idlewords.com/talks/website_obesity.htm. There used to be a tool for measuring website bloat (now ironically link rotted to some ad lol) which worked like this: it computed the ratio of the page size to the size of its screenshot (e.g. [YouTube](youtube.md), as of writing this, scored 35.7).
Currently there's a "vision" of so called **"[web 3](web3.md)"** which is supposed to be the "next iteration" of the web with new "[paradigms](paradigm.md)", making use of "[modern](modern.md)" (i.e. probably [shitty](shit.md)) technology such as [blockchain](blockchain.md); they say web 3 wants to use [decentralization](decentralization.md) to prevent central control and possibly things like [censorship](censorship.md), however [we](lrs.md) can almost certainly guarantee web 3 will be a yet exponentially amplified pile of [bloat](bloat.md), garbage and a worse dystopia than our nightmares were able to come up with so far, we simply have to let this ship sink. If web 3 is what web 2.0 was to web 1.0, then indeed we are [doomed](doom.md). Our prediction is that web will simply lose its status of the biggest Internet service just as [Usenet](usenet.md) did, or like TV lost its status of the main audiovisual medium; web will be replaced by something akin to "islands of franchised social media accessed through apps"; it will still be around but will be just a huge ad-littered swamp inferior to [teletext](teletext.md) where the elderly go to share pictures no one wants to see and where guys go to masturbate.
{ As of 2023 my 8GB RAM computer with multiple 2+ GHz CPUs has serious issues browsing the "modern" web, i.e. it is sweating on basically just displaying a formatted text, which if done right is quite comfortably possible to do on a computer with 100000 times lower hardware specs! In fact orders of magnitude weaker computers could browse the web much faster 20 years ago. Just think about how deeply fucked up this is: the world's forefront information highway and "marvel of technology" has been raped by capitalist soydevs so much that it is hundreds of thousands times less efficient than it should be, AND it wouldn't even require much effort to make it work well -- in fact it is much easier to make it work well. Imagine your car consuming 100000 litres of gasoline instead of 1 or your house leaking 99999 litres of water for any 1 litre of water you use, plus you paying extra money for it to be so. This is the absolute state of dystopian capitalist society. ~drummyfish }
{ Ah this pseudoimage above [made it to Encyclopedia Dramatica](https://encyclopediadramatica.top/index.php?title=Internets&oldid=4018#In_a_nutshell) :D Thank you kind stranger <3~drummyfish}
Back in the day (90s and early 2000s) web used to be a place of [freedom](freedom.md) working more or less in a decentralized manner, on the principles of [free speech](free_speech.md), [anarchism](anarchism.md) and, to the [Yankee](usa.md)'s dismay, even [communism](communism.md) -- people used to run their own unique, non-commercial websites where they shared freely and openly, [censorship](censorship.md) was difficult to implement, unwelcome and therefore mostly non-existent and websites used to have a way better design, they were [KISS](kiss.md), lightweight, safer, "open" (no paywalls, registration walls, country blocks, [DRM](drm.md), ...), MUCH faster and more robust as they were pure [HTML](html.md) documents, without scripts, "[apps](app.md)", jumpscare [ads](marketing.md) -- simply without [bullshit](bullshit.md). It was also the case that most websites were truly nice, useful and each one had a "soul" as they were usually made by passionate nerds who had a creative freedom and true desires to create a good website (and this still continued for a while after the invasion of businesses, i.e. commercial sites were still pretty bearable).
As time marched on the web started to stink more and more of [shit](shit.md), as is the fate of everything touched by the [capitalist](capitalist_software.md) hand -- the advent of so called **web 2.0** brought about a lot of [complexity](complexity.md), websites started to incorporate and push client-side scripts ([JavaScript](javascript.md), [Flash](flash.md), [Java](java.md) applets, ...) which led to many negative things such as incompatibility with browsers (kickstarting browser [consumerism](consumerism.md) and [update culture](update_culture.md)), performance loss and security vulnerabilities (web pages now became programs rather than mere documents) and more complexity in web browsers, which leads to immense [bloat](bloat.md) and browser [monopolies](bloat_monopoly.md) (higher effort is needed to develop a browser, making it a privilege of those who can afford it, and those can subsequently dictate de-facto standards that further strengthen their monopolies). Another disaster came with **[social networks](social_network.md)** in mid 2000s, most notably [Facebook](facebook.md) but also [YouTube](youtube.md), [Twitter](twitter.md) and others, which centralized the web and rid people of control. Out of comfort people stopped creating and hosting their own websites and rather created a page on Facebook. This gave the power to corporations and allowed **mass-surveillance**, **mass-censorship** and **propaganda brainwashing**. As the web became more and more popular, corporations and governments started to take more control over it, creating technologies and laws to make it less free.
By 2020, the good old web is but a memory and a hobby of a few boomers, everything is controlled by corporations, infected with billions of unbearable ads, [DRM](drm.md), malware (trackers, [crypto](crypto.md) miners, ...), there exist no good web browsers, web pages now REQUIRE JavaScript even if it's not needed in principle due to which they are painfully slow and buggy, there are restrictive laws and censorship and de-facto laws (site policies) put in place by corporations controlling the web. Official web standards, libraries and frameworks got into such an unbelievably bloated, complicated, corrupted and degenerated state (look up e.g. [Shadow DOM](shadow_dom.md)) that one cannot but stare in astonishment at the stupidity.
Mainstream web is quite literally unusable nowadays. { 2023 update: whole web is now behind [cuckflare](cloudfare.md) plus [secure HTTPS safety privacy antipedophile science encrypted privacy antiterrorist democratic safety privacy security expert antiracist sandboxed protection](https.md) and therefore literally can't be used. Also Google has been absolutely destroyed by the [LLM](llm.md) AIs now. ~drummyfish } What people once searched for on the web they now search for on a handful of platforms like Facebook and YouTube (often not even using a web browser but rather a mobile "[app](app.md)"); if you try to "google" something, what you get is just a list of unusable sites written by [AIs](ai.md) that load for several minutes (unless you have the latest 1024 TB RAM beast) and won't let you read beyond the first paragraph without registration. These sites are uplifted by [SEO](seo.md) for pure commercial reasons, they contain no useful information, just ads. Useful sites are buried under several millions of unusable results or downright censored for political reasons (e.g. using some forbidden word). Thankfully you can still try to browse the [smol web](smol_internet.md) with search engines such as [wiby](wiby.md), but still that only gives a glimpse of what the good old web used to be.
{ More of web 2023 experience: if you want to Google something as simple as "HTML ampersand", just to get the HTML entity 5 character code, you basically get referred to a site that's 200 MB big, loads for about 1 minute (after you pass 10 checks for not being a robot), has 50 sections and subsections like "Who This Tutorial on Copypasting 5 Character is for", "What You Will Learn in This Tutorial", "Time Required for Reading This Tutorial" (which without these sections would be like 3 seconds), "Introduction: History of HTML" (starting with Stone Age) etc. There are of course about 7 video ads between each section and the next. Then finally there is the `&amp;` code you can copy paste, buried in level 12 subsection ("HTML Code" -> "History of Programming Since Napoleon Bonaparte" -> "How Ada Lovelace Invented Computer Science" -> "How Tim Berners-Lee Stole The Idea For Web from His Wife" -> "Why Women Only Crews For Next Space Mission are a Good Idea" -> "How This All Finally Gets Us to HTML Amp Entity" -> ...). Then of course there follow about 600 more sections like "Methodology Used to Create This Copypasting Tutorial" etc. etc. until "Conclusion: What We Have Learned about the HTML Amp Entity and History of Feminism"; but at least you don't have to scroll through that; anyway at this point you are already suicidal and don't even want to write your HTML anymore. ~drummyfish }
WHY does every fucking SINGLE ONE, EVERY SINGLE WEBSITE ON EARTH have to have ads on it now? EVERY.SINGLE.WEBSITE.HAS.ADS. Every single FUCKING WEBSITE HAS ADS -- why? No, you fucking don't need money to run a website, stop giving this moronic argument. Don't you have $0.00000001 to pay for a domain and a Raspberry Pi? Stop that fucking shit. Back then websites didn't have ads and still existed, you idiot. Make a website without ads else spare us this shit and take it down.
As with most groundbreaking inventions the web didn't appear out of nowhere, as may seem in retrospect -- the ideas it employed were tried in times prior, for example the [NABU](nabu.md) network did something similar even 10 years before the web; likewise [Usenet](usenet.md), the [BBS](bbs.md) networks and so on. Nevertheless it wouldn't be until the end of 1980s that all the right ingredients came together in the right mix, under ideal circumstances and with a bit of luck to get really popular.
World Wide Web was invented by an English computer scientist [Tim Berners-Lee](berners_lee.md). In 1980 he employed [hyperlinks](hyperlink.md) in a notebook program called ENQUIRE and he saw the idea was good. On March 12 1989 he was working at [CERN](cern.md) where he proposed a system called "web" that would use [hypertext](hypertext.md) to link documents (the term hypertext was already around). He also considered the name *Mesh* but settled on *World Wide Web* eventually. He started to implement the system with a few other people. At the end of 1990 they already had implemented the [HTTP](http.md) protocol for client-server communication, the [HTML](html.md) language for writing websites, the first web server and the first [web browser](browser.md) called *WorldWideWeb*. They set up the first website http://info.cern.ch that contained information about the project (still accessible as of writing this).
In 1993 CERN made the web [public domain](public_domain.md), free for anyone without any licensing requirements. The main reason was to gain advantage over competing systems such as [Gopher](gopher.md) that were [proprietary](proprietary.md). By 1994 there were over 500 web servers around the world. The World Wide Web Consortium (W3C) was established to maintain standards for the web. A number of new browsers were written such as the text-only [Lynx](lynx.md), but the [proprietary](proprietary.md) [Netscape Navigator](netscape_navigator.md) would go on to become the most popular one until [Micro$oft](microsoft.md)'s [Internet Explorer](internet_explorer.md) (see [browser wars](browser_wars.md)). At the end of 1996 [CSS](css.md) appeared and in 1997 the [Google](google.md) search engine followed. There was an economic bubble connected to the explosion of the Web called the [dot-com boom](dot_com_boom.md).
Interestingly, between 2000 and 2010 a mobile alternative to the web, called [WAP](wap.md), briefly came to the scene. Back then mobile phones were significantly weaker than PCs so the whole protocol was simplified, e.g. it had a special markup language called [WML](wml.md) instead of [HTML](html.md). But as the phones got more powerful they simply started to support normal web and WAP had to say goodbye.
Around 2005, when [YouTube](youtube.md), [Twitter](twitter.md), [Facebook](facebook.md) and other shit websites (or shall we say "webshites"?) started to appear and stole the mainstream popularity, so called [Web 2.0](web_20.md) began to form. This was a shift (or shall we say "[shit](shit.md)"?) in the web's paradigm towards more ugliness and hostility such as more [JavaScript](javascript.md), [bloat](bloat.md), interactivity, websites as programs, [Flash](flash.md), [social networks](social_network.md) etc. This would be the beginning of the web's downfall.
Users browse the web using [web browsers](browser.md), programs made specifically for this purpose. Pages on the web are addressed by their [URL](url.md), a kind of textual address such as `http://www.mysite.org/somefile.html`. This address is entered into the web browser, the browser retrieves the page it points to and displays it.
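To make the anatomy of such an address concrete, here is a minimal sketch in [Python](python.md) (standard library only) that splits a URL into the parts a browser cares about. The function name `url_parts` is made up for this illustration; `urlsplit` is the real standard library call doing the work.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into protocol, server name, port and path on that server."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # protocol, e.g. "http" or "https"
        "host": parts.hostname,     # domain name of the server
        "port": parts.port,         # None if the URL doesn't state one
        "path": parts.path or "/",  # which page we want from the server
    }

print(url_parts("http://www.mysite.org/somefile.html"))
```

The scheme says which protocol to speak, the host says which server to ask and the path says what to ask it for.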
A webpage can contain text, pictures, graphics and nowadays even other media like video, audio and even programs that run in the browser. Most importantly webpages are [hypertext](hypertext.md), i.e. they may contain clickable references to other pages -- clicking a link immediately opens the linked page.
The page itself is written in [HTML](html.md) language (not really a [programming](programming.md) language, more like a file format), a relatively simple language that allows specifying the structure of the text (headings, paragraphs, lists, ...), inserting links, images etc. In newer browsers there are additionally two more important languages that are used with websites (they can be embedded into the HTML file or come in separate files): [CSS](css.md) which allows specifying the look of the page (e.g. text and font color, background images, position of individual elements etc.) and [JavaScript](js.md) which can be used to embed [scripts](script.md) (small [programs](program.md)) into webpages which will run on the user's computer (in the browser). These languages combined make it possible to make websites do almost anything, even display advanced 3D graphics, play movies etc. However, it's all huge [bloat](bloat.md), it's pretty slow and also dangerous, it was better when webpages used to be HTML only.
The webpages are stored on web [servers](server.md), i.e. computers specialized in listening for requests and sending back requested webpages. If someone wants to create a website, he needs a server to host it on, so called [hosting](hosting.md). This can be done by setting up one's own server -- so called [self hosting](self_hosting.md) -- but nowadays it's more comfortable to buy a hosting service from some company, e.g. a [VPS](vps.md). For running a website you'll also want to buy a web [domain](domain.md) (like `mydomain.com`), i.e. the base part of the textual address of your site (there exist free hosting sites that even come with free domains if you're not picky, just search...).
When a user enters a URL of a page into the browser, the following happens (it's kind of simplified, there are [caches](cache.md) etc.):
1. The [domain](domain.md) name (e.g. `www.mysite.org`) is converted into an [IP](ip.md) address of the server the site is hosted on. This is done by asking a [DNS](dns.md) server -- these are special servers that hold the database mapping domain names to IP addresses (when you buy a domain, you can edit its record in this database to make it point to whatever address you want).
2. The browser sends a request for the given page to the IP address of the server. This is done via [HTTP](http.md) (or [HTTPS](https.md) in the encrypted case) protocol (that's the `http://` or `https://` in front of the domain name) -- this protocol is a language via which web servers and clients talk (besides websites it can communicate additional data like passwords entered on the site, [cookies](cookie.md) etc.). (If the encrypted HTTPS protocol is used, the connection is first set up with [asymmetric cryptography](asymmetric_cryptography.md): the server presents its public key in a certificate digitally signed by some certificate authority, the browser checks that signature, and the two sides then agree on a symmetric key with which the actual traffic is encrypted.) This request is delivered to the server by the mechanisms and lower network layers of the [Internet](internet.md), typically [TCP](tcp.md)/[IP](ip.md).
3. The server receives the request and sends back the webpage embedded again in an [HTTP](http.md) response, along with other data such as the error/success code.
4. The client browser receives the page and displays it. If the page contains additional resources that are needed for displaying the page, such as images, they are automatically retrieved the same way (of course things like [caching](cache.md) may be employed so that the same image doesn't have to be redownloaded literally every time).
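The steps above can be sketched in a few lines of [Python](python.md) using nothing but raw sockets. This is only an illustration under simplifying assumptions, not how a real browser is written: it speaks plain unencrypted HTTP/1.0 (chosen so that the server doesn't answer with chunked encoding), follows no redirects and does no caching. The function names `build_request`, `parse_response` and `fetch` are made up for this example.

```python
import socket

def build_request(host, path="/"):
    """Make the text of a minimal HTTP/1.0 GET request (step 2)."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"   # lets one server host many domains
        "\r\n"                # an empty line ends the request headers
    ).encode("ascii")

def parse_response(raw):
    """Split raw response bytes into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])   # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body

def fetch(host, path="/", port=80):
    """Do the whole round trip (needs an Internet connection)."""
    ip = socket.gethostbyname(host)               # step 1: ask DNS for the IP
    with socket.create_connection((ip, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))   # step 2: send the request
        chunks = []
        while True:                               # step 3: read until the server hangs up
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))       # step 4: hand the page over for display
```

Calling for example `fetch("info.cern.ch")` should return the status code, a dictionary of response headers and the HTML of the page as bytes, provided the server still answers over plain HTTP.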
[Cookies](cookie.md), small files that sites can store in the user's browser, are used on the web to implement stateful behavior (e.g. remembering if the user is signed in on a forum). However cookies can also be abused for tracking users, so they can be turned off.
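On the protocol level a cookie is just a name and a value: the server sends it in a `Set-Cookie` response header and the browser sends it back in a `Cookie` header with each later request to that site. A rough [Python](python.md) sketch of the browser's side follows; the cookie names and values are invented for the example, and real browsers additionally track things like expiry dates and which site a cookie belongs to.

```python
from http.cookies import SimpleCookie

def cookies_from_response(set_cookie_lines):
    """Collect name -> value from the Set-Cookie headers of a response."""
    jar = {}
    for line in set_cookie_lines:
        parsed = SimpleCookie()
        parsed.load(line)   # attributes such as Path are parsed but ignored here
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar

def cookie_header(jar):
    """Build the Cookie header value sent back with later requests."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())

# hypothetical cookies a forum might set after signing in
jar = cookies_from_response(["session=abc123; Path=/", "theme=dark"])
print(cookie_header(jar))
```

Turning cookies off simply means the browser never stores the jar, so the server sees every request as coming from a stranger.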
Other programming languages such as [PHP](php.md) can also be used on the web, but they are used for server-side programming, i.e. they don't run in the web browser but on the server and somehow generate and modify the sites for each request specifically. This makes it possible to create dynamic pages such as [search engines](search_engine.md) or [social networks](social_network.md).
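To illustrate what "generating the site for each request" means, here is a tiny sketch of a server-side program, written in [Python](python.md) rather than PHP only for the sake of a short self-contained example. It is a toy "search engine" over a hardcoded list: the list `PAGES`, the query parameter `q` and the function names are all made up for this illustration.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# hypothetical "database" the toy search engine looks through
PAGES = ["Gopher", "World Wide Web", "Web browser"]

def app(environ, start_response):
    """Build a fresh HTML page for one request, based on its query string."""
    query = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]
    hits = [t for t in PAGES if query and query.lower() in t.lower()]
    items = "".join(f"<li>{escape(t)}</li>" for t in hits) or "<li>nothing found</li>"
    body = (f"<html><body><h1>Results for {escape(query)}</h1>"
            f"<ul>{items}</ul></body></html>")
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]

def serve(port=8000):
    """Run the toy server until interrupted."""
    make_server("", port, app).serve_forever()
```

After calling `serve()`, opening `http://localhost:8000/?q=web` in a browser should show a page that exists nowhere on disk; the browser just receives plain HTML and has no idea it was computed on the spot. Note the `escape` calls: user input pasted into a page unescaped is a classic security hole.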
A great deal of information on the Internet is sadly presented via web pages in favor of normies and disfavor of [hackers](hacking.md) who would like to just download the info without having to do [clickity click on seizure inducing pictures](gui.md) while dodging jumpscare porn [ads](marketing.md). As hackers we aim to write scripts to rape the page and force it to give out its information without us having to suck its dick. With this we acquire the power to automatically archive data, [hoard](data_hoarding.md) it, analyze it, do some [netstalking](netstalking.md), discover hidden gems, make our own search engines, create [lulz](lulz.md) such as spambots etc. For doing just that consider the following tools:
- General [CLI](cli.md) downloaders like [wget](wget.md) and [curl](curl.md). You download the resource and then use normal Unix tools to process it further. Check out the man pages, there exist many options to get around annoying things such as redirects and weirdly formatted URLs.
- Text web browsers like [links](links.md), [lynx](lynx.md) and [w3m](w3m.md) -- these are excellent! Check out especially the `-dump` option. Not only do they handle all the crap like parsing faulty HTML and handling shitty [encryption](encryption.md) [bullshit](bullshit.md), they also nicely render the page as plain text (again allowing further use of standard Unix tools), allow easily filling out forms and all this kind of stuff.
- [Libraries](library.md) and scraping specific tools: there exist many, such as the BeautifulSoup [Python](python.md) library -- although these tools are oftentimes very ugly, you may just abuse them for a one time [throwaway script](throwaway_script.md).
- Do it yourself: if a website is friendly (plain HTTP, no JavaShit, ...) and you just want to do something simple like extract all links, you may well just program your scraper from scratch let's say in [C](c.md), it won't be that hard.
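As an example of the "extract all links" task, here is a minimal throwaway-script sketch. It is written in [Python](python.md) with only the standard library instead of C, purely to keep it short; the names `LinkExtractor` and `extract_links` are made up for the example. It only looks at `href` attributes of `<a>` tags and resolves relative links against the address of the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the targets of all <a href="..."> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # the parser hands us tag and attribute names already lowercased
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # turn relative links like "/a.html" into absolute URLs
                self.links.append(urljoin(self.base_url, value))

def extract_links(html_text, base_url):
    """Return absolute URLs of all links in the given HTML, in page order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Feed it HTML downloaded with any of the tools above, e.g. `extract_links(page_text, "http://www.mysite.org/")`, and process the resulting list further however you like.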