# Netstalking

Netstalking means searching for obscure, hard-to-find and somehow valuable (even if only for its entertainment value) [information](information.md)/media buried in the depths of the [Internet](internet.md) (and similar networks) -- for example funny photos on Google Streetview (https://9-eyes.com/), unindexed [deepweb](deepweb.md) sites or secret documents on [FTP](ftp.md) servers. Netstalking is relatively unknown in the [English](english.md)-speaking world but is pretty popular in Russian communities. Since the beginning of the 2020s, however, general interest in obscure and esoteric material on the Internet seems to have been steadily rising among all inhabitants of the world wide network, perhaps due to other phenomena such as increasing [censorship](censorship.md) (and the desire to bypass it), the "web 1.0 revival" movement etc.

Netstalking can be divided into two categories:
- **deli-search** (deliberate search): trying to find a specific piece of information, e.g. a specific video that got lost.
- **net-random**: randomly searching for interesting information in places where it is likely to be found.

Techniques of netstalking include port scanning, randomly generating web domains, using advanced search queries and different [search engines](search_engine.md), searching caches and archives, and exploring obscure networks such as [darknet](darknet.md) or [gopher](gopher.md).
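
To give a concrete taste of the "randomly generating web domains" technique, here is a minimal Python sketch; the seed words and TLDs are made-up examples, and any word list (or even random strings) would work the same way:

```python
import itertools
import socket

def candidate_domains(words, tlds):
    """Yield every word + TLD combination as a candidate domain name."""
    for word, tld in itertools.product(words, tlds):
        yield word + "." + tld

def resolves(domain):
    """Return True if the domain has a DNS record, i.e. likely exists."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # covers socket.gaierror (lookup failure)
        return False

# hypothetical seed words and TLDs, purely for illustration
words = ["obscure", "basement", "shrine"]
tlds = ["com", "net", "xyz"]
candidates = list(candidate_domains(words, tlds))
print(len(candidates), "candidates")  # prints: 9 candidates
# actually probing the candidates needs network access, e.g.:
# hits = [d for d in candidates if resolves(d)]
```

Scaling this up to millions of checks is mostly a matter of a bigger word list and some patience (or parallel DNS queries).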

## Pro Tips On Finding Obscure Stuff

- **Use many different search engines.** Make a list of as many engines as you can collect. Mainstream ones ([Google](google.md), Duckduckgo, Bing, Yahoo, Yandex, ...) have huge indices and together cover a large portion of the web, but they're also very [censored](censorship.md), biased and crippled by [SEO](seo.md) competition and AI noise. Meta search engines, like Searx, may help with using many engines at once as well as with discovering new engines (take a look at their settings). A good thing is that engines located in different countries likely censor different stuff, so Google won't find pro-Russian propaganda and Yandex won't find the anti-Russian kind; combining them effectively removes this kind of censorship. Without question you also HAVE TO use smaller, non-commercial and more specialized engines such as [wiby](wiby.md), Marginalia, [Yacy](yacy.md), Right Dao etc. These are typically less censored (little incentive and/or resources to invest into highly sophisticated censorship), less SEO-infested, usually focused more on the type of material you're after (underground, non-commercial, [small web](smol_internet.md)) and often even offer more advanced features (backlinks, advanced filtering, sometimes even downloading the whole index). Also use specialized search engines, e.g. FTP search engines, PDF search engines, reverse image search engines (Google, tineye, ...) etc. Curated lists of websites, such as Curlie, are also worth a try.
- **Know and use advanced search engine options and [hacks](hacking.md).** Ordinarily even mainstream engines support special key phrases that can be inserted into the search query to narrow down the search -- these are crucial for finding real hidden stuff. Sometimes engines even have undocumented options, try to find them (guessing, finding unofficial documentation). Options that typically work in search engines include:
    - `"exact phrase"`: Searches only for a verbatim string, very useful e.g. for searching exact filenames, and for tricks such as searching a long phrase from a publicly inaccessible book to find websites that in fact have the book publicly accessible. Another trick is to search for something like `"powered by gitea"` (or whatever framework) or `"index of"` (a common heading of plain file lists) -- this can find small and unadvertised sites running on popular [frameworks](framework.md).
- `before:year`: Limits the search to sites/files published before given year. This is amazingly useful as nowadays everything is just flooded by [AI](ai.md) garbage and commercial, censored [noise](noise.md). Adding `before:2010` just takes you back to the old world where Internet actually contained useful information, where schools for instance weren't afraid to list names of all pupils in each class along with photos, names of their teachers and so on.
- `filetype:type`: Searches only for files of given type. Again, this is very abusable -- you may for example search for Excel spreadsheets (`filetype:xls`), [JSON](json.md) or [CSV](csv.md) databases and so on -- there are tons and tons of sheets with personal information of company employees, taxes and various other sensitive stuff. Searching for MS Word or PowerPoint documents finds files created by people who aren't very skilled with computers and will very likely post some crazy [shit](shit.md) :-) If you're feeling lucky, try to search databases of passwords in plain text.
- **Search non-web networks.** The web is very much controlled and policed now, but other networks are either designed to be uncontrollable and/or are so underground that no one cares to "[moderate](moderation.md)" them. These networks include for example [Tor](tor.md), [I2P](i2p.md), [Freenet](freenet.md), [gopher](gopher.md), [gemini](gemini.md), [WAP](wap.md), [FTP](ftp.md), [Usenet](usenet.md), Guifi (and other wifi networks), [torrents](torrent.md), etc. Also try to search [IRC](irc.md) chat logs and whatever else you come across.
- **Search ban lists ("blacklists", "blocklists", "isolation lists", ...).** A trick to finding censored material is to look for a list of the censored stuff -- [FOSS](foss.md) projects (like [Fediverse](fediverse.md)) typically have such lists publicly available as part of their "openness and collaboration".
- **Look for OSINT tools.** OSINT means "open source intelligence", basically digging out info from publicly available sources. This leads to finding amazing tools, for example there exists an AI-powered face search engine that takes a photo of a face and returns images from all over the Internet where that face appears. Works like a charm.
- **Reverse search for obscure/shady/topic related material.** Another cool trick to finding weird sites, or ones related to a very specific topic, is to look for sites that link to already known weird/banned/obscure/topic related stuff. For example searching for sites that link to [Encyclopedia Dramatica](dramatica.md) brings up a promising list of places to check out when looking for uncensored, [SJW](sjw.md)-free places. Similarly you can search for sites that use forbidden words ([nigger](nigger.md), [faggot](faggot.md), ...), images (goatse, gore, FACES of CP stars, ...), very niche terms (e.g. [bitreich](bitreich.md)), "legally problematic" stuff (leaked photos, shooter manifestos, ...) etc.
- **Search in other [languages](human_language.md).** If you're not a native English speaker, you probably know that your country's web contains some cool stuff that's missing from the English web. Due to many factors such as [cultural](culture.md) differences and different political interests (i.e. kinds of censorship and propaganda), some tidbits of trivia will only be found on non-English sites -- Russian, Spanish, Chinese and Japanese websites are a whole new world. Machine translation of the sites is often more than enough to understand the text.
- **Search archives.** The Internet Archive is the giant among archives that must always be checked, but don't forget smaller ones either, like archive.li, [Usenet](usenet.md) archives, [4chan](4chan.md) archives etc. You'll be able to find stuff that's now gone from the Internet and/or got hidden.
- **Guess randomly.** It can even be an entertaining pastime to play a lottery, randomly digging and seeing what you find. For example you can type random domains or IP addresses in your URL bar: `nigger.com`, `hitler.il`, `weirdporn.xyz` or whatever. One can even quite effortlessly bash together a script to automatically check millions of such domains. This has a chance of discovering something that would be otherwise unfindable because it's not linked to from anywhere on the indexed web.
- **Manually search unindexable material**. A lot of information is out there but search engines don't know about it because it's not in plaintext format or it's hiding behind a login or captcha wall or whatever. Plenty of stuff is hidden in scanned PDF books, videos, compressed archives, spoken audio etc. Hence when you're searching manually, try to go to places where search engines are less likely to get.
- **Write own tools.** Today you no longer have to possess a [PhD](phd.md) (or even a brain) to write a simple web scraping script. Custom tools can take you beyond what search engines can (and are willing to) do for you -- for example search engines typically can't search for [regular expressions](regexp.md), but your own crawler can. Your own tool is 100% tailored to your needs, it can behave in exact ways you want (ignore robots.txt, use your credentials to bypass login walls, follow very specific trails, you can even use [OCR](ocr.md) to extract text from images etc.). As said above, a simple tool is for example one that randomly checks various combinations of words and TLDs to discover curious domain names. Writing a simple crawler is also pretty easy, provided you [keep it very simple](kiss.md) -- exploit existing tools like wget or curl to download pages and extract everything that looks like a URL, no need to parse [HTML](html.md) or whatever, literally treat everything as plain text. Then you can extract only documents that are somehow "[interesting](interesting.md)", for example containing specific keywords, not containing JavaScript tags etc.
- **Find lists of obscure sites and other people who search for them.** A sizable number of small sites now like to post links to other interesting sites; it's enough to find one, then you just start following the links, which lead to more links and so on -- this can never end. Some communities like to share lulzy links, e.g. [4chan](4chan.md), kiwifarms, ... Don't forget to contribute back and publish the list of your findings too ;)
- **Analyze data.** There are tons of publicly accessible but as yet undigested data about the web -- for example the Internet Archive's crawl data, [WikiData](wikidata.md), the Yacy index and so on. You may try your luck sniffing around here.
- **Be reasonably careful.** Normies get scared shitless to even peek on the darkweb, which is completely ridiculous -- just looking at and searching publicly available data is practically always 100% legal, and even if it wasn't, literally no one gives a single shit. However you might get into trouble if you, for example, reverse searched literal child porn, as you're uploading the stuff to someone's server and thus technically distributing CP, putting the server owner in trouble. Still, probably not much would happen, maybe you'd get blocked, but you may well land yourself on an FBI list. Just use your brain. As long as you're not stepping on someone's toes (doxxing, DDOSing, spamming, ...), no one cares what you're doing.
- ...
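
The search operators described above can also be combined programmatically, which is handy when you want to fire the same query at many engines. A small Python sketch -- the engine URL is a placeholder, and operator support varies between engines:

```python
from urllib.parse import quote_plus

def build_query(phrase=None, filetype=None, before=None):
    """Compose a query string from the operators described above."""
    parts = []
    if phrase:
        parts.append('"' + phrase + '"')      # exact phrase search
    if filetype:
        parts.append("filetype:" + filetype)  # restrict to a file type
    if before:
        parts.append("before:" + str(before)) # restrict to older pages
    return " ".join(parts)

def search_url(engine_base, query):
    """Build a URL for an engine that takes the query in a ?q= parameter."""
    return engine_base + "?q=" + quote_plus(query)

q = build_query(phrase="index of", filetype="xls", before=2010)
print(q)  # prints: "index of" filetype:xls before:2010
print(search_url("https://engine.example/search", q))
```

Feeding the resulting URLs to curl or wget in a loop then gives you a poor man's meta search engine.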
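
The "treat everything as plain text" approach from the tip on writing own tools can be sketched in a few lines of Python; the keyword list here is a made-up example of an "interestingness" filter:

```python
import re

# anything that looks like a URL, cut off at whitespace, quotes and brackets
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")

def extract_urls(text):
    """Pull out everything that looks like a URL from raw page text,
    no HTML parsing needed."""
    return URL_RE.findall(text)

def is_interesting(page_text, keywords=("index of", "ftp", "archive")):
    """Crude filter: keep pages mentioning given keywords and
    containing no JavaScript tags."""
    low = page_text.lower()
    return "<script" not in low and any(k in low for k in keywords)

page = 'see <a href="http://example.com/files/">index of files</a>'
print(extract_urls(page))    # prints: ['http://example.com/files/']
print(is_interesting(page))  # prints: True
```

Combined with wget or curl for the downloading part, this is already a complete (if dumb) crawler: download a page, extract its URLs, keep the interesting ones, repeat.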

## See Also

- [www](www.md)
- [Internet](internet.md)
- [smol internet](smol_internet.md)