Update

2025-05-08 20:41:37 +02:00 · 2025-05-08 20:41:37 +02:00 · 64fd120266
commit 64fd120266
parent 8b530b5952
35 changed files with 2034 additions and 2007 deletions
--- a/netstalking.md
+++ b/netstalking.md
@@ -29,6 +29,7 @@ Techniques of netstalking include port scanning, randomly generating web domains
 - **Write your own tools.** Today you no longer have to possess a [PhD](phd.md) (or even a brain) to write a simple web scraping script. Custom tools can take you beyond what search engines can (and are willing to) do for you -- for example search engines typically can't search for [regular expressions](regexp.md), but your own crawler can. Your own tool is 100% tailored to your needs and behaves exactly the way you want (it can ignore robots.txt, use your credentials to get past login walls, follow very specific trails, even use [OCR](ocr.md) to extract text from images etc.). As said above, a simple tool can for example randomly check various combinations of words and TLDs to discover curious domain names. Writing a simple crawler is also pretty easy, provided you [keep it very simple](kiss.md) -- exploit existing tools like wget or curl to download pages and extract everything that looks like a URL, no need to parse [HTML](html.md) or whatever, literally treat everything as plain text. Then you can keep only documents that are somehow "[interesting](interesting.md)", for example ones containing specific keywords, not containing JavaScript tags, being served over plain [HTTP](http.md) only etc.
 - **Find lists of obscure sites and other people who search for them.** A sizable number of small sites now like to post links to other interesting sites; it's enough to find one, then you just start following the links, find more links etc. This can never end. Some communities like to share lulzy links, e.g. [4chan](4chan.md), kiwifarms, ... Don't forget to contribute back and publish the list of your findings too ;)
 - **Analyze data.** There are tons of publicly accessible yet undigested data about the web -- for example Internet Archive's crawl data, [WikiData](wikidata.md), the Yacy index and so on. You may try your luck sniffing here.
+- **Filter.** Today the problem of finding something of value has changed from discovering paths to filtering out all the countless surrounding [noise](noise.md). There is so much data we get lost in it, so the focus shifts to clever filtering. For example on YouTube all the weird, cool videos are accessible, they're just buried: the algorithm never recommends them and the search never finds them. One way to get to quality videos is to search for older videos (`before:2015`) that also have subtitles (this is usually a sign of a high quality video, no one bothers with subtitles on crappy videos).
 - **Get creative.** You may want to try searching for transcripts, logs, weird combinations of phrases such as "[open source](open_source.md)" and "murder", or viewing buried sites by skipping the first million search results; exact phrases such as "what's your emergency" can find emergency hotline transcripts, searching the number of a lulzy 4chan thread or the [hash](hash.md) of a famous shock image may reveal cool sites linking to fun stuff, searching for the name of a file that was part of some source code leak can find sites posting, archiving or analyzing such leaks, searching for sites that together contain the word "[nigger](nigger.md)" in 10 different languages could lead somewhere interesting too, and so on and so forth.
 - **Be reasonably careful.** Normies get scared shitless to even peek at the darkweb, which is completely ridiculous: just looking at and searching publicly available data is practically always 100% legal, and even if it wasn't, literally no one gives a single shit. However you might get into trouble if you for example reverse searched literal child porn, as you'd be uploading the stuff to someone's server and thus technically distributing CP, putting the server owner in trouble. Probably not much would happen still, maybe you'd get blocked, but you're gonna get yourself on the FBI list. Just use your brain. As long as you're not stepping on someone's toes (doxxing, DDOSing, spamming, ...), no one cares what you're doing.
 - ...
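A minimal Python sketch of the plain-text crawler described under writing your own tools above, assuming curl is installed and on the PATH. Pages are never parsed as HTML: a regular expression grabs everything that looks like an http(s) URL. The filter (plain HTTP, no `<script>` tag, at least one keyword) is just one example of "interesting"; the names `crawl`, `is_interesting`, `extract_urls`, `fetch_with_curl` and the `max_pages` limit are made up for illustration.

```python
import re
import subprocess
from collections import deque

# Anything that looks like an http(s) URL; the page is treated as plain text.
URL_RE = re.compile(r"https?://[^\s\"'<>()]+")


def fetch_with_curl(url):
    """Download a page by shelling out to curl (assumed to be installed)."""
    result = subprocess.run(
        ["curl", "-s", "-L", "--max-time", "10", url],
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_urls(text):
    """Return every URL-looking substring, minus trailing punctuation."""
    return [u.rstrip(".,;:") for u in URL_RE.findall(text)]


def is_interesting(url, text, keywords):
    """Example filter: plain HTTP, no <script> tags, at least one keyword."""
    lowered = text.lower()
    return (
        url.startswith("http://")
        and "<script" not in lowered
        and any(k.lower() in lowered for k in keywords)
    )


def crawl(start, keywords, fetch=fetch_with_curl, max_pages=100):
    """Breadth-first crawl from `start`, returning URLs that pass the filter.

    Note that robots.txt is never consulted.
    """
    queue = deque([start])
    seen = {start}
    found = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        try:
            text = fetch(url)
        except Exception:
            continue  # unreachable page (or missing curl): skip it
        if is_interesting(url, text, keywords):
            found.append(url)
        for link in extract_urls(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Usage would be something like `crawl("http://example.com/", ["webring", "netstalking"])`. The `fetch` parameter is there so curl can be swapped for wget, a cached archive, or anything else that turns a URL into text.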
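The random word-and-TLD domain checker mentioned in the list above can be sketched in Python like this: glue two random words to a random TLD and consider the name to exist if DNS resolves it. The word and TLD lists are arbitrary placeholders, the function names are hypothetical, and resolving is only an approximation (a registered domain may have no address record at all).

```python
import random
import socket

# Placeholder vocabularies; use your own word lists.
WORDS = ["dark", "old", "lost", "cat", "forest", "void", "tape"]
TLDS = ["com", "net", "org", "info"]


def random_domain(rng, words=WORDS, tlds=TLDS):
    """Glue two random words and a random TLD, e.g. 'lostforest.net'."""
    return rng.choice(words) + rng.choice(words) + "." + rng.choice(tlds)


def resolves(domain):
    """True if DNS returns an address for the domain (so it likely exists)."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def discover(tries, seed=None, check=resolves):
    """Try `tries` random names and return the distinct ones passing `check`."""
    rng = random.Random(seed)
    found = []
    for _ in range(tries):
        domain = random_domain(rng)
        if domain not in found and check(domain):
            found.append(domain)
    return found
```

Running e.g. `discover(100)` does real DNS lookups, so it is slow-ish and needs network access; the `check` parameter allows plugging in a different test, such as actually fetching the front page.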