Update

commit 8b530b5952 (parent 4d545b6845)
20 changed files with 206 additions and 24 deletions
@@ -26,9 +26,10 @@ Techniques of netstalking include port scanning, randomly generating web domains
- **Search archives, file hosting servers etc.** The Internet Archive is the giant among archives that must always be checked, but don't forget smaller ones either, like archive.li, [Usenet](usenet.md) archives, [4chan](4chan.md) archives, various file pastebins etc. You may be able to find stuff that's now gone from the live Internet and/or has been hidden (a sketch of querying the Internet Archive programmatically follows this list).
- **Guess randomly.** It can even be an entertaining pastime to play a lottery, randomly digging and seeing what you find. For example you can type random domains or IP addresses in your URL bar: `nigger.com`, `hitler.il`, `weirdporn.xyz` or whatever. One can even quite effortlessly bash together a script to automatically check millions of such domains. This has a chance of discovering something that would be otherwise unfindable because it's not linked to from anywhere on the indexed web.
- **Manually search unindexable material.** A lot of information is out there but search engines don't know about it because it's not in plaintext format or it hides behind a login or captcha wall or whatever. Plenty of stuff is buried in scanned PDF books, videos, compressed archives, spoken audio etc. Hence when you're searching manually, try to go to places search engines are less likely to reach (a small sketch of making PDFs searchable follows this list).
- **Write own tools.** Today you no longer have to possess a [PhD](phd.md) (or even a brain) to write a simple web scraping script. Custom tools can take you beyond what search engines can (and are willing to) do for you -- for example search engines typically can't search for [regular expressions](regexp.md), but your own crawler can. Your own tool is 100% tailored to your needs, it can behave in exactly the ways you want (ignore robots.txt, use your credentials to bypass login walls, follow very specific trails, you can even use [OCR](ocr.md) to extract text from images etc.). As said above, a simple tool is for example one that randomly checks various combinations of words and TLDs to discover curious domain names. Writing a simple crawler is also pretty easy, provided you [keep it very simple](kiss.md) -- exploit existing tools like wget or curl to download pages and extract everything that looks like a URL, no need to parse [HTML](html.md) or whatever, literally treat everything as plain text (a minimal crawler of this kind is sketched after this list). Then you can keep only documents that are somehow "[interesting](interesting.md)", for example containing specific keywords, not containing JavaScript tags, only being hosted over plain [HTTP](http.md) etc.
- **Find lists of obscure sites and other people who search for them.** A sizable number of small sites now like to post links to other interesting sites; it's enough to find one and then you just start following the links, which lead to more links and so on -- this can never end. Some communities like to share lulzy links, e.g. [4chan](4chan.md), kiwifarms, ... Don't forget to contribute back and publish the list of your findings too ;)
- **Analyze data.** There are tons of publicly accessible yet still undigested data about the web -- for example the Internet Archive's crawl data, [WikiData](wikidata.md), the Yacy index and so on. You may try your luck sniffing around here (a sketch of querying WikiData follows this list).
- **Get creative.** You may want to try to search for transcripts, logs, weird combinations of phrases such as "[open source](open_source.md)" and "murder", viewing buried sites by skipping the first million search results, exact phrases such as "what's your emergency" can find emergency hotline transcripts, searching a number of lulzy 4chan thread or [hash](hash.md) of a famous shock image may reveal cool sites linking to fun stuff, searching for the name of a file that was part of some source code leak can find sites posting, archiving or analyzing such leaks, searching for sites that together contain the word "[nigger](nigger.md)" in 10 different languages could lead somewhere interesting too, and so on and so forth.
- **Be reasonably careful.** Normies get scared shitless to even peek on the darkweb, which is completely ridiculous -- just looking at and searching publicly available data is practically always 100% legal, and even if it weren't, literally no one gives a single shit. However you might get into trouble if you, for example, reverse searched literal child porn, as you'd be uploading the stuff to someone's server and thus technically distributing CP, putting the server owner in trouble. Still, probably not much would happen, maybe you'd get blocked, but you would get yourself on the FBI list. Just use your brain. As long as you're not stepping on someone's toes (doxxing, DDOSing, spamming, ...), no one cares what you're doing.
- ...
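
The "Search archives" item above can be automated a bit. Below is a minimal sketch, assuming the Wayback Machine's public CDX endpoint at `web.archive.org/cdx/search/cdx` and its `output=json` response format (first row = field names); it lists archived snapshots of a domain so you can dig through pages that are no longer live.

```
# Query the Wayback Machine's CDX API for archived snapshots of a site.
# The endpoint and its "output=json"/"limit" parameters are the public CDX
# interface as currently documented; adjust if the service changes.
import json
import urllib.parse
import urllib.request

def list_snapshots(target, limit=50):
    query = urllib.parse.urlencode({
        "url": target + "/*",   # everything under the domain
        "output": "json",
        "limit": str(limit),
    })
    with urllib.request.urlopen("https://web.archive.org/cdx/search/cdx?" + query) as resp:
        rows = json.load(resp)
    if not rows:
        return []
    header, entries = rows[0], rows[1:]   # first row holds the field names
    return [dict(zip(header, e)) for e in entries]

if __name__ == "__main__":
    for snap in list_snapshots("example.com", limit=10):
        # each snapshot can be viewed at https://web.archive.org/web/<timestamp>/<original>
        print(snap["timestamp"], snap["original"])
```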
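
The script mentioned in the "Guess randomly" item can look something like this -- a sketch, not a polished tool: the word list is a made-up placeholder and only DNS resolution is checked, actually fetching the pages is left to you.

```
# Randomly combine words with TLDs and report which domains actually resolve.
# The word list is just a placeholder -- feed it a dictionary file or random
# strings to cover more ground.
import random
import socket

WORDS = ["castle", "basement", "radio", "lizard", "abandoned", "archive"]
TLDS = [".com", ".net", ".org", ".xyz", ".info"]

def random_domains(n):
    for _ in range(n):
        yield random.choice(WORDS) + random.choice(TLDS)

def resolves(domain):
    try:
        socket.gethostbyname(domain)   # DNS lookup only, no page download
        return True
    except OSError:
        return False

if __name__ == "__main__":
    for domain in random_domains(100):
        if resolves(domain):
            print("live:", domain)
```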
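
For the "unindexable material" item, here is a tiny sketch of making a PDF searchable, assuming the `pdftotext` utility (poppler-utils) is installed; a scanned book with no text layer would additionally need OCR (e.g. tesseract), which is not shown here.

```
# Pull the text layer out of a PDF and search it for a keyword.
import subprocess
import sys

def pdf_text(path):
    # "pdftotext file.pdf -" writes the extracted text to stdout
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    path, keyword = sys.argv[1], sys.argv[2]
    for number, line in enumerate(pdf_text(path).splitlines(), start=1):
        if keyword.lower() in line.lower():
            print(f"{number}: {line.strip()}")
```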
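
And the crawler from the "Write own tools" item, kept very simple as described: curl does the downloading, a regular expression pulls out anything that looks like a URL, and a crude heuristic decides what counts as "interesting" (the `guestbook` keyword is just an example placeholder -- swap in whatever you're hunting for).

```
# Dead simple crawler: fetch pages with curl, treat everything as plain text,
# regex out anything that looks like a URL, keep only "interesting" pages.
import re
import subprocess
from collections import deque

URL_RE = re.compile(r"""https?://[^\s"'<>)]+""")

def fetch(url):
    # curl does the downloading; -s silent, -L follow redirects, 20 s cap
    out = subprocess.run(["curl", "-s", "-L", "--max-time", "20", url],
                         capture_output=True, text=True, errors="replace")
    return out.stdout

def interesting(page):
    # example heuristic: mentions a keyword and carries no JavaScript
    return "guestbook" in page.lower() and "<script" not in page.lower()

def crawl(start, limit=100):
    seen, queue, hits = set(), deque([start]), []
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        page = fetch(url)
        if interesting(page):
            hits.append(url)
        queue.extend(URL_RE.findall(page))   # follow everything that looks like a URL
    return hits

if __name__ == "__main__":
    for hit in crawl("https://example.com"):
        print(hit)
```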
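
Finally, for the "Analyze data" item, a sketch of mining [WikiData](wikidata.md) through its public SPARQL endpoint; the example query (official websites, property P856, of items that are instances of video game, Q7889) is just one assumed starting point, dig for whatever you like.

```
# Ask the WikiData SPARQL endpoint for a pile of site URLs to wander through.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?itemLabel ?website WHERE {
  ?item wdt:P31 wd:Q7889 ;     # instance of: video game
        wdt:P856 ?website .    # official website
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 30
"""

def run_query(query):
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # the endpoint expects a descriptive User-Agent
    req = urllib.request.Request(url, headers={"User-Agent": "netstalking-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row["itemLabel"]["value"], row["website"]["value"])
```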