
# Data Hoarding

WORK IN PROGRESS

Data hoarding means a larger than normal focus on collecting data, often in large quantities, most commonly by downloading it from the [Internet](internet.md). Frankly speaking it may or may not be a [disease](disease.md) -- this depends on whether the activity is done well and contributes to one's well-being and to increasing [good](good.md) in the world, or whether it's just a pointless obsession wasting away one's life and enslaving one to the machine. Is data hoarding good? To a certain degree yes, it can achieve a lot of good, for example backing up and mirroring the Internet or helping you prep for a sudden [Internet outage](collapse.md), and of course it may even lead you to dig out interesting things in the process. The danger is that it becomes an obsessive disorder, but if you're prone to addiction this danger lies in basically any activity at all, such as eating, masturbating, drinking alcohol, playing games, smoking etc. Just think, use your brain, don't behave like an animal.

There is the famous case of a [woman](woman.md), Marion Stokes, who obsessively recorded TV broadcasts on VHS tapes (some 70000 of them) -- her archive nowadays provides a very valuable historical record of footage that would otherwise have been lost. While this is a case of hoarding detrimental to the individual's health, it did help society in the end.

## How To Do It Well

Here is some advice for the good data hoarder.

- As always: in general **minimize the [bad](bad.md), maximize the [good](good.md)**. Size, cost, [maintenance](maintenance.md), time, anxiety and other trouble are bad. Value of the data, its durability, independence, freedom etc. are good. **Know what you're doing and why** -- you probably don't want to hoard famous Hollywood movies that you can pirate from 1000 sites at any time, there's no point in that. You want to save files so as to back them up, have them ready in case the Internet stops existing, save something that's likely to be censored and so on. Basically imagine the difference between someone collecting useful objects and someone filling his house up to the roof with complete junk -- the same kind of thing happens with data.
- **Collect small data of high value**, maximize the value/size ratio: typically save a lot of text data, vector images are fine too, be more picky with bitmap images and things like video always have to be considered extremely well, curated and edited to shrink their size. **Keeping your collection as small as possible is the number one priority** as the size is what makes the difference between having a pocket USB stick that can be quickly and easily backed up on any file sharing website or a CD vs maintaining a [RAID](raid.md) of backups, consuming new CPUs and spending most of your hoarding time just keeping your collection alive.
- **Hoard physical [books](books.md)** and similar oldschool data such as photos and vinyl records -- not only do they often have advantages over electronic data, such as being storable without electricity and lasting longer, they are also usually of better quality and higher value. Internet data are full of junk and noise because the Internet is a cheap medium -- a paper book on the other hand has to carefully choose what to include, i.e. it already did this part of the job for you.
- **Use appropriate formats and quality**: if the value of the data is text, save it as txt (even if you found it in pdf), if it's a black and white scan, save it as a black and white image (no need for RGB), if it's a diagram, find a vector version of it and save that, if it's a meme whose entertainment value will be preserved even at half resolution, don't save it in 1080p resolution, save it at the lowest acceptable quality etc. Use simple, common file formats that can be handled by free software, do NOT use proprietary formats or formats that are extremely complicated if you can at all avoid it. Go to great lengths to extract valuable data out of shitty formats: for example if you find a vlog video whose main value is in what's being said and not the video itself, rather find and store the text transcript than the video itself (it takes much less space, can be searched, indexed, printed and backed up on paper, ...), or, as the next best thing, extract only the audio and compress that so that it's just barely understandable.
- **Careful with [compression](compression.md)**: compression can be good but again, only use it when appropriate. In most cases compression will be achieved just by saving the data in a good format (and such compression will generally be even better than general purpose compression). General purpose compression (zip etc.) brings in trouble, for example it makes the data more prone to corruption (removes redundancy, increases entropy), it adds a dependency on the decompression program, it makes the files harder to inspect etc. Use it only on very large files that will get reduced a lot, for example some extremely huge dump of text data will likely benefit from being zipped.
- **See how to do [backups](backup.md) well** and stick to that.
- **Use and make tools, automate**. For example if you're downloading a lot of Wikipedia articles, make a simple script that will extract just the article text, throwing away the unnecessary sidebar, scripts and styles. Minify all websites you download, remove image tags if you're not saving images etc. Make converting images quicker and simpler e.g. with some ImageMagick scripts. Similarly use ffmpeg to tame your videos. There already exist many web scrapers and format converters and a lot can be achieved with the basic Unix tools, just look stuff up.
- **Organization may be good**: primarily try to name the files well, only use alphanumeric characters and underscore, limit the filename length and adopt some general naming rules. This will help preserve correct names when copying between different systems, and it will make searching more comfortable too. Some general directory structure may be cool, for example separating free and proprietary data will allow you to easily upload the free part anywhere on the Internet and so partially back it up, whereas with proprietary data you might get in trouble. Do not overdo organization though, that may lead to obsessions and wasting time, even complicating the search for something -- [keep it simple](kiss.md). Put some thought into WHY you're organizing the files a certain way, don't just do it because it "looks nice", just use your fucking brain.
- **NEVER [ENCRYPT](encryption.md)** for fuck's sake, encryption is [shit](shit.md).
- ...
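
The advice on extracting just the text from downloaded pages can be sketched roughly as follows. This is only a minimal illustration using Python's standard `html.parser`, not a finished scraper: the names `TextExtractor` and `html_to_text` are made up for this example, and treating `nav` and `aside` as the "sidebar" is an assumption that will not match every site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of unwanted tags."""

    # Tags whose content is thrown away. "nav" and "aside" are only a guess
    # at where sidebars live; adjust for the site you are saving.
    SKIPPED = {"script", "style", "nav", "aside"}

    def __init__(self):
        super().__init__()
        self.parts = []      # collected text chunks
        self.skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Image tags carry no text, so they disappear on their own.
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.parts.append(text)


def html_to_text(html):
    """Returns the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Calling `html_to_text` on the content of a saved page gives plain text that can be stored as txt, which fits the "small data of high value" rule better than the original HTML.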

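The file naming rule from the list above (alphanumeric characters and underscore only, limited length) is easy to automate. The following is a small sketch under a few assumptions that are not part of the advice itself: the function name `normalize_filename`, the default limit of 64 characters, lowercasing everything, and keeping one dot for the extension are all choices made just for this example.

```python
import re


def _clean(text):
    """Replaces runs of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").lower()


def normalize_filename(name, max_length=64):
    """Returns name reduced to letters, digits and underscores, keeping
    the extension (if any) and limiting the total length."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        # No extension, or a dotfile such as ".bashrc": all of it is the stem.
        stem, ext = name, ""
    stem, ext = _clean(stem), _clean(ext)
    suffix = "." + ext if ext else ""
    # Truncate the stem so that stem + suffix fits into max_length.
    stem = stem[: max(1, max_length - len(suffix))] or "unnamed"
    return stem + suffix
```

For example `normalize_filename("My Holiday Photo (2019).JPG")` gives `my_holiday_photo_2019.jpg`. Running something like this over a directory before backing it up keeps names safe when copying between different systems.
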
## How To Do It Wrong

Do the opposite of what's described above: download everything just in case, in the highest resolution you can find, develop an adrenaline kick just from the feeling of right clicking a file, buy as many hard drives as you can afford and then fill them up with everything you find, then cry for at least a whole day if one of them gets corrupted. Then set up an expensive system that will keep it all backed up, that will eat up electricity and space and require you to run around it and replace broken disks constantly, clean the dust and keep updating the software that powers it. Encrypt it all with a STRONG password that consists of 1000 absolutely random characters, dedicate 12 hours a day to memorizing this password (you mustn't write it down anywhere) and keep changing the password every month. If you forget the password, get depressed and dope yourself with antidepressants so you can keep repeating this. Get attached to your collection like it's your waifu.

## See Also

- [disease](disease.md)
- [backup](backup.md)
- [netstalking](netstalking.md)