/g/ - Technology

install openbsd

/drg/ Data Retention General Nanonymous No.5235 [D][U][F][S][L][A][C] >>5247 >>5255
File: b61d30af587ab582a9d84d43d2130bc199a4522c6cf479146bdb0f776cdbe363.jpg (dl) (40.61 KiB)
Between the current normiefication of the internet, governments regulating online content more than ever, and websites getting shut down all the time, i finally made the choice to get serious about data hoarding and data retention. My objective is to collect at least 100TB of data that i care about and hand it down to future generations when the full cyberpunk future i see on the horizon becomes reality. In this thread i want to focus on the retention of this data. In the past i often overlooked this aspect and lost stuff multiple times, and i don't want to make that mistake ever again, so if you could share your data retention strategies it would help me greatly. Feel free to also recommend particular brands, software, or general tips.

Nanonymous No.5245 [D] >>5247 >>5251
Let's start with the capacity. 100TB is a lot, so the question is: can you afford it? If the answer is yes, then how do you plan to host it, i.e. do you have a home server, or will you fill one drive and then move on to the next? With 8TB drives you would need 13 discs; I guess you could put them into a new computer (or server). Keep potential drive failures in mind. The best way to archive stuff would be using a decentralized protocol like BitTorrent and encouraging people to keep it alive, for example by dumping terabytes of music. This way you could rest assured that if shit hits the fan someone will have a copy (though downloading terabytes could take ages without a fast connection). I wouldn't depend on the cloud because providers can go down anytime or remove your stuff, and you have to pay monthly/yearly. Then there is archive.org, happy to keep stuff alive, but it's no use for private files or for things that can't possibly be uploaded there (I don't think they're fine with modern movies even with their DMCA exemption). Last but not least, archive nice things. It would be a waste to spend the space on porn when so many movies or dumps are at risk of going extinct. Or do you plan to go 50/50 to have two copies? Anyway, make sure really scarce things are stored on more than one drive.
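For reference, the disc count above is just a ceiling division. A minimal sketch, assuming raw capacity only (no RAID parity or filesystem overhead); the function name is made up for illustration:

```python
import math

def drives_needed(target_tb, drive_tb, copies=1):
    """Drives needed to hold `copies` full copies of the data.

    Assumption: raw capacity only, no parity or filesystem overhead.
    """
    return math.ceil(target_tb / drive_tb) * copies

print(drives_needed(100, 8))             # one copy on 8TB drives
print(drives_needed(100, 10, copies=2))  # active + backup on 10TB drives
```

Any parity scheme pushes the real count higher than this.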

Nanonymous No.5247 [D] >>5249 >>5250 >>5251 >>5255 >>5258
>>5235
>Storage
See >>5245, also if your intention is to keep the data safe you should choose a more secure technology, HDs are weak, go for SSDs.

>Data
Keeping files organized is very important, a modern effective tag system is a must.

>Content
What kind of content would it handle? Does this have something to do with sadpanda's death? Illegal/pirated content must be properly encrypted.

>Software
I would program my own.

Nanonymous No.5248 [D] >>5251
>100TB
LMOA ur reterded, u will never use all this data ever.

Nanonymous No.5249 [D] >>5252
>>5247
>HD is weaker than ssd
haven't heard something so idiotic in a while. When talking about large data storage an ssd is worse on every metric. an ssd is only used as a caching layer and/or for highly volatile data. its only redeeming metric in a datacenter is that it's less expensive than RAM. With 100-200MB/s each, just 12 hdd's will already max out a 10gbit connection. not to mention the data retention differences between hdd and ssd. The only reason ssd's are in laptops is because people actually USE their hard drive as a fast volatile device. Downloading and storing 10GiB quickly is their use case.
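The throughput claim is easy to sanity-check. A rough sketch, assuming purely sequential transfers, decimal units and no protocol overhead (helper names are hypothetical):

```python
def link_capacity_mb_s(gbit):
    """Link speed in MB/s (decimal units, protocol overhead ignored)."""
    return gbit * 1000 / 8

def saturates(drive_count, per_drive_mb_s, gbit):
    """True if the drives' combined sequential throughput reaches the link speed."""
    return drive_count * per_drive_mb_s >= link_capacity_mb_s(gbit)

for speed in (100, 200):
    print(speed, 12 * speed, saturates(12, speed, 10))
```

At the low end of the quoted range 12 drives land slightly under the 1250 MB/s link, at the high end they are well over it, so the point stands for any reasonably modern disk.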

Nanonymous No.5250 [D] >>5252 >>5253 >>5256
100TB is too much OP. But, if you really are going to do it, you should do a RAID 5 using ZFS or HAMMER2 from DragonflyBSD. This would give you some safety regarding the data integrity:
https://www.dragonflybsd.org/docs/docs/howtos/howtosoftwareraid/

See also Tahoe-LAFS:
https://en.wikipedia.org/wiki/Tahoe-LAFS

Don't buy Seagate HDDs, they are trash. Use WesternDigital instead.

You would also need a fast internet connection for that. If you start to download 20TB every month your ISP will probably complain, so you would need an optical fiber connection.

If the content you're downloading is not legal (pirated movies/anime), you will need a proxy or else authorities will fuck you up. But, since the amount of data you're transferring is so big, you will need a dedicated server with an optical fiber connection too. Set up an OpenIKED daemon and make your requests through it.
For torrent, I suggest you use aria2c to take advantage of the high-bandwidth connection.

If you're going to be doing that in the same place you use internet, you need a badass router with FQ-CoDel or else the whole connection will be fucked:
https://pauladamsmith.com/blog/2018/07/fixing-bufferbloat-on-your-home-network-with-openbsd-6.2-or-newer.html

>>5247
>more secure technology
>go for SSDs
Are you kidding?

OP No.5251 [D] >>5253 >>5254 >>5255 >>5258
Thanks for the replies.
>>5245
>Let's start with the capacity. 100TB is a lot so question is can you afford it?
It is a medium term project, if i save money every month i should be able to buy 20 10TB disks(10 active, 10 backup) or the equivalent number of 8TB disks in about a couple years, i plan to buy one every month or more if i can spare more money.
>Well with 8TB drives you would need 13 discs, I guess you could insert it into new computer (or server)
Yes i will definitely need to get a server the more disks i add, my current motherboard has 6 SATA ports, if some sysadmin nanon could give some advice on some server motherboards it would help in the future.
>Anyway, make sure for really scarce things to have on different drives.
I plan on keeping an extra 10TB external drive with an extra copy of important stuff, just in case and for everyday use.
>>5247
>if your intention is keep the data safe you should choose a more secure technology, HDs are weak, go for SSDs.
Using SSDs instead of HDD would cost me at least 8 times more and it would also mean a lot more disks, remember we are talking about 100TB, maybe i'm missing something?
>Keep files organized is very important, a modern effective tag system is a must.
I was thinking of using a tag system too, is there any tagging software that supports scripting(is command-line) and linux/unix? I'm struggling to find something decent maybe i'm looking in the wrong places. If i can't find anything i'll probably handle it with a database or something like that.
>Illegal/pirated content must be properly encrypted.
There is gonna be a lot of copyrighted stuff, so of course it's gonna be encrypted, was thinking of using veracrypt.
>What kind of content would it handle? Does this has something to do with sadpanda's death?
It would contain a bit of everything, all the media i care about, entire backups of websites, imageboard archives, the media will take most space anyway. It's not directly related to sadpanda, but that is the last event in a series that convinced me that this is necessary.
>>5248
>LMOA ur reterded, u will never use all this data ever.
I've already accumulated almost 10TB on my current HDDs and i still have all kinds of stuff to archive, and i was never serious about data hoarding before, so i think 100TB is appropriate. The point is not what i will use, the point is to prevent the loss of information, this could be useful to others in the future, not only to me.


Talking about backups, i should keep them in a separate location from the server, is a safety deposit box in a bank or something fine? backups are also gonna be encrypted of course. And are those cloud backup services that promise unlimited space truly unlimited?

Nanonymous No.5252 [D] >>5253 >>5254 >>5256 >>5259
>>5249
>>5250
That's right, but OP said that he wants to keep the data for a really long time (full cyberpunk future), an HD easily breaks if it falls down. I should have used 'stronger' instead of 'secure'.

Nanonymous No.5253 [D] >>5254 >>5256
>>5250 (me)
>>5251
>cloud backup services
You have no idea what you're doing, do you? You can't backup 100TB that easily to the "cloud" (whatever that means).
>>5252
Not sure about that. Here:
>In summary, we find that the flash drives in our study experience significantly lower replacement rates (within their rated lifetime) than hard disk drives. On the downside, they experience significantly higher rates of uncorrectable errors than hard disk drives.
>More than 20% of flash drives develop uncorrectable errors in a four year period, 30-80% develop bad blocks and 2-7% of them develop bad chips. In comparison, previous work on HDDs reports that only 3.5% of disks in a large population developed bad sectors in a 32 months period
https://www.usenix.org/conference/fast16/technical-sessions/presentation/schroeder

Nanonymous No.5254 [D] >>5256
>>5251
>maybe i'm missing something?
Here >>5252, there are other storage devices that might be interesting for you like the M Disk or tape storage. Another point about storage is that a lot of the content might need updating.

This might be useful https://nimbusdata.com/products/exadrive-platform/comparison/

>is there any tagging software that supports scripting(is command-line) and linux/unix?
I usually set the tags on file names and use regex to find what I want.
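A minimal sketch of what tags-in-file-names plus regex could look like. The `name[tag1,tag2].ext` convention and the sample names are assumptions made up for illustration, not something the thread prescribes:

```python
import re

# Assumed convention: tags live in square brackets, comma separated,
# e.g. "akira[anime,movie,1988].mkv".
TAG_BLOCK = re.compile(r"\[([^\]]*)\]")

def tags_of(filename):
    """Return the set of lowercase tags found in every [...] block of the name."""
    tags = set()
    for block in TAG_BLOCK.findall(filename):
        tags.update(t.strip().lower() for t in block.split(",") if t.strip())
    return tags

def find_tagged(names, *wanted):
    """Return the names that carry all of the wanted tags."""
    wanted_set = {w.lower() for w in wanted}
    return [n for n in names if wanted_set <= tags_of(n)]

names = [
    "akira[anime,movie,1988].mkv",
    "kind_of_blue[music,jazz,flac].tar",
    "notes.txt",
]
print(find_tagged(names, "movie"))
print(find_tagged(names, "music", "flac"))
```

The upside of this approach is that the tags survive without any database: a plain directory listing is enough to read them decades later.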

Remember, if you really want this to be useful in a cyberpunk future you need to keep things simple. The data must be accessible without present-day tools, because in the future they will be deprecated and too obscure. Create your own tools for this and keep a manual so futurefags are able to reach the data.

>>5253
Considering that, M Disk is the only option for OP. Unless he meant something different with "cyberpunk future".

Nanonymous No.5255 [D]
>>5235
>100TB
>long term archiving
Use tapes. Make duplicates and keep in different locations.
>>5247
>ssd > hdd
Assuming this isn't bait, ssd's need to be powered on occasionally for the firmware to refresh the bits and prevent errors. Their retention time also drops exponentially as the surrounding temperature rises. They are also three times more expensive per gb compared to hard disks.
>>5251
The freenas community likes using supermicro boards for their nas servers, you could probably use a consumer motherboard with ecc support and get pcie to sas converters

Nanonymous No.5256 [D]
>>5250
>RAID 5
I'll think about RAID when i have enough disks.
>Tahoe-LAFS
Interesting, but i would prefer something completely offline for now.
>Don't buy Seagate HDDs, they are trash. Use WesternDigital instead.
I always used WD until now and never had a problem, probably will go with that. What do you think about Toshiba?
>You would also need a fast internet connection for that.
For reference i have about 6MB/s of download bandwidth so that means about 15TB-16TB per month in theory.
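The monthly figure follows directly from the line speed. A quick sketch, assuming the link runs flat out 24/7 and decimal units (1 TB = 1,000,000 MB):

```python
def tb_per_month(mb_per_s, days=30):
    """Theoretical TB transferred in `days` days at a constant rate."""
    seconds = days * 24 * 60 * 60
    return mb_per_s * seconds / 1_000_000  # MB -> TB, decimal units

print(round(tb_per_month(6), 2))           # 30-day month
print(round(tb_per_month(6, days=31), 2))  # 31-day month
```

Real downloads won't sustain the full rate, so treat this as an upper bound.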
>you will need a proxy or else authorities will fuck you up
All i do on the internet goes through a VPN. As a sidenote authorities in my country barely know how to use a PC lul.
>Bufferbloat
Never experienced it, but i will keep it in mind, thanks.
>>5252
>OP said that he want to keep the data for a really long time (full cyberpunk future), an HD easily breaks if it falls down.
I will try to not drop the disks ;)
If SSD were viable price wise i would be happy to use them, but it's not possible. Since i intend to keep a separate array of disks as a backup and RAID with parity there is gonna be protection against drive failures.
>>5253
>You have no idea what you're doing, do you? You can't backup 100TB that easily to the "cloud" (whatever that means).
I think you misunderstood. Of course i wasn't talking of uploading all the 100TB although maybe it seems like my wording implied it, with my upload bandwidth it would take LITERALLY years, i was thinking of using them to backup the most important stuff. I had in mind something like carbonite or backblaze but after looking into both they're shit so nevermind.
>>5254
>M Disk or tape storage
I would still need to use or modify/update some of the data from time to time.
>100TB SSD
I wonder how much that costs lul. Definitely out of my league.
>I usually set the tags on file names and use regex to find what I want.
Looks like an easy and convenient solution.
>if you really want this to be useful to a cyberpunk future you need to keep things simple, the data must be accessible without present tools
Good point, well i will try to keep it usable with the years. Of course i won't use proprietary or closed source software, as long as i have source and hardware supports it should be ok.
>Unless he meant something different when with "cyberpunk future".
I meant 25+ years from now, sorry for being cryptic.

Nanonymous No.5257 [D] >>5294
Use ZFS, it's better than raid at higher disk counts. https://wintelguy.com/zfs-calc.pl can help you find out how many disks you need (17 8TB drives at raid-z2)

Nanonymous No.5258 [D] >>5294
>>5247
>Illegal/pirated content must be properly encrypted
I would say to only encrypt dangerous stuff, like software. Movies, music or books won't bring any heat, and if a drive gets corrupted it will be easier to recover unencrypted data.

>>5251
>20 10TB disks(10 active, 10 backup)
That's even more. Good you consider backup.

>my current motherboard has 6 SATA ports
There are motherboards with 18 SATA ports so if you have money for 200TB it won't be much I guess https://www.geek.com/chips/new-asrock-motherboard-has-18-sata-ports-so-you-can-become-a-storage-god-1606552/
Keep in mind you don't need to have all of them connected at the same time.

>I plan on keeping an extra 10TB external drive with an extra copy of important stuff, just in case and for everyday use
Excellent.

>an HD easily breaks if it falls down
Are you implying SSDs are safer? Besides, OP said it would cost more.

Nanonymous No.5259 [D] >>5294
>>5252
I know what you meant. an ssd is even worse for that than you realise. It stores data by writing voltages to a cell and reading them back. If you don't give it power and/or keep it in a room at 40-60 Celsius then data will start to corrupt within 2 years because of voltage leaks in those tiny cells. this is a fundamental principle of the tech. smaller cells will only make it worse. HDD stores bits with magnetic polarisation. This is subject to the laws of physics that govern the loss of magnetism, which, AFAIK, is in the range of millions/billions of years. even extreme failures can be recovered 'trivially' by taking out the platters in a clean room and reading the entire disk. there are plenty of companies specialised in this. The platters are also nearly indestructible and corrosion-resistant. try removing a piece of silicon with 128GiB of data from a broken board and reading the almost dissipated voltage levels of all its cells. Good luck on your journey down the rabbit hole.

Nanonymous No.5261 [D] >>5294
>ZFS meme
If you're on Linux, just use dm-integrity and mdraid with XFS. You get the advantages of ZFS without the mega monolithic bloat, the CoC and the instability.
>Seagate shit WD good
True until you stop buying consumer drives. Then Seagate's Exos are very good while WD is overpriced as fuck. I'd buy HGST if I wanted to spend a lot (inb4 WD bought HGST, I doubt HGST was totally consumed). I've had good experience with Toshiba drives too; they were the consumer division of HGST, I think.
>SSDs
Only a brainlet could recommend these as MLC is getting rarer and more expensive every day. Now you can only get TLC or even QLC garbage.
Crucial/Micron's M500/M550 was the best SSD there was, too bad they replaced it with the BX500/MX500.

Nanonymous No.5265 [D] >>5294
GET AN EXPLODING, FULLY ENCRYPTED HARD DRIVE RUNNING 9ATOM OR DRAGONFLYBSD OR SOMETHING PROBLEM SOLVED REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

OP No.5294 [D] >>5296 >>5321 >>5348
>>5257
>ZFS
It seems really appropriate for my use case, i never used it so i will need to study it first, is RAIDZ1 really slow in read?
>>5258
>I would say to only encrypt dangerous stuff, like software.
I will encrypt everything, you never know.
>There are motherboards with 18 SATA ports
ASRock has interesting stuff, i think this problem is solved. There is still the problem of where to put all the drives, are there cases that have a lot of drive space? Or do i need to go for server enclosures?
>Keep in mind you don't need to have all of them connected at the same time.
Uhmm i think it is necessary with RAID right?
>>5259
Usually they say that the lifetime of an HDD is about 4 years, but this is under continued use. If i don't use them once filled, as you said, they should last for a long period of time. I am safe for my 25+ years in theory, right? There is also gonna be the backup stack that is gonna be used even less (i was thinking once every 3 months if updates are needed).
>>5261
>True until you stop buying consumer drives. Then Seagate's Exos are very good while WD is overpriced as fuck.
You are right, WD Gold Enterprise disks are really expensive, they cost about double compared to others at the same capacity, i can't afford that so they're out i guess.
What remains is:
WD Blue 6TB: this means more disks because the capacity is lower, and more disks means it is more probable that multiple drives fail together. It is also a consumer grade disk. I had good experience with WD Blue in the past but it doesn't convince me. 5400 rpm.
WD Red 10TB: a NAS disk, but it will work fine as storage as far as i know. Cost is good, availability is good, 5400 rpm.
WD Purple 10TB: a surveillance disk. If i am correct surveillance disks have better write rates, which is interesting for me since i am gonna write a lot and not read that much. Cost is good, availability is good, 7200 rpm.
Seagate Exos Enterprise: it seems really good overall, thanks for recommending it. Compared to the WD Gold Enterprise the price is way lower (half). I would like a second opinion though. Cost is good, availability is good, 7200 rpm.
Between these choices the WD Purple and the Seagate Exos Enterprise seem the best, opinions?
>>5265
Instead of shitposting why don't you join the discussion [asukafag]?

Nanonymous No.5296 [D][U][F] >>5311
File: c754f393238f9ec9723bd72fcd6d006690c628ba39ba1a13552a93934bfb1ee7.jpg (dl) (160.96 KiB)
>>5294
>I will encrypt everything, you never know
Well, if they ever outlaw movies or music then it would be a pain in the ass to move the files onto a new encrypted drive, you're right.

>some cases that has lot of drive space?
If you want it at home then shelves like pic rel exist, or just DIY if you're capable. With something like this 10 drives won't be much.

>Uhmm i think it is necessary with RAID right?
I thought all discs would be independent but it makes sense with encryption. Isn't it with RAID that if one drive breaks you're fucked?

Nanonymous No.5311 [D]
>>5296
>shelves
Seems like it could fall and break, i will go for a server case if i can't find big enough consumer cases.
>Isn't it with RAID that if one drive breaks you're fucked?
RAID5 and RAID6 have distributed parity to protect against drive failures, and so do RAID50 and RAID60
https://en.wikipedia.org/wiki/Standard_RAID_levels
https://en.wikipedia.org/wiki/Nested_RAID_levels
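To show why a single dead drive doesn't kill a parity array, here is a toy sketch of the XOR parity idea behind RAID5. It works on tiny in-memory blocks and ignores striping and parity rotation; it is not how any real RAID implementation is laid out:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three "data disks" holding one block each, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 dies: XOR the survivors with the parity to get its block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

With one parity block you can rebuild exactly one missing block per stripe, which is why a second failure during the rebuild is the dangerous case.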

Nanonymous No.5321 [D]
>>5294

>asukafag
What is an asukafag?

>Exploding SSDs are a real thing.

https://security.stackexchange.com/questions/37750/does-a-commercially-available-self-destructing-hdd-set-in-bios-exist
http://www.runcore.co/en/RC-SSDProduct-45-1.html

So are self-encrypting HDs.

Not sure how any of that would work with 9atom though, a notoriously unsupported OS.

DragonFlyBSD would likely work though.

Nanonymous No.5322 [D] >>5352
/doomer/ is not blackpilled, nanochan is

Nanonymous No.5348 [D] >>5352
>>5294
>more disks more probable to fail
No, higher capacity means longer rebuild times, and higher likelihood of drives failing during the rebuild and you losing all your data. 4TB per drive is recommended, 8TB is where you're pushing it.
>wd purple
That drive is for consistent 24/7 writes and overwrites, you won't be doing that if you're archiving. You don't need enterprise disks and the money saved can be used for more meaningful additions, but if you still want it go for it.
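A back-of-the-envelope estimate of why capacity drives rebuild time. This sketch assumes the rebuild has to rewrite the whole replacement drive at a sustained 150 MB/s (a made-up figure in the 100-200MB/s range mentioned earlier in the thread); real resilver times depend on how full the pool is and on other load:

```python
def rebuild_hours(capacity_tb, mb_per_s=150):
    """Hours to write a full drive at a constant rate (decimal units)."""
    return capacity_tb * 1_000_000 / mb_per_s / 3600

for size in (4, 8, 10):
    print(size, "TB ->", round(rebuild_hours(size), 1), "h")
```

Treat the result as a best case, since a busy or fragmented array will be slower.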

OP No.5352 [D] >>5353 >>5427
So current setup is:

>1 10TB external HDD for an extra backup of the important stuff and for everyday use, i was thinking of using the btrfs filesystem on this drive and using this driver for compatibility with windows https://github.com/maharmstone/btrfs (i dual boot), opinions on this? Cost [200$~]
<10 Seagate Exos Enterprise 10TB HDDs in RAIDZ1 (equivalent of RAID5) for the main stack, with ZFS filesystem. Cost [300$~ * 10 = 3000$]
>10 Seagate Exos Enterprise 10TB HDDs in RAIDZ1 (equivalent of RAID5) for the backup stack, with ZFS filesystem. Cost [300$~ * 10 = 3000$]
<Server chassis, for the active stack, with easy access to disk in case i need to change failed ones. Cost [150$~]
>Another chassis for the backup disks, probably gonna build one myself or 3D print, just to keep them safe from being hurt. Cost [0$] (let's say 0$ for now)
<Generic (but quality) motherboard with at least 10 SATA ports. Cost [250$~]
>Generic CPU, RAM, SSD for the OS, PSU, UPS to avoid data corruption from power loss, PCI to SATA adapter card with 10 SATA ports to connect the backup stack for updates. Cost [from 400$~ to 500$~] (staying on the cheap side here)

TOTAL: 7100$~
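The total can be re-derived from the list above. A small sketch using only the prices quoted in this post (the dictionary keys are just labels):

```python
# (low, high) cost in dollars for each item of the build listed above.
costs = {
    "external 10TB HDD": (200, 200),
    "main stack, 10 x 10TB": (300 * 10, 300 * 10),
    "backup stack, 10 x 10TB": (300 * 10, 300 * 10),
    "server chassis": (150, 150),
    "backup chassis (DIY)": (0, 0),
    "motherboard": (250, 250),
    "CPU, RAM, SSD, PSU, UPS, SATA card": (400, 500),
}

low = sum(lo for lo, _ in costs.values())
high = sum(hi for _, hi in costs.values())
print(low, high)
```

The quoted total corresponds to the upper end of the range.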

I should be able to complete it in 1 to 2 years depending on how much i can save.
I'm avoiding posting the exact brands and models to make my build less identifiable, but feel free to recommend specific models if you want.

A couple of notes:

>I will buy the disks at a rate of 1 or 2 per month, main reason (beside money) is that this way it's less likely that the disks all fail together, compared to buying disks that were manufactured and sold within a really close range of time.
<I will keep the backup stack in a different location although i still have to figure out the logistics of that.
>Before anybody spergs out when i said i dual boot with windows, i meant on my main system, the OS of the server is gonna be just a Linux distro (again not specifying for anonymity but feel free to recommend if you think one is particularly good for the job).
<Both the active stack and the backup stack are gonna be encrypted with veracrypt.
>I will also encrypt the OS if it doesn't add too much complexity(i usually encrypt my OS partition).
<The backup stack is gonna be synchronized every time i add a disk and then, when all are filled, every 3 months if there are updates to be made.

This build is not definitive of course, i will consider even major changes if they are reasonable and motivated.
I also accept proposals on what content to archive.



>>5322
I think my worries are realistic and backed up by facts.
>>5348
>No, higher capacity means longer rebuild times
You're not wrong about rebuild, but if i have a lot of drives it's more likely that 2 fail together, this is a fact, it's just statistics. Anyway in case of multiple drive failures or a failure during rebuild, i could restore from the backup stack so data loss will be minimized.
>That drive is for consistent 24/7 writes and overwrites, you won't be doing that if you're archiving.
Yeah you're probably right.
>You don't need enterprise disks and the money saved can be used for more meaningful additions
The Seagate Exos Enterprise costs about the same as non enterprise 10TB disks, so in the end i will go with that.

Nanonymous No.5353 [D] >>5381
>>5352
>Generic motherboard
See:
https://store.vikings.net/libre-friendly-hardware/d16-ryf-certfied

>encrypted with veracrypt
Use LUKS instead. Can be decrypted on windows using LibreCrypt:
https://github.com/t-d-k/LibreCrypt

>so in the end i will go with that
I don't recommend that. I didn't use this "Exos Enterprise", but my experience with Seagate is terrible. I have had about 6 HDDs from them in the last 10 years, all of them are broken now. The ones from WD are still working.
After realising this pattern, I don't buy Seagate products anymore, same way I don't buy Intel too.


Ok, so lets say you got your ~7100 dollars build. What will you download? Don't post anything that might compromise your anonymity, but I was just curious about how are you going to fill these 100TB.

Nanonymous No.5357 [D] >>5381
>>5352
>if i have lot of drives there are more probabilities that 2 fail toghether, this is a fact it's just statistics
The statistics show that you're more likely to have your drives break when it's being stressed during rebuilds that take hours, than the increase in probability that two drives fail together that you would get from an increased number of drives. Think of having more drives as having more total disk I/O. Also, go for raidz2 and higher, especially if you aren't also mirroring your data somewhere else.

Nanonymous No.5360 [D][U][F] >>5381
File: e6a14b4df67c8a3eec763dd206d2525c11fb95396ccb9b3f4e6ec9a24e7209bb.jpg (dl) (335.52 KiB)
I think the whole idea is braindamaged but whatever floats your boat man.
I mostly aim to reduce my data hoards even when I have space. Most of the data out there, especially heavy, like videos and software packages is rather useless per se or will be ultimately useless in "25+ years".
And even if you aim at something admirable, like saving some huge libraries of books, a lot of them, like technical literature, will be useless too, and it's going to be wasted on you if you don't share it somehow anyway, so better buy that fat Internet channel as well LMAO.
Though ultimately I think all you could possibly need would fit in 1TB. And all you really really need would fit in 100GB, probably.

Now, to be somewhat useful to you and don't shit on your thread, I got something to say on the subject.
I remember reading some report from a big datacenter that ultimately says hard drives are kinda weird. They don't really live "4 years on average". What they do is some flawed part of them consistently break after about 1-3 years of use, and the rest just goes on essentially forever, like 10+ years, as long as their resource is not used up. That report was about active use, of course, but I think it's important to note that essentially they break not because of use, but, as I said, some flawed shit just cannot go on longer than 3 years. Maybe that's why second hand drives might be more reliable, but with them you might get failures for other reasons.

Nanonymous No.5362 [D][U][F] >>5381
File: 3eaad115d78e3de9c294c12690590a0fef63399946dc74ea7ab6b8b07f577e45.jpg (dl) (255.54 KiB)
>It would contain a bit of everything, all the media i care about, entire backups of websites, imageboards archives, the medias will take most space anyway
Oh God, talk about useless.
You will never even know what's in the 100TB of archives, let alone read them, and reading random talks by peoples is only situationally useful, and you probably are not going to be in that situation in 25+ years as a matter of fact.

Anyway don't cry 'cause it's over. Smile 'cause it happened. ;)

This shall be known as the "Useless Archive" OP No.5381 [D][U][F] >>5424 >>5460
File: 35e5575a4961f77a4c14c36d2558af44dd6d81cc3f9576471d125c380cb2fea7.gif (dl) (1.31 MiB)
>>5353
>I don't recommend that. I didn't use this "Exos Enterprise", but my experience with Seagate is terrible.
Problem is that WD equivalent enterprise disks like the Gold series (which is discontinued i guess?), HGST (also discontinued) and the Ultrastar are more expensive and difficult to find. The Exos has a MTBF of 2.5 million hours, which is the same as the Ultrastar, with pretty much the same specifications, while at the same time being cheaper and easier to find. Compare the data sheets:
Ultrastar https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/general-docs/data-sheet-ultrastar-sata-series-2879-810017.pdf
Exos https://www.seagate.com/files/www-content/datasheets/pdfs/exos-x-10DS1948-1-1709US-en_US.pdf
With this said, my preference is based only on current prices that i'm looking at right now. I won't consider their discontinued brands, but if i find the ultrastar at a good price it would be nice. BTW is it a bad idea to mix different HDD models in a RAID array? Since they have such similar specs it should not cause bottlenecks, i'm thinking about mixing ultrastar and exos.
Anyway if you disagree provide an alternative.
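For scale, a datasheet MTBF can be turned into a rough annualized failure rate. This sketch assumes a constant failure rate (exponential model) and 24/7 operation; it only restates the vendor's number, and real-world failure rates may well differ:

```python
import math

HOURS_PER_YEAR = 24 * 365

def annual_failure_rate(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Probability that a drive fails within one year, constant failure rate assumed."""
    return 1 - math.exp(-hours_per_year / mtbf_hours)

afr = annual_failure_rate(2_500_000)
print(round(afr * 100, 2), "% per drive per year")
print(round(afr * 20, 2), "expected failures per year across 20 drives")
```

Since both data sheets quote the same MTBF, this number can't separate the two models; it just shows what the spec implies.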
>>5357
>The statistics show that you're more likely to have your drives break when it's being stressed during rebuilds
It's not that i don't trust you, but can you link some sources?
So how do you suggest to change my setup? 17(active) + 17(backup) = 34 6TB HDDs? That is a lot of disks!
>Also, go for raidz2 and higher, especially if you aren't also mirroring your data somewhere else.
As i said in other posts i will setup a mirrored backup.
The ZFS RAID calculator gives me
88.44TB of usable space with RAIDZ1
75.81TB of usable space with RAIDZ2
66.33TB of usable space with RAIDZ3
i lose 13TB or the equivalent of 400$ to switch from RAIDZ1 to RAIDZ2, i don't think it's worth it considering i have a backup.
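The trade-off can be spelled out with the calculator figures above. A sketch assuming 30$ per TB (300$ per 10TB disk, as priced earlier in the thread); the function name is made up:

```python
def cost_of_extra_parity(usable_from_tb, usable_to_tb, price_per_tb):
    """Return (TB of usable space given up, dollar value of that space)."""
    lost_tb = usable_from_tb - usable_to_tb
    return lost_tb, lost_tb * price_per_tb

PRICE_PER_TB = 300 / 10  # assumption: 300$ per 10TB disk

for name, usable in (("RAIDZ2", 75.81), ("RAIDZ3", 66.33)):
    lost, cost = cost_of_extra_parity(88.44, usable, PRICE_PER_TB)
    print(name, round(lost, 2), "TB lost,", round(cost), "$")
```

This only prices the lost space; whether the extra parity is worth it depends on how much you trust the backup stack.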
>Ok, so lets say you got your ~7100 dollars build. What will you download?
Lossless music, cracked/drm free vidya, movies/anime possibly in bluray quality, comics/manga at the highest definition i can find, porn/hentai cause sadpanda will be avenged, all literature i can find (libgen/scihub), open source software/OS with sources, old software/abandonware, cracked proprietary software since it gets taken down often, retro vidya roms since they are under attack, vidya mods, all osu beatmaps cause sometimes they are taken down for copyright, imageboard archives, archives of various other websites (this depends if i can get my head around crawlers and stuff and would deserve its own thread), scientific data like photos/videos from space/nature, 3D models for 3D printing, endangered content in general like the New Zealand shooting video. I probably forgot some stuff, i should make a better list.
The main purpose of this project is to avoid the loss of information in general, this means that i need to avoid lossy compression as much as i can, so i will try to always go for the original/uncompressed/losslessly compressed versions.
I accept ideas on what else to archive!
>>5360
>>5362
>hard drives are kinda weird
Can you link the report source?
Having parity+backup will help against possible manufacturing flaws, can't do more than that.
>it's going to be wasted on you if you don't share it somehow anyway
I will share it of course, i thought it was obvious, well not the whole 100TB cause it's impossible, but if stuff gets taken down or becomes hard to get for any reason i will, silently, without attracting heat, put it back online with p2p networks and/or pass it to public archive services.
>Most of the data out there, especially heavy, like videos and software packages is rather useless per se or will be ultimately useless in "25+ years".
Videos in bluray quality probably won't become obsolete even in 25 years, keeping the software, in particular the code, is really important so that it will be possible to recreate "old" systems in the future. For all we know 25 years from now everything could be in the cloud, and you won't have access to software or files in general.
>Though ultimately I think all you could possibly need would fit in 1TB. And all you really really need would fit in 100GB, probably.
>Oh God, talk about useless.
Technically you don't "need" a computer or data to survive in the first place, so all you need is 0 bits of storage space, people lived that way for millennia. "Need" is relative to your situation: for example a man that is dying of thirst in the desert needs a cup of water more than somebody swimming in a lake. In the situation of the modern information age data is needed in a way, and just as the cup of water in my example gains value if there is scarcity of water, data gains value if there is scarcity of data or if the data is difficult to access (DRM, cloud only, etc). What i'm trying to get to is that stuff that you think is not important or needed now may gain importance and become valuable if it is banned or made difficult to access. Another aspect is the historical one, you would think that in the information age it would be impossible to hide or make information disappear, right? And yet look at China, they are really good at modifying history, this is part of why i want to archive stuff like imageboards and other websites.
>You will never even know what's in the 100TB of archives
Keyword/regex search is a thing lul.
>let alone read them
I read imageboard archives all the time, it's like time travelling, i would really like to have early 4chan archives but it seems they all got lost or never existed.
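A minimal sketch of what keyword/regex search over an archive could look like, assuming the threads are saved as plain .txt files under one directory. The layout, the function name search_archive and the max_hits cap are assumptions made up for this example, not an existing tool.

```python
import re
from pathlib import Path


def search_archive(root, pattern, max_hits=20):
    """Yield (path, line_number, line) for matching lines, up to max_hits."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = 0
    # Walk every .txt file under root in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps the scan going on badly encoded dumps.
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if rx.search(line):
                    yield path, lineno, line.rstrip("\n")
                    hits += 1
                    if hits >= max_hits:
                        return  # cap the output so it stays readable


if __name__ == "__main__":
    # "archive/" is a hypothetical directory name.
    for path, lineno, line in search_archive("archive", r"data\s+hoard"):
        print(f"{path}:{lineno}: {line}")
```

Capping the number of hits and narrowing the pattern is the whole trick: an uncapped search over a big archive would drown you in matches.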

Nanonymous No.5424 [D][U][F] >>5460
File: bf2b97fbdbdc25657cbab79f96d647dd58292bca1e2ad4912520e0dd26e33cf9.jpg (dl) (304.15 KiB)
>>5381
>Can you link the report source?
No, but I think you don't need it.
You see, if that is how HDDs behave in real life, any similar report would state the same facts. If not, well, maybe that particular report was some aberration.
>silently, without attracting heat, put it back online
This seems like wishful thinking. There is no way you become even vaguely useful to the public and don't attract public attention.
I mean not you in particular, as you are supposed to be anonymous, but your uploads and distribution channels.
>Videos in bluray quality probably won't become obsolete
What I was getting at is that they are garbage. Fucking digital entertainment is what they are. They are widely distributed because people want them, not really because people keep archives of the shit. Of course, the pirate scene works, but they are the means of getting the content out there in the first place.
>keeping the software, in particular the code, is really important so that it will be possible to recreate "old" systems in the future.
What you need is the actual old systems (hardware) and documentation for programmers, like Intel manuals, ARM manuals etc. Reviving old dead code that nobody understands is a really bad idea.
Also you would have the problem of confirming authenticity. MSDN images used to be confirmed genuine by cryptographic hash checksums, and now the MSDN checksums are not publicly available anymore (at the very least), so there is no reason to trust any of the MSDN images out there. Any preserved lists of checksums might have been tampered with. There is no reason to trust your archive in particular. Only users who can confirm that your data is genuine would be able to make use of your archive.
Like, it sounds tinfoil, but it all depends on the threat level.
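To make the checksum point concrete, here is a rough sketch of building a SHA-256 manifest for a directory and checking files against it later. The function names (sha256_of, build_manifest, verify) are invented for this example. It only tells you whether files changed relative to the manifest; it says nothing about whether the manifest itself came from a source you can trust, which is the actual problem.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images don't have to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

A manifest stored next to the data protects against bitrot at best. Against tampering it would have to be signed or published somewhere the archive owner can't silently rewrite.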
>Technically you don't "need" a computer or data to survive in the first place
What I'm getting at is that you need a computer to do your work, not to be a library for everyone in the world. And there is only so much you can read in the first place. Like, there is NO way in hell you read even 1TB of imageboard archives, possibly in a lifetime.
>data gains value if there is scarcity of data or if the data is difficult to access
This is horseshit. Data gains value through the utility only. Archaeology may be a hobby, but not a necessity.
>you would think that in the information age it would be impossible to hide or make information disappear right?
If information causes a public uproar, it's virtually impossible.
If it's some talks from the Internet, a lot of them have disappeared already, and you don't even know they existed. And nobody cares. You don't care about some posts disappearing from Nanochan, and not all of them were spam, mind you.
This is a fucking joke.
>Keyword/regex search is a thing lul.
Enter some search pattern on a 1TB archive. The results will probably be hundreds of megabytes of matches. That is unreadable.
And anyway you have to know what to look for. Moot point entirely.
>I read imageboard archives all the time
Well, you are probably braindamaged, sorry.
I, on the other hand, write them. xD

Anyway, I'm not really trying to stop you from hoarding a lot of useless shit here. I'm just expressing my opinion. Like maybe 1% of your data will actually be useful and that's fine. Worth it, maybe. I just think it would be more effective to not just upload all the junk you see on the Interwebz.

Nanonymous No.5427 [D] >>5460
>>5352
>I think my worries are realistic and backed up by facts.
The problem is that all think the same like you, which means you are probably true, but it's completely flawed.

Nanonymous No.5460 [D] >>5462
>>5381
>(this depends if i can get my head around crawlers and stuff and would deserve its own thread)
I meant scrapers not crawlers.
>>5424
>What I was getting at is that they are garbage. Fucking digital entertainment is what they are.
I'm not sure if i understand, do you mean the bluray quality or the content or the fact that they are digital? I'm lost.
>What you need is the actual old systems (hardware) and documentation for programmers, like Intel manuals, ARM manuals etc.
Sure, all kinds of technical manuals and documentation are covered by the literature section.
>There is no reason to trust your archive in particular. Only users who can confirm that your data is genuine would be able to make use of your archive.
I see what you mean but there is not really a solution to that problem, unless official sources release hashes.
>What I'm getting at is that you need a computer to do your work, not to be a library for everyone in the world.
You can do a lot of different stuff with a computer, from entertainment, to study, to research, to creating stuff. But to be entertained you need content, to study you need papers/documentation, to research you need data, to create you need other creations to take inspiration from, cause things are not created from nothing. So a library/archive is necessary after all IMO.
>Data gains value through the utility only. Archaeology may be a hobby, but not a necessity.
I disagree, if things go south stuff that we take for granted today may be extremely useful for future generations.
>Enter some search pattern on a 1TB archive. The results will probably be hundreds of megabytes of matches.
I will organize it well, it will be tag based and i will try to keep it tidy (i have the order autism when it comes to folders and files).
>I just think it would be more effective to not just upload all the junk you see on the Interwebz.
Did you mean download instead of upload? Give me a list of valuable stuff to add and "junk" to avoid then.
>>5427
>The problem is that all think the same like you, which means you are probably true, but it's completely flawed.
Not sure if i understand your phrase composition, but if you're referring to the doomer thing, this is more of a go-getter approach if you get what i mean.

Nanonymous No.5462 [D][U][F] >>5463 >>5464
File: 622a7abc654c5dafeacdf8b3d327abe1cda9afe374cfd1e12aafd70f737b56f8.jpg (dl) (320.25 KiB)
>>5460
I don't want to derail your thread any longer. You created the thread on /g/ asking for advice on a technical matter, not for an explanation of why you're braindamaged. I'm leaving now.
>I'm lost.
Exactly. Your brain cannot fucking comprehend this, and you went on repeating yourself in your further paragraphs. This is going nowhere.

Nanonymous No.5463 [D]
>>5462
I'm glad you are back.

Nanonymous No.5464 [D]
>>5462
Well what a waste of time replying to you then. Bye.

Nanonymous No.5468 [D]
PROTECT THE ARCHIVE AT ALL COSTS AAAAAAAAAAAAAAAAAAAA

FUCKING HIRE SECURITY TEAM AND LAND MINE EVERYTHING IF THEY GET IN THE HARD DRIVES GO BYE BYE LIKE IT DID ON THE 39074v5t0wf8;l8gu8nktyirs


AAAAAAAAAAAAAAAAAAAAAAAAAAH

Nanonymous No.9717 [D] >>9720
OP here, after some delay due to lack of money the project is finally starting. I'm still taking requests about what to put in the archive.

Nanonymous No.9720 [D]
>>9717
Here are a few things that might be useful:
https://www.backblaze.com/blog/hard-drive-stats-q2-2019/
>They release these stats every quarter
https://www.servethehome.com/raid-reliability-failure-anthology-part-1-primer/
In case you haven't already looked into RAID more. I can't seem to find it, but there's an article/paper written by a guy that shows the math of why you need certain parity levels for certain disk and array sizes.
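This is not that article, just the usual back-of-the-envelope version of the argument: rebuilding a single-parity array means reading every surviving drive end to end, and the bigger the array, the more likely you hit an unrecoverable read error (URE) on the way. The sketch below assumes a URE rate of 1 per 10^14 bits (a figure commonly quoted on consumer drive spec sheets) and assumes errors are independent. Both are simplifications, real drives often do better, and the function name and example sizes are made up.

```python
import math


def rebuild_ure_probability(surviving_drives, drive_tb, ure_per_bit=1e-14):
    """Chance of at least one URE while reading all surviving drives in full.

    ure_per_bit is an ASSUMED spec-sheet figure, not a measured rate.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # TB -> bits
    # 1 - (1 - r)^bits, computed via log1p/expm1 to avoid rounding to zero.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical arrays of 8 TB drives, one drive already failed.
    for total_drives in (4, 8, 13):
        p = rebuild_ure_probability(total_drives - 1, 8)
        print(f"{total_drives} x 8TB, single parity: P(URE in rebuild) = {p:.3f}")
```

Under these assumptions the probability gets uncomfortably close to 1 for large arrays, which is why people recommend double parity or more once disks and arrays get big.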