Crystal Pony 04/01/2024 (Mon) 20:40 No.10114
>>10111
>multiple people running scraping operations would probably be a huge drain on their resources
Such as: downloading many URLs from web.archive.org. I think it's less resource-intensive to just download one multi-gigabyte file that has everything. WARC files are multi-gigabyte and "have everything" in some cases, but in other cases they have many "non-target" URLs in them. I hope there are users/organizations/whatever other than me, and more interested than me, who would want to download hundreds of gigabytes from IA even if only a small percentage of the WARC data contains the URLs of interest. Then they could provide public replays of them and try to archive them. Maybe such an organization would be more interested if the data were more publicly available. I can think of some similar independent "organizations": one related to the Yahoo! Answers purge (Quantserver or whatever it was called) and theponyarchive.com. (Also, maybe FAGMAN companies like Cloudflare, Google, Amazon, Microsoft, Apple.) IWIFTP guy said he wasn't really interested in downloading lots from IA/WBM without reason.
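
To illustrate the "only a small percent is of interest" part: a minimal sketch of pulling just the target records out of a big WARC, using only the Python standard library. The function names and the substring match are made up for illustration, and the parser assumes well-formed records with no folded header lines; a dedicated WARC library such as warcio would be the safer choice for real data.

```python
def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b"\n", b""):
                break  # a blank line ends the WARC header block
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        body = stream.read(int(headers.get("content-length", "0")))
        yield headers, body


def target_records(stream, wanted):
    """Keep only records whose WARC-Target-URI contains the substring `wanted`."""
    for headers, body in iter_warc_records(stream):
        if wanted in headers.get("warc-target-uri", ""):
            yield headers, body


# Usage (hypothetical file name and URL fragment):
#   import gzip
#   with gzip.open("example.warc.gz", "rb") as f:
#       for headers, body in target_records(f, "example.com/"):
#           print(headers["warc-target-uri"], len(body))
```

This still means downloading the whole file first; the filtering only helps with what gets kept or replayed afterwards.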

>>10113
>in case they are annoyed that users are creating WARCs of their WARC replays
So either engage in the "clown world" of making WARC files of WARC replays, or just download raws. Months ago, I used a program to download raws from WBM, which isn't great because it misses stuff like the headers, which get replayed in an altered way: "x-archive-*" or whatever.
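
For reference, a minimal sketch of what "downloading raws" and the altered headers look like. It is not the program mentioned above. It assumes the `id_` modifier after the timestamp asks WBM for the unmodified capture body, and that captured response headers come back with an `x-archive-orig-` prefix; the function names are hypothetical.

```python
RAW_PREFIX = "https://web.archive.org/web/"
ORIG_HEADER_PREFIX = "x-archive-orig-"  # assumed prefix on replayed original headers


def raw_replay_url(timestamp, url):
    """Build a replay URL using the `id_` modifier (raw body, no rewriting)."""
    return f"{RAW_PREFIX}{timestamp}id_/{url}"


def split_replay_headers(headers):
    """Split replay response headers into (as captured, added by the replay)."""
    original, replay = {}, {}
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith(ORIG_HEADER_PREFIX):
            # Strip the prefix to get back the captured header name.
            original[lower[len(ORIG_HEADER_PREFIX):]] = value
        else:
            replay[lower] = value
    return original, replay
```

Even with the prefix stripped, this is only a reconstruction of the headers, not the header block as stored in the WARC, which is the reason raws are "not great".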

>Why do they do this?
Not sure. Maybe they think what they already do is "enough" (I've had similar thoughts). It costs more to pay for hot storage than for cold storage. But then, I don't know the details of IA's financial situation; they could be swimming in donation money for all I know.

Trixie-focused PMV
>/z9/youtube/Twig_I_UCBt86lSt-wCJuS-y5juoYWg/Pink_Floyd_s_The_wall_pony_edition_song_1-Twig_I-20110530-youtube-640x480-q-ZPJChIhO8.webm