>>10702 >Uploading "WARC: booru.org - 292 subdomains" so it can be accessible at host-based addresses [not in "hostless" yet] Here's that set of sites at a non-host-based address:
https://gateway.pinata.cloud/ipfs/bafybeifldwwyxjwhgom5ehqtekqdnoe566hq2snpjdzgabygmafq4uffea . Size: 112 GB
. Doesn't have archive.org's derived files: that's what "$ ipfs pin update ..." is for
. Won't delete the writable folder "/zc/warc/za-warc/" yet - I could use "fancy IA downloader" to get the derived files, then update that pin (so I have an upstream and a downstream folder, where the downstream one is larger because it has the extra data)
>>10663 >>10704 >Record rsync info to "root/metadata/donelist.txt" (As of 2024-07-20 UTC, HDD za was a direct copy of HDD z8 plus extra data without anything changed: this is recorded thanks to donelist.txt!) It's also important to record errored files to "root/metadata/errlist.txt". I was recovering data with the help of errlist.txt:
https://gateway.pinata.cloud/ipfs/bafybeidm7khqusjtzh6z37alrytbty7qox47p5kkiqmsrqma3dq54m5gb4 . Size (no dedup): 14 GB.
. With (zfs+)rsync, a file is skipped completely if it is corrupted (part of the storage medium where its data should be is bad). Fix for "rsync: [sender] read errors mapping "$path": No data available (61)" = use "$ cat $path > file.bin" (reads from the top until the bad part), then "$ tac $path | tac > file.bin.tac.tac" (reads from the bottom until the bad part), then "$ cat file.bin file.bin.tac.tac > file.bin.tac.tac.cat". The 3rd file, "file.bin.tac.tac.cat" (which is top bytes + skip, hopefully of just one part + bottom bytes), will be missing some data, but that's better than not having the file at all.
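
A rough Python sketch of the same idea as the cat / tac trick above, in case a block-based version is handy. It isn't what I actually ran. The names "salvage", "read_block" and "BLOCK" are made up for illustration, and the 4096-byte block size is an assumption. It reads forward until the first failed read, scans backward from the end until the first failed read, then writes top bytes + bottom bytes and returns the offsets of the gap so they can be written down. It works on fixed-size blocks instead of lines, so up to one block of good data on each side of the bad part may be dropped.

```python
import os

BLOCK = 4096  # assumed block size (hypothetical choice, tune for the drive)


def read_block(f, offset, length):
    """Return `length` bytes at `offset`, or None if the read fails or is short."""
    try:
        f.seek(offset)
        data = f.read(length)
    except OSError:
        return None
    return data if len(data) == length else None


def salvage(src, dst, block=BLOCK, reader=read_block):
    """Write the readable head and tail of `src` to `dst`.

    Returns (head_end, tail_start, size): bytes [head_end, tail_start)
    were skipped and are missing from `dst`.
    """
    size = os.path.getsize(src)
    with open(src, "rb", buffering=0) as f, open(dst, "wb") as out:
        # Forward pass (like cat): copy blocks until the first failed read.
        head_end = 0
        while head_end < size:
            n = min(block, size - head_end)
            data = reader(f, head_end, n)
            if data is None:
                break
            out.write(data)
            head_end += n

        # Backward pass (like tac): find where the readable tail begins.
        tail_start = size
        while tail_start > head_end:
            start = max(head_end, tail_start - block)
            if reader(f, start, tail_start - start) is None:
                break
            tail_start = start

        # Append the tail in its original order.
        pos = tail_start
        while pos < size:
            n = min(block, size - pos)
            data = reader(f, pos, n)
            if data is None:
                break  # a block that was readable a moment ago now fails
            out.write(data)
            pos += n
    return head_end, tail_start, size
```

Usage would be something like "head_end, tail_start, size = salvage(path, 'file.bin.salvaged')", then noting the skipped byte range next to the entry in errlist.txt. If the file has no read errors, the forward pass covers everything and the output is a plain copy.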