Anon 02/16/2024 (Fri) 00:12 No.9587
>>9575
>Of any site that could handle large scale web crawling it'd be a text based thing like tv scripts.
True. Pretty much no pictures or videos there either, just around 70,000 human-readable plaintext webpages in total. More crap I saw on that site:
>img alt=Donate with PayPal
>Donate
>Don't be a leech! Please either enable ads on this site or contribute towards the running costs. Thanks!
>
>Read more at: http://bafybeiaywfleofec2jy6y6agsjplyqaq4qzvtbm7pyvvyfewptjyx4kp5e.ipfs.localhost:48084/https---tvshowtranscripts.ourboard.org-viewtopic.php-t-67046.htm [The "Read more" part is text that the site's JS appends to the end of whatever you copy to the clipboard.]
Fuck does he think this is? BitTorrent? Some FTP server from the 2000s? "Noooo! Only I can have exclusive 100% control of 100% of my website! And you can only access it at a rate of 4 webpages per day!" He seems to dislike decentralization/archiving a lot, and on mobile I saw an ad about every third screenful. Plus there's the Google Canvas ad (I think that's what it's called), which pops up and covers the entire screen until you press the close button. ("Fuck" and "shit" are also partially censored on every page.) Looking at a less-open system really shows how good more open or accessible systems can be. That site also blocks vanilla curl: "curl -sL $url"
><h1>Forbidden</h1><p>Access denied!</p><p>There are several reasons why you may be seeing this:</p><p>If you are a new visitor, I apologize:</p><p>If you are using a proxy or VPN, try turning it off. There is no login required on this site therefore it is unnecessary. A previous abuser probably also used the same proxy or VPN.</p><p>If you are a previous visitor, who may have contravened our terms of use, been hammering our server, or otherwise done something we do not like, your activity has been recorded and, after investigation, may be reported to your service provider, as well as any other concerned or affected parties.</p><p>In general, all bans are permanent. Breach of our terms of use leave you open to legal action.[...]
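If a downloader has to live with that block, one option is to check the HTTP status before keeping the file, so the "Forbidden" page never gets saved as if it were a real transcript. Rough POSIX sh sketch, assuming the block comes back as a non-2xx status (the page says "Forbidden", so probably 403, but I haven't checked the actual code). fetch_page and status_ok are made-up names, not part of anyone's existing script:

```shell
#!/bin/sh
# Hypothetical helpers: fetch a page with curl and keep it only on a 2xx status.

# status_ok CODE: succeed if CODE is exactly a three-digit 2xx HTTP status.
status_ok() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# fetch_page URL OUTFILE: download URL into OUTFILE.
# On a non-2xx status (e.g. an error page like the "Forbidden" one above),
# delete whatever was written and fail.
fetch_page() {
    url=$1
    out=$2
    # -s: silent, -L: follow redirects, -o: write the body to $out,
    # -w '%{http_code}': print the final response's status code to stdout.
    # Without -f, curl exits 0 even on HTTP 403, so the code is what we check.
    code=$(curl -sL -o "$out" -w '%{http_code}' "$url") || code=000
    if status_ok "$code"; then
        return 0
    fi
    echo "fetch_page: $url returned HTTP $code, not saving" >&2
    rm -f "$out"
    return 1
}
```

Usage: fetch_page "$url" page.htm || echo "skipped $url". Plain sh has no local variables, so url/out/code leak into the caller's scope; fine for a small script.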

>>9573
>make "lynx -source" downloader+other-stuff script save a timestamp file
Done. It now writes a file containing the Unix timestamp of when the webpage download completed:
https://cloudflare-ipfs.com/ipfs/QmQ13LDFhHkxz41dUKZNFoP8yQEkBsbGpgCQCzEyPB2ckx
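For anyone wanting the same thing in their own scraper, the idea fits in a few lines of sh. This is a rough sketch, not the linked script: write_stamp, download_with_stamp and the .part/.timestamp file names are hypothetical. It downloads to a temporary name first and only writes the timestamp (date +%s, seconds since the Unix epoch; works in GNU, BSD and BusyBox date, though not strictly POSIX) once the page has been fully saved:

```shell
#!/bin/sh
# Hypothetical helpers, not the script linked above.

# write_stamp FILE: record the current Unix time (seconds) in FILE.
write_stamp() {
    date +%s > "$1"
}

# download_with_stamp URL OUTFILE: save the raw HTML of URL to OUTFILE,
# then write OUTFILE.timestamp marking when the download completed.
download_with_stamp() {
    url=$1
    out=$2
    # lynx -source prints the page's HTML source to stdout.
    # Write to a .part file so a half-finished download never looks complete.
    if lynx -source "$url" > "$out.part"; then
        mv "$out.part" "$out"
        # Stamp only after the page is fully written and renamed.
        write_stamp "$out.timestamp"
    else
        rm -f "$out.part"
        return 1
    fi
}
```

Stamping after the mv means a .timestamp file never exists for a half-written page. lynx's exit status may not reflect HTTP errors, though, so a 403 page like the one above could still get saved and stamped.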

>>9586
