Anon 03/12/2024 (Tue) 12:31 No.9820 del
>>9753
Since the text below in this post concerns itself with a similar phenomenon, I will explain the situation yet again.
>such interesting things [sarcasm] as how his scraper found that one source had a URL in lowercase and another source had the same URL with capital letters.
He is referring to the time I explained how some of the data in https://poneb.in/ is invalid. Well, basically all of the data in poneb.in is invalid because the website owner modified newlines and deleted certain whitespace, but that's a different story (and I guess all of the source pastebin.com data/archives that poneb.in is based on is available). The malformed data that I pointed out was this: a probably small number of paste IDs in poneb.in are invalid due to case sensitivity or insensitivity; I suspect that this has to do with a mechanism in web.archive.org >>9613. Two considerations here:
(1.) Caring about web data being a faithful copy of the source and pointing out when it isn't - this is related to "archive-quality" data purity and the dialectic of WARCs vs. raws.
(2.) Caring about what some random anon thinks is "interesting" or "annoying".

Obviously, I am going to value the 1st point over the 2nd one. So, on with what I was going to point out. Looks like iwiftp.yerf.org has a Windows-friendly-filenames copy of some/all ponibooru.org torrents. Examples:
>Ponibooru-All-Unrated/7177 - pinkie_pie %22original%22_character brundle rainbow_dash.jpg
>Ponibooru-All-Unrated/7177 - pinkie_pie "original"_character brundle rainbow_dash.jpg
The torrent only has the non-Windows-friendly filename (the one with the literal " characters, which Windows does not allow in filenames).
>Ponibooru-All-Unrated/7643 - %22original%22_character ms_paint funny_to_me skywind weanus alpine_horn comic MetaSue.PNG
>Ponibooru-All-Unrated/7643 - "original"_character ms_paint funny_to_me skywind weanus alpine_horn comic MetaSue.PNG
and
>Ponibooru-All-Unrated/7641 - artist:wasd999 original_character.jpg
>Ponibooru-All-Unrated/7641 - artist_wasd999 original_character.jpg
related to >>9754
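
For anyone who wants to line the two copies up, here is a rough sketch of the renaming as far as I can infer it from the three pairs above. The mapping (" becomes %22, : becomes _) is an assumption based only on those examples; I don't know what the mirror does with other Windows-reserved characters, and the names ASSUMED_MAP, windows_friendly and match_mirror_to_torrent are made up for illustration.

```python
# Assumed mapping, inferred only from the three filename pairs above:
#   '"' -> '%22'   and   ':' -> '_'
# How the mirror treats other Windows-reserved characters is unknown.
ASSUMED_MAP = {'"': '%22', ':': '_'}


def windows_friendly(name: str) -> str:
    """Apply the assumed per-character substitutions to one filename."""
    return ''.join(ASSUMED_MAP.get(ch, ch) for ch in name)


def match_mirror_to_torrent(torrent_names, mirror_names):
    """Map each mirror filename to the torrent filename(s) it may come from.

    The ':' -> '_' substitution is lossy, so a mirror name cannot simply be
    reversed; instead every torrent name is converted forward and compared.
    """
    by_friendly = {}
    for name in torrent_names:
        by_friendly.setdefault(windows_friendly(name), []).append(name)
    return {m: by_friendly.get(m, []) for m in mirror_names}
```

Converting forward from the torrent names instead of trying to undo the mirror names is deliberate: an underscore in a mirror filename could have been either an underscore or a colon originally, so only the torrent side can be treated as the source of truth.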
