/g/ - Technology

install openbsd

Easy way to archive Nanochan? Nanonymous No.1612 [D][S][L][A][C] >>1615 >>7360

Obviously we won't be able to go on archive.is to archive posts here. Is some kind of archiving functionality planned? Ideally I'd love to have, for example, a button at the top of the page that downloads a gzipped folder containing the HTML of the thread, the CSS and the uploaded files. I've never programmed an imageboard before so I don't know how hard it is to implement, but there should be a way to easily archive things here.

Nanonymous No.1613 [D]

Use wget? Can't be that hard.

hakase ## Nanochan Administrator No.1615 [D] >>1616 >>7329

>>1612
It's not particularly hard:
1. download everything from the /Static/ directory: nanochan.css, audio.png, video.png, document.png
2. download catalogs of all the boards
3. use a regular expression to find all thread URLs
4. download those thread URLs
5. use a regular expression to find all images in the /Media/ directory
6. download those images
7. set up an HTTP server to point at the directory you downloaded
Obviously you have to preserve the directory structure for anything to work properly.
Alternatively, you can use the pyshit webpage2html script which can be found somewhere on shithub.
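
A minimal sketch of steps 3 and 5 plus the directory-structure point, not the admin's actual tooling. The two regular expressions are assumptions about how thread and media links look in the HTML (a `/board/number.html` thread path and anything under `/Media/`); check them against a real catalog page before relying on them. The function names are made up for illustration.

```python
import re
from pathlib import PurePosixPath

# ASSUMED link layouts, not confirmed by the site: adjust to the real HTML.
THREAD_RE = re.compile(r'href="(/[a-z0-9]+/\d+\.html)"')
MEDIA_RE = re.compile(r'(?:src|href)="(/Media/[^"]+)"')


def find_threads(catalog_html):
    """Step 3: unique thread paths, in order of first appearance."""
    return list(dict.fromkeys(THREAD_RE.findall(catalog_html)))


def find_media(thread_html):
    """Step 5: unique /Media/ paths referenced by a thread page."""
    return list(dict.fromkeys(MEDIA_RE.findall(thread_html)))


def local_path(root, url_path):
    """Map a site path such as /Media/a.png to root/Media/a.png.

    Keeping the same layout under `root` is what lets the saved pages
    find their CSS and images again. '..' components are dropped so a
    hostile link cannot escape the archive directory.
    """
    parts = PurePosixPath(url_path).parts
    clean = [p for p in parts if p not in ("/", "..", ".")]
    return str(PurePosixPath(root, *clean))
```

For step 7, Python's built-in `python3 -m http.server` run from inside the archive directory is enough to serve it locally.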

Nanonymous No.1616 [D] >>1617 >>7329

>>1615
That is a possibility ... :^)

I would just use Wget, though I dislike it myself because it still doesn't have multi-threading and can thus be incredibly slow ...
[code]
torsocks -P 9050 wget --adjust-extension --convert-links --no-parent --page-requisites --mirror [URL]
[/code]
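
For anyone scripting this, here is the same command assembled as an argument list with a comment per flag. It is only a sketch: `mirror_command` is a made-up helper name, and the URL is whatever you would put in place of [URL].

```python
def mirror_command(url, socks_port=9050):
    """Build the torsocks + wget invocation from the post above as an argv list."""
    return [
        "torsocks", "-P", str(socks_port),  # send traffic through the local Tor SOCKS port
        "wget",
        "--adjust-extension",  # save HTML pages with an .html suffix where it is missing
        "--convert-links",     # rewrite links so the copy can be browsed locally
        "--no-parent",         # never ascend above the starting directory
        "--page-requisites",   # also fetch the CSS and images a page needs to render
        "--mirror",            # recursive download with timestamping, unlimited depth
        url,
    ]
```

Passing the list to `subprocess.run(mirror_command(url), check=True)` avoids shell quoting problems.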

Nanonymous No.1617 [D] >>1618

>>1616
Wget is GNU cancer. Just follow Hapase's suggestion.

Nanonymous No.1618 [D]

>>1617
It's shitty, I know.

Nanonymous No.7329 [D] >>7357
>>1615
your solution does not tell how to update the archive without redownloading everything
it requires starting an http server to browse the archive, which is bloat and unnecessary

>>1616
speed is not needed for this task
but there are alternatives to wget
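
Both complaints can be handled in a small script. The sketch below assumes uploaded media never change once posted (so a file already on disk can be skipped) and that pages use root-relative links like `/Static/nanochan.css`; neither is confirmed anywhere in this thread. `needs_fetch` and `relativize` are hypothetical names.

```python
import os
import posixpath
import re


def needs_fetch(root, url_path):
    """Return True only if the file is not on disk yet.

    ASSUMPTION: media files are immutable, so an existing copy is current.
    """
    target = os.path.join(root, *url_path.strip("/").split("/"))
    return not os.path.exists(target)


def relativize(html, page_path):
    """Rewrite root-relative href/src values relative to the page itself,
    so the archive opens straight from disk without an HTTP server."""
    page_dir = posixpath.dirname(page_path)  # '/g' for '/g/1612.html'

    def repl(match):
        attr, target = match.group(1), match.group(2)
        return '%s="%s"' % (attr, posixpath.relpath(target, page_dir))

    # Matches "/path" but not protocol-relative "//host/..." URLs.
    return re.sub(r'(href|src)="(/[^"/][^"]*)"', repl, html)
```

Thread pages do change as replies arrive, so those would still be refetched on each run; only the media download is skipped.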

Nanonymous No.7357 [D] >>7382
>>7329
The total non-Media file size of everything on nanochan was around 16 MB when I was still archiving the site, which is next to nothing, so I just had my script exclude /Media and mirrored the site weekly that way
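
The poster's script isn't shown, so this is only a guess at what such an exclusion filter could look like. `wanted` is a hypothetical name; with wget itself, `--exclude-directories=/Media` should have a similar effect.

```python
from urllib.parse import urlsplit


def wanted(url, skip_dirs=("/Media",)):
    """Return False for URLs inside any skipped directory, True otherwise."""
    path = urlsplit(url).path
    for d in skip_dirs:
        # Match the directory itself and everything below it,
        # but not unrelated paths such as /MediaFoo.
        if path == d or path.startswith(d + "/"):
            return False
    return True
```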

Nanonymous No.7360 [D]
>>1612
You can also use that python script from /ar/

Nanonymous No.7382 [D]
>>7357
you need at least thumbs so you can easily identify a thread in the catalog, or see what kind of image someone posted in a post

Nanonymous No.7390 [D]
FLOOOGABABOOGERBLABAFLOIOBFLOOOOOOJFWFIWE OOOOOOOOOOOOOOOOOOOOOOOOOOO


THAT NEEDS TO BE ARCHIVED BROS

HAHAHAHAH!!!!

Nanonymous No.7401 [D] >>7405 >>7408
http://nanochanqwrwtmamtnhkfwbbcducc4i62ciss4byo6f3an5qdkhjngid.onion.ly/
just archive.is an onion gateway?

Nanonymous No.7405 [D]
>>7401
>it actually works
LMFAO fagmin is so incompetent. Under hapase all the onion gateways were banned.

Nanonymous No.7408 [D]
>>7401
Looks like onion.ly doesn't have the same tor2web headers, but it has many distinct headers that can be used to detect requests from that specific site. Should be blocked now, tell me if it isn't
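
Header-based detection of this kind can be sketched as below. This is not the site's actual code. `X-Tor2web` is the marker header tor2web-style proxies are generally understood to add; `X-Example-Gateway` is a placeholder, since the real onion.ly headers aren't listed in this thread.

```python
# ASSUMED marker names: the second one is a placeholder, not a real onion.ly header.
GATEWAY_HEADERS = ("X-Tor2web", "X-Example-Gateway")


def is_gateway_request(headers, markers=GATEWAY_HEADERS):
    """Return True if any marker header is present (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return any(m.lower() in present for m in markers)
```

A request handler would call this on the incoming headers and answer 403 when it returns True.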