/g/ - Technology

What would be the best web crawler algorithm for tor? Nanonymous No.6838 [D][S][L][A][C] >>6841 >>6855
I thought about just bruteforcing every .onion address but I realised that was a bad idea after looking up the number of all possible v2 addresses. I don't think following links will get you really far on tor because almost no sites (that I've been on) have links to external sites besides link dumps.
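
For scale, a rough back-of-the-envelope sketch of that number: a v2 address is 16 base32 characters at 5 bits each, so 80 bits. The probe rate below is an arbitrary assumption picked for illustration, not a measurement of anything.

```python
# v2 onion addresses are 16 base32 characters, 5 bits each (80 bits total).
V2_ADDRESS_CHARS = 16
BITS_PER_BASE32_CHAR = 5

# Hypothetical probe rate, chosen only to make the arithmetic concrete.
ASSUMED_PROBES_PER_SECOND = 1_000_000

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def v2_address_space() -> int:
    """Number of distinct v2 onion addresses (2**80)."""
    return 2 ** (V2_ADDRESS_CHARS * BITS_PER_BASE32_CHAR)


def years_to_enumerate(space: int, probes_per_second: int) -> float:
    """Years needed to try every address once at a constant rate."""
    return space / probes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    space = v2_address_space()
    years = years_to_enumerate(space, ASSUMED_PROBES_PER_SECOND)
    print(f"{space:,} possible v2 addresses")
    print(f"about {years:.2e} years at the assumed rate")
```

Even at a million probes per second, which is far beyond what Tor circuits would allow, the total is on the order of tens of billions of years.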

Nanonymous No.6840 [D]
Scrape the clearnet for onion links and check whether they're valid. Dump the results of every onion search engine.
Also scrape hidden services in the odd case they do link to others anyway.
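
A minimal sketch of the "scrape, then check" step, assuming the scraped page is already available as a string. The function names are made up for illustration. The v3 check only verifies the checksum built into the address (SHA3-256 over ".onion checksum", the public key and the version byte), so it catches mangled or truncated strings but says nothing about whether the service is online. v2 addresses carry no checksum, so they can only be matched by shape.

```python
import base64
import hashlib
import re

# v2 addresses are 16 base32 characters, v3 addresses are 56.
ONION_RE = re.compile(r"\b([a-z2-7]{56}|[a-z2-7]{16})\.onion\b")


def extract_onions(text: str) -> set:
    """Return every string in text that is shaped like an onion address."""
    return {m.group(0) for m in ONION_RE.finditer(text.lower())}


def is_valid_v3(address: str) -> bool:
    """Check the length, version byte and checksum of a v3 address."""
    label = address[:-6] if address.endswith(".onion") else address
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:
        return False
    # Layout: 32-byte public key, 2-byte checksum, 1-byte version.
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    expected = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    return version == b"\x03" and checksum == expected
```

Anything that passes this filter still has to be fetched through Tor to find out whether it is alive.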

Nanonymous No.6841 [D] >>6842
>>6838
All the currently hosted hidden services are stored in a DHT somewhere. The tor project probably has documentation on how to scrape it, so start there.

Nanonymous No.6842 [D]
>>6841
Also use the stupid questions thread next time.

Nanonymous No.6855 [D]
>>6838
v2 .onion addresses can be harvested just by running a relay with the HSDir flag, unless the service uses HiddenServiceAuthorizeClient in stealth mode. It's even possible to DoS a service, because which relays hold the descriptor for a particular .onion can be predicted.
v3 .onion addresses can't be harvested that way because their descriptors are encrypted.

Nanonymous No.6856 [D]
Does anyone know any relay operators that I can possibly email in order to get the hash table?

Nanonymous No.6877 [D] >>6885
>I thought about just bruteforcing every .onion address but I realised that was a bad idea after looking up the number of all possible v2 addresses.
that wouldn't even work for clearnet either. you'll still have to crawl to find more stuff with longer domain names.

Nanonymous No.6878 [D] >>6882
>>6863
nice cp links faggot

Nanonymous No.6882 [D]
>>6878
>nice cp links faggot
OP is crawling every .onion, that has to include child pornography communities

Nanonymous No.6885 [D] >>6890
>>6877
Just bruteforce IPv4 ips.

Nanonymous No.6890 [D] >>6891 >>6920
>>6885
Most HTTP servers require a correct Host: header.
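
To illustrate the point: with name-based virtual hosting, the same IP can serve different sites depending on the Host: line, so a scanner that only knows the IP has nothing meaningful to put there. A small sketch that builds the raw request bytes; `build_request` and `example.org` are placeholders, not anything from a real crawler.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given virtual host."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # the server picks the site based on this line
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Placeholder host name; sent to a bare IP, a different name here
    # may select a different site or a default/error page.
    print(build_request("example.org").decode("ascii"))
```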

Nanonymous No.6891 [D] >>6919
>>6890 (wowee only 79 posts until the big haha sex digits)
There's a cool thing called reverse DNS.
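
A short sketch of what a reverse lookup involves, using only the Python standard library. The PTR name is derived from the IP, and the lookup returns a hostname only if whoever controls that address block published one, so `reverse_lookup` (a hypothetical helper name) returns None otherwise.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the DNS name that is queried for the PTR record of ip."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str):
    """Return the hostname from reverse DNS, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here only as a placeholder.
    print(ptr_name("192.0.2.1"))
    print(reverse_lookup("192.0.2.1"))
```

Even when a PTR record exists, it often names the hosting provider's machine rather than the site served from it.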

Nanonymous No.6919 [D]
>>6891
Only sometimes works.

Nanonymous No.6920 [D] >>6922 >>7209
>>6890
Well, they will have 80 and 443 ports open, that would be the bruteforce target.

Nanonymous No.6921 [D]
Anyway, some cool (old) shit for you guys.

https://archive.org/details/Carna_Internet_Census

https://web.archive.org/web/20140903052844/http://internetcensus2012.bitbucket.org/paper.html

Nanonymous No.6922 [D]
>>6920
Though yeah, Tor sites don't have to respond to random shit.

Nanonymous No.7024 [D]
Just use Freshonions unless you're doing it for fun/challenge.

Nanonymous No.7209 [D]
>>6920
that doesn't help at all. you don't understand the problem or you don't understand the subthread you're replying to