Friday, February 03, 2006

Twiddling the Database

I fixed the broken ICMP-Trace yesterday - it now shows a "we know XX domain names for this host" line when you call the script, and links to our "hostonip" script.
We finally seem to have passed the phase of the HUGE zone transfers - we are constantly importing new domains, but the queue of our "hostnames to import" stays roughly the same size - it has been hovering around 3.1 million hostnames for a few days now.
In the beginning we transferred really huge zones, mostly .edu or other university zones, plus some second-level domains like com.br and a few "zone-spammers": they stuffed literally millions of hostnames into their zone files - judging by the content of the websites, they seem to use this simply for spamming. But then, I don't know of any search engine that does zone transfers.

Besides the spam domains, we still have a few million more exotic hostnames in a separate queue, which we will process once we are up to date with the "important" TLDs.

The queue is increasingly becoming the bottleneck of MappingTheNet - we have around 10 tasks stuffing hosts into the queue on one side, and 4 tasks checking the hostnames and sorting them into the database on the other. The "sort-in" tasks take a long time to query the db for hosts to work on, because they have to lock the database while they read and update the entry. I'm praying for a PostgreSQL guru, but I might end up having to solve this problem myself.

We are now aware of

4.913.386 unique Hostnames
5.302.751 unique IPs (that have at least 1 hostname assigned)
2.155.215 unique Domains
3.186.760 hostnames waiting to be sorted in from our queue

tom
