Tuesday, February 28, 2006

switched

Switched the hostonip script to the new database - data should be dynamic again. The DB is under heavy load, so lookups might still not be as fast as they should be.

tom
Feb 28 2006:

Known Hosts: 8597068
Known Domains: 3459664
Known IPs: 9109832
Hostnames in to-do queue: 4194170

Sunday, February 26, 2006

Database moved

The Domains-Database moved to a faster, bigger server. Queries should be faster now, and we hope to broaden our range of known hosts and domains quickly.
Queries on Serversniff will continue to run against the old (now static) database host until we have cleaned things up in the new one.

tom

Saturday, February 25, 2006

Webserver-Detection

Fixed various minor glitches in Serversniff's webserver detection. It now handles errors properly, and we might link to more information about the various webservers and their modules. Mail us (or comment here) if we missed something. You are welcome to add text to the wiki!
Check it out:

http://www.serversniff.net/get_httpserver.php
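Serversniff's detection code isn't published here. As a rough sketch of the idea only, the Python below splits an HTTP `Server` header into product/version tokens and parenthesised comments, the way RFC 2616 describes them. The function name `parse_server_header` is hypothetical, and nested comments or unusual vendor strings are not handled.

```python
import re

# One product token: a name, optionally followed by "/version".
_PRODUCT = re.compile(r"([^\s/()]+)(?:/([^\s()]+))?")
# One parenthesised comment such as "(Debian GNU/Linux)"; no nesting support.
_COMMENT = re.compile(r"\(([^)]*)\)")


def parse_server_header(value):
    """Return ([(product, version_or_None), ...], [comment, ...])."""
    comments = _COMMENT.findall(value)
    # Blank out the comments so their contents aren't mistaken for products.
    stripped = _COMMENT.sub(" ", value)
    products = [(name, version or None) for name, version in _PRODUCT.findall(stripped)]
    return products, comments
```

For example, `parse_server_header("Apache/2.0.54 (Debian GNU/Linux) PHP/4.3.10")` yields the products Apache and PHP with their versions, plus the OS comment, which is enough to look up per-module information.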

tom

Friday, February 24, 2006

Hasshhhh....

Sshhhhh - don't tell anybody: we had nasty (and quite common) bugs in our hash creator: while it worked well with "usual" strings, some hashes didn't work, especially for strings containing ', " or \. Since nobody complained, they might have gone completely unnoticed.
They are fixed now; only the NTLM/LM hashes weren't completely fixable: strings containing both ' and " will now give an empty hash and an error message. I'll try to fix this issue next week as well.
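A common cause of this kind of bug is passing the user's string through a shell or a quoted string literal before hashing it. Assuming that was the problem (the post doesn't say), a minimal Python sketch avoids it by hashing the raw bytes in-process with `hashlib` and, where an external tool is unavoidable, quoting the argument with `shlex.quote`. The helper names `hash_all` and `shell_arg` are hypothetical, and NTLM/LM hashes are not covered.

```python
import hashlib
import shlex


def hash_all(text):
    """Hash the raw UTF-8 bytes in-process: ', " and \\ need no escaping at all."""
    data = text.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def shell_arg(text):
    """If a string must go through a POSIX shell, quote it as a single word."""
    return shlex.quote(text)
```

Passing arguments to `subprocess.run` as a list (without `shell=True`) sidesteps shell quoting altogether.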
Mapping the Net has crossed 3 million domains, but is running out of disk space. The new server is here and will replace the old one by the end of March.

tom

Friday, February 17, 2006

Roger Schwarz, again

In a comment, Soultcer pointed to http://lists.suse.com/archive/suse-linux/2001-Oct/4473.html - that's what you find on Google, yup.

What I still ask myself is who he really was. Is compiling his memory into the server strings of many, many webservers some kind of running gag? Or is Roger's memory still present at T-Online? I doubt that there are many people, if anybody at all, working for T-Online who knew Roger. The IT business and the new economy used to mean working fast, changing often, and not really thinking about your past colleagues.

tom

Sunday, February 12, 2006

In memoriam Roger Schwarz

Did Bugfixing, Adding and Removing....
  • Hardcore bugfixing: I disabled the DNS script until I have fixed it up. This may take a bit of time.
  • Fixed the HTTP header script to support port numbers and to check hostnames with multiple IPs (try www.google.com)
  • Added HTTP server detection (for those who think the header script is too complex)
  • Fixed a few bugs in texts and links
  • Started customizing the English wiki - it's time to start doing this
  • Still bugs left to fix: the HTTP header and server type scripts don't support HTTPS yet.
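The multi-IP check could look roughly like this: resolve every address for the hostname with `socket.getaddrinfo`, then send a `HEAD` request to each address on the requested port while keeping the original `Host` header. This is a hypothetical Python sketch (`resolve_all` and `headers_per_ip` are made-up names), not Serversniff's script, and like the script it leaves HTTPS out.

```python
import http.client
import socket


def resolve_all(host, port=80):
    """Return every distinct address the host resolves to, in resolver order."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def headers_per_ip(host, port=80, timeout=5.0):
    """HEAD / on each address; maps ip -> (status, headers) or (None, error text)."""
    # Virtual hosts need the name, plus the port when it isn't the default.
    host_header = host if port == 80 else f"{host}:{port}"
    results = {}
    for ip in resolve_all(host, port):
        conn = http.client.HTTPConnection(ip, port, timeout=timeout)
        try:
            conn.request("HEAD", "/", headers={"Host": host_header})
            resp = conn.getresponse()
            results[ip] = (resp.status, resp.getheaders())
        except (OSError, http.client.HTTPException) as exc:
            results[ip] = (None, str(exc))
        finally:
            conn.close()
    return results
```

Calling `headers_per_ip("www.google.com")` would return one entry per address, which makes load-balanced hosts that answer differently easy to spot.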
Wondered again: who was Roger Schwarz? His memory has been embedded in www.t-online.de's server string for many years. Do any of the admins there still remember their old colleague?

Twiddled the Domain-Reaper-Script to prefer the new domains from the queue. We should break 3 million unique Domains soon.
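Preferring new domains can be expressed as a two-level priority queue. A minimal Python sketch with `heapq`, assuming each queue entry carries a hypothetical `is_new` flag (the post doesn't describe the Domain-Reaper's real queue format); a running counter keeps first-in, first-out order within each priority level.

```python
import heapq
import itertools


class ReaperQueue:
    """Hypothetical work queue: new domains are handed out before known ones."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, hostname, is_new):
        priority = 0 if is_new else 1  # lower number is popped first
        heapq.heappush(self._heap, (priority, next(self._counter), hostname))

    def pop(self):
        _priority, _seq, hostname = heapq.heappop(self._heap)
        return hostname

    def __len__(self):
        return len(self._heap)
```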

Tuesday, February 07, 2006

the good, the bad and the ugly...

or the f*cking second-level-domains.

And no - don't tell me anything about ISO and standards - it seems that some NICs set up standards for each and everything, while others just comply with "unwritten" standards.

Look at .pl with the "Standard-SLs"
agro.pl, aid.pl, atm.pl, auto.pl, biz.pl, com.pl, edu.pl, gmina.pl, gsm.pl, info.pl, mail.pl, media.pl, miasta.pl, mil.pl, net.pl, nom.pl, org.pl, pc.pl, priv.pl, realestate.pl, rel.pl, shop.pl, sklep.pl, sos.pl, targi.pl, tm.pl, tourism.pl, travel.pl, turystyka.pl

or .br with
adm.br, adv.br, am.br, arq.br, art.br, bio.br, cng.br, cnt.br, com.br, ecn.br, eng.br, esp.br, etc.br, eti.br, fm.br, fot.br, fst.br, g12.br, gov.br, ind.br, inf.br, jor.br, lel.br, med.br, mil.br, net.br, nom.br, ntr.br, odo.br, org.br, ppg.br, pro.br, psc.br, psi.br, rec.br, slg.br, tmp.br, tur.br, tv.br, vet.br, zlg.br

How many domains do you expect to find under second-level domains as fancy as "turystyka.pl" or "vet.br"?? Try a zone transfer and see if you get 100 domain names.

Others "just do it" and set up pseudo-SLDs like com.al.

Others just do it and set up pseudo-SLDs for every part of the country: stuff like the Italian or American "ro.it", "bz.it", "bs.it", "ut.us", "ws.us" and so on...

Others just register a fancy domain and sell subdomains like gb.net, gb.com, us.com, ru.com, eu.com, de.vu and so on.

Hey, NS-Admins, Hey ICANN:

this is UGLY!

- at least for me, trying to sort things out in a way that keeps queries simple and understandable for somebody who doesn't (want to) know about SLDs.
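One way to keep queries simple despite all this is to map every hostname to its registrable domain by finding the longest matching suffix. The Python sketch below uses a small, hypothetical excerpt of suffixes taken from the lists above; a real tool needs a complete list (today Mozilla's Public Suffix List serves this purpose, including wildcard and exception rules that this sketch ignores).

```python
# Hypothetical excerpt only; a real deployment would load a complete suffix list.
SUFFIXES = {
    "pl", "com.pl", "turystyka.pl",
    "br", "com.br", "vet.br",
    "it", "bz.it",
    "us", "ut.us",
    "al", "com.al",
    "net", "gb.net",
    "com", "us.com",
    "vu", "de.vu",
}


def registrable_domain(hostname, suffixes=SUFFIXES):
    """Return suffix plus one label, e.g. 'www.foo.com.br' -> 'foo.com.br'."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the longest candidate first, so "com.br" wins over "br".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the name is itself a (pseudo-)SLD or TLD
            return ".".join(labels[i - 1:])
    return None  # TLD not in our list
```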

tom

Sunday, February 05, 2006

Cleaning the mess

Our queue is down to around 200.000 hostnames, and it seems that we can start filling it up again slowly in a few days. There are still huge zone transfers every now and then. The net's huge.

Implemented part of the data in the Subdomains script, since it is one of the most-used scripts on Serversniff (I still can't imagine why).

tom

Friday, February 03, 2006

Twiddling the Database

I fixed the broken ICMP trace yesterday - when you call the script, it now shows "we know XX domain names for this host" and links to our "hostonip" script.
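A "we know XX domain names for this host" counter boils down to a lookup grouped by IP. As an illustration only, the Python sketch below uses an in-memory SQLite database with a hypothetical `hosts(hostname, domain, ip)` table; Serversniff's real schema and its PostgreSQL setup are not described in the post. The index on `ip` is what keeps such lookups fast on a large table.

```python
import sqlite3


def open_demo_db():
    """In-memory stand-in for the hosts table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hosts (hostname TEXT, domain TEXT, ip TEXT)")
    # Without an index on ip, every lookup would scan the whole table.
    db.execute("CREATE INDEX hosts_ip ON hosts (ip)")
    return db


def hosts_on_ip(db, ip):
    """All known hostnames pointing at one IP, sorted (the 'hostonip' list)."""
    rows = db.execute(
        "SELECT hostname FROM hosts WHERE ip = ? ORDER BY hostname", (ip,))
    return [hostname for (hostname,) in rows]


def domains_on_ip(db, ip):
    """Number of distinct domains on one IP (the 'XX' in the trace output)."""
    (count,) = db.execute(
        "SELECT COUNT(DISTINCT domain) FROM hosts WHERE ip = ?", (ip,)).fetchone()
    return count
```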
We finally seem to have passed the phase of HUGE zone transfers - we constantly import new domains, but the queue of "hostnames to import" stays roughly the same: it has been hanging around 3.1 million hostnames for a few days now.
In the beginning, we transferred really huge zones, mostly .edu or other universities, but also some second-level domains like com.br and a few "zone spammers": they stuffed literally millions of hostnames into their zone files - judging by the content of their websites, they seem to use this simply for spamming. But then, I don't know of any search engine that does zone transfers.

Besides the spam domains, we still have a few million more exotic hostnames in a separate queue that we will process once we are up to date with the "important" TLDs.

The queue is increasingly becoming the bottleneck of MappingTheNet - we have around 10 tasks stuffing hosts into the queue on one side and 4 tasks checking the hostnames and sorting them into the database on the other. The "sort-in" tasks take a long time to query the DB for hosts to process, because they have to lock the database while they read and update an entry. I'm praying for a PostgreSQL guru, but I might end up having to solve this problem myself.
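One common way to shorten those locks is to let each worker claim a whole batch of queue rows in a single short transaction, then process them without holding any lock. The Python sketch below shows the idea with SQLite and a hypothetical `queue(id, hostname, claimed_by)` table; it is not the MappingTheNet code. In PostgreSQL 9.5 and later, `SELECT ... FOR UPDATE SKIP LOCKED` is the usual tool for the same pattern.

```python
import sqlite3


def make_queue(db):
    """Create the hypothetical queue table; claimed_by is NULL until a worker takes a row."""
    db.execute("CREATE TABLE IF NOT EXISTS queue ("
               "id INTEGER PRIMARY KEY, hostname TEXT, claimed_by TEXT)")


def claim_batch(db, worker, size):
    """Mark up to `size` unclaimed rows for `worker` in one short transaction."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "UPDATE queue SET claimed_by = ? WHERE id IN ("
            " SELECT id FROM queue WHERE claimed_by IS NULL ORDER BY id LIMIT ?)",
            (worker, size),
        )
    # Reading back the worker's own rows needs no lock on anyone else's work.
    return db.execute(
        "SELECT id, hostname FROM queue WHERE claimed_by = ? ORDER BY id",
        (worker,),
    ).fetchall()


def finish(db, ids):
    """Remove processed rows once they have been sorted into the main tables."""
    with db:
        db.executemany("DELETE FROM queue WHERE id = ?", [(i,) for i in ids])
```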

We are now aware of

4.913.386 unique Hostnames
5.302.751 unique IPs (that have at least 1 hostname assigned)
2.155.215 unique Domains
3.186.760 hostnames waiting to be sorted in from our queue

tom

Thursday, February 02, 2006

Adventures in diving through the web

We recently started a project called "Map the Net": we created a crawler script that's mangling its way through the global network and tries to collect as many domains, IPs and hostnames as it can.
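This post doesn't say how the crawler finds new names (zone transfers come up in other posts here). As a sketch of a link-following approach only, the Python below extracts hostnames from `<a href>` links with the standard library's `HTMLParser` and walks them breadth-first. `fetch` is a hypothetical caller-supplied function (URL in, HTML out) so the sketch runs without network access; `crawl` and `hostnames_in` are made-up names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def hostnames_in(html, base_url):
    """Hostnames of all links on a page; mailto:, javascript: etc. have none."""
    parser = _LinkCollector()
    parser.feed(html)
    hosts = set()
    for href in parser.links:
        host = urlsplit(urljoin(base_url, href)).hostname  # already lowercased
        if host:
            hosts.add(host)
    return hosts


def crawl(seed_url, fetch, limit=100):
    """Breadth-first walk over hostnames, visiting at most `limit` hosts."""
    seen = set()
    queue = deque([urlsplit(seed_url).hostname])
    while queue and len(seen) < limit:
        host = queue.popleft()
        if host in seen:
            continue
        seen.add(host)
        url = f"http://{host}/"
        for found in hostnames_in(fetch(url), url):
            if found not in seen:
                queue.append(found)
    return seen
```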

The results get stored - you guessed it - in a huge database and will make us very, very famous and rich someday.

Nobody but us has this information, apart from the major search engines, the NSA, Dan Kaminsky and maybe the lovely guys at Netcraft.

tom