Web Sites

1.1 Description

This table forms the basis of many of our data collections. It contains the list of all web sites supporting the HTTP protocol. The complete record set is updated each month with current IP addresses and server types.

This is our oldest continuously updated data set. The original data was collected in April 1998, and it has been updated every month since then. Our free survey report archive, which is based on this data, features web server survey reports dating back to June 1, 1998.

1.2 Schema

Name | Type | Description
hostname | varchar(80) | The hostname of the web site as originally found by our web spiders.
domain | varchar(80) | The domain name of the site, derived from the hostname. This may be the same as the hostname.
port | int | The port number on which the web site's HTTP service resides. A value of -1 indicates the default port, 80.
ipaddr | varchar(15) | The IP address of the web server. If resolving the hostname returns multiple IP addresses, this is the first one.
servertype | varchar(255) | The server signature string, as determined from the "Server:" HTTP response header.
jtime | varchar(13) | The Julian timestamp. For historical reasons it is expressed in milliseconds, but the last 3 digits are currently always set to 0.
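As a small illustration of how the port and jtime conventions might be interpreted when reading rows, here is a Python sketch. The helper names are hypothetical, and the sketch assumes jtime counts milliseconds since the Unix epoch (1970-01-01 UTC). The description above only says the value is in milliseconds, so verify this against real data before relying on it.

```python
from datetime import datetime, timezone

DEFAULT_HTTP_PORT = 80

def effective_port(port: int) -> int:
    """Map the stored port to the actual TCP port; -1 means the HTTP default (80)."""
    return DEFAULT_HTTP_PORT if port == -1 else port

def jtime_to_datetime(jtime: str) -> datetime:
    """Convert a jtime string to a UTC datetime.

    ASSUMPTION: jtime is milliseconds since the Unix epoch. Because the last
    three digits are always 0, whole seconds carry all of the information.
    """
    millis = int(jtime)
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)

def site_url(hostname: str, port: int) -> str:
    """Build a URL for the site, omitting the port when it is the default."""
    p = effective_port(port)
    if p == DEFAULT_HTTP_PORT:
        return f"http://{hostname}/"
    return f"http://{hostname}:{p}/"
```

For example, a row with port -1 and jtime "1230768000000" would, under this assumption, describe http://hostname/ as observed at 2009-01-01 00:00 UTC.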

1.3 Unique Keys

hostname + port

1.4 Additional Keys

domain
port
ipaddr
servertype
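For readers loading the data into their own database, the schema and keys above could be expressed roughly as follows. This sketch uses SQLite from Python's standard library. The table and index names are hypothetical, and SQLite accepts but does not enforce the varchar lengths.

```python
import sqlite3

# Hypothetical table name "websites"; columns and types follow the schema above.
DDL = """
CREATE TABLE websites (
    hostname   VARCHAR(80),
    domain     VARCHAR(80),
    port       INT,
    ipaddr     VARCHAR(15),
    servertype VARCHAR(255),
    jtime      VARCHAR(13),
    UNIQUE (hostname, port)   -- unique key: hostname + port
);
-- Additional (non-unique) keys
CREATE INDEX idx_websites_domain     ON websites (domain);
CREATE INDEX idx_websites_port       ON websites (port);
CREATE INDEX idx_websites_ipaddr     ON websites (ipaddr);
CREATE INDEX idx_websites_servertype ON websites (servertype);
"""

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and its indexes on an open connection."""
    conn.executescript(DDL)

# Usage: an in-memory database with one sample row.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO websites VALUES "
    "('www.example.com', 'example.com', -1, '192.0.2.10', 'Apache/2.2.11', '1230768000000')"
)
```

With the unique key in place, a second row for the same hostname and port is rejected, while the same hostname on a different port is accepted.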

1.5 Data Collection Methodology

Data collection is broken down into two distinct phases. Phase 1 resolves the IP addresses of all known hosts. Phase 2 contacts the remote servers to collect their server signature strings. The process is split this way because many hostnames share a web server on a common IP address: phase 1 identifies all distinct IP addresses, each of which can then be contacted once, and only once, for its server signature.
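The hand-off between the two phases can be sketched as grouping hostnames by their resolved address, so that phase 2 visits each distinct IP once and copies the signature back to every hostname on it. The function names and data shapes are illustrative assumptions, not the actual collection code, and for simplicity the sketch ignores the port.

```python
from collections import defaultdict
from typing import Dict, List

NO_RESOLVE = "NO_RESOLVE"
NO_CONTACT = "NO_CONTACT"

def group_by_ip(resolved: Dict[str, str]) -> Dict[str, List[str]]:
    """Phase 1 result -> map of each distinct IP to the hostnames that share it.

    `resolved` maps hostname -> first resolved IP address, or NO_RESOLVE.
    Unresolvable hosts are skipped because there is no server to contact.
    """
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for hostname, ip in resolved.items():
        if ip != NO_RESOLVE:
            by_ip[ip].append(hostname)
    return dict(by_ip)

def assign_signatures(by_ip: Dict[str, List[str]],
                      signatures: Dict[str, str]) -> Dict[str, str]:
    """Phase 2: copy the single signature fetched per IP onto every hostname.

    IPs missing from `signatures` are treated as NO_CONTACT.
    """
    return {
        host: signatures.get(ip, NO_CONTACT)
        for ip, hosts in by_ip.items()
        for host in hosts
    }
```

Here `len(group_by_ip(...))` is the number of servers phase 2 actually has to contact, which can be much smaller than the number of hostnames.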

When a web server is contacted, an HTTP/1.0 "GET /robots.txt" request is issued.
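A minimal Python sketch of such a request, and of extracting the "Server:" header from the reply, might look like this. Whether a Host header is sent is not documented, so it is optional here. The function names, the 10-second timeout, and the choice to return None when no Server header is present are all assumptions.

```python
import socket
from typing import Optional

def build_robots_request(host_header: Optional[str] = None) -> bytes:
    """Build an HTTP/1.0 'GET /robots.txt' request.

    ASSUMPTION: the documentation does not say whether a Host header is sent.
    """
    lines = ["GET /robots.txt HTTP/1.0"]
    if host_header:
        lines.append(f"Host: {host_header}")
    # A blank line (CRLF CRLF) terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_server_header(raw_response: bytes) -> Optional[str]:
    """Return the value of the 'Server:' response header, or None if absent."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":  # header names are case-insensitive
            return value.strip()
    return None

def fetch_server_signature(ip: str, port: int = 80, timeout: float = 10.0) -> Optional[str]:
    """Contact one server; return its signature, or 'NO_CONTACT' on failure."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            sock.sendall(build_robots_request())
            data = b""
            # Read until the end of the response headers or until the server closes.
            while b"\r\n\r\n" not in data:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, unreachable, timed out, ...
        return "NO_CONTACT"
    return parse_server_header(data)
```

Only the response headers are needed for the signature, which is why the sketch stops reading once the header block has arrived.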

Each web site in the database is processed in the above fashion each month.

1.6 Known Limitations

The two-phase data collection approach means that there is a lag of up to about one week between the IP address resolution phase and the signature collection phase. If a web site changes its IP address during this interval, the IP address resolved in phase 1 may not match the site's actual IP address at the time the signature string is retrieved from the server (the time recorded in jtime).

1.7 Other Notes

Field ipaddr can have the special value NO_RESOLVE, indicating that the hostname could not be resolved.

Field servertype can have the special value NO_RESOLVE, indicating that the hostname could not be resolved, or NO_CONTACT, indicating that no server response was obtained when issuing the HTTP GET request.
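Assuming the data sits in a table named websites (a hypothetical name), the special values can be tallied with a single query. This sketch also treats a NULL servertype as "not yet collected", which is an assumption about how the nil values of a freshly created monthly table would appear.

```python
import sqlite3

STATUS_QUERY = """
SELECT CASE
         WHEN servertype IS NULL        THEN 'not yet collected'
         WHEN servertype = 'NO_RESOLVE' THEN 'unresolved'
         WHEN servertype = 'NO_CONTACT' THEN 'no response'
         ELSE 'responded'
       END AS status,
       COUNT(*) AS sites
FROM websites
GROUP BY status
ORDER BY status
"""

def status_counts(conn: sqlite3.Connection) -> dict:
    """Return {status: number of sites} for the websites table."""
    return dict(conn.execute(STATUS_QUERY).fetchall())

# Usage with a few made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites (hostname TEXT, domain TEXT, port INT, "
             "ipaddr TEXT, servertype TEXT, jtime TEXT)")
rows = [
    ("a.example.com", "example.com", -1, "192.0.2.1", "Apache/2.2.11", "1230768000000"),
    ("b.example.com", "example.com", -1, "NO_RESOLVE", "NO_RESOLVE", "1230768000000"),
    ("c.example.com", "example.com", -1, "192.0.2.3", "NO_CONTACT", "1230768000000"),
    ("d.example.com", "example.com", 8080, "192.0.2.4", "Microsoft-IIS/6.0", "1230768000000"),
    ("e.example.com", "example.com", -1, None, None, None),
]
conn.executemany("INSERT INTO websites VALUES (?, ?, ?, ?, ?, ?)", rows)
counts = status_counts(conn)
```

Keeping the special values out of "responded" matters for any server market-share statistic, since NO_RESOLVE and NO_CONTACT rows carry no real signature.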

At the start of each month, a fresh table with nil data values is created. At that point, new unique hosts not already in the table (newhosts) are added, and web sites that were non-responsive for the previous 3 months (expired) are removed.
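The monthly rollover can be sketched as a set operation over the unique key (hostname + port). The per-site response history used here is hypothetical bookkeeping, not part of the documented schema, and the sketch reads "non-responsive for the previous 3 months" as "no response in each of the last three recorded months".

```python
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, int]  # (hostname, port): the table's unique key
EXPIRY_MONTHS = 3

def monthly_rollover(
    current: Set[Key],
    newhosts: Set[Key],
    responded_history: Dict[Key, List[bool]],
) -> Dict[Key, Dict[str, Optional[str]]]:
    """Build the rows of a fresh monthly table with nil data values.

    responded_history[key] is a HYPOTHETICAL record of whether the site
    responded in each past month, oldest first.
    """
    def expired(key: Key) -> bool:
        recent = responded_history.get(key, [])[-EXPIRY_MONTHS:]
        # Expired only with a full 3 months of history and no response in any of them.
        return len(recent) == EXPIRY_MONTHS and not any(recent)

    kept = {k for k in current if not expired(k)}
    added = newhosts - current  # only hosts not already in the table
    return {k: {"ipaddr": None, "servertype": None, "jtime": None} for k in kept | added}
```

Subtracting `current` from `newhosts` means an expired site that reappears in the newhosts list is not silently re-added in the same month; whether the real process behaves this way is not documented.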

© 1998-2009 E-Soft Inc. All rights reserved.