1.1 Description
Forming the basis of many collectivities, this table contains the list of all web sites supporting the HTTP protocol. The complete record set is updated each month with up-to-date IP addresses and server types. This is our oldest continuously updated data set: the original data was collected in April 1998, and it has been updated every month since then. Our free survey report archive features web server survey reports dating back to June 1, 1998, based on this data.
1.2 Schema
1.3 Unique Keys
hostname + port
1.4 Additional Keys
domain, port, ipaddr, servertype
1.5 Data Collection Methodology
Data collection is broken down into two distinct phases. Phase 1 consists of IP address resolution of all known hosts. Phase 2 consists of contacting the remote servers in order to collect server signature strings. This two-phase process is used because in many cases hostnames will share a web server on a common IP address. Phase 1 allows us to identify all distinct IP addresses, which can subsequently be contacted once and only once for the server signature. When the web server is contacted, a "GET /robots.txt" HTTP 1.0 protocol request is issued. Each web site in the database is processed in the above fashion each month.
1.6 Known Limitations
The two-phase data collection approach means that there is a lag of up to
about one week between the IP address resolution phase and the signature
collection phase. If a web site changes its IP address during this interval,
there can be a mismatch between the IP address resolved in phase 1 and the
actual IP address of the site when the signature string is retrieved from the
server as reported by jtime.
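As a rough illustration of the two-phase process from section 1.5, the sketch below resolves every host first and only then contacts each distinct address once. It is not the survey's actual collector: the function names (collect, fetch_signature, parse_server_header), the use of the Server response header as the signature string, and the grouping by IP address plus port are all assumptions made for this example. Because phase 2 reuses the address recorded in phase 1, the sketch also shows where the lag described above can leave a stale address in the record.

```python
import socket

NO_RESOLVE = "NO_RESOLVE"  # special value described in section 1.7
NO_CONTACT = "NO_CONTACT"  # special value described in section 1.7

# The request named in section 1.5, sent as a bare HTTP/1.0 request.
REQUEST = b"GET /robots.txt HTTP/1.0\r\n\r\n"


def parse_server_header(raw):
    """Return the Server header from a raw HTTP response, or '' if absent.

    Assumption: the "server signature string" is the Server header value.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return ""


def fetch_signature(ip, port, timeout=10.0):
    """Phase 2 probe: send the request to ip:port and read the headers."""
    data = b""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(REQUEST)
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_server_header(data)


def collect(hosts, resolve=socket.gethostbyname, fetch=fetch_signature):
    """Run both phases over an iterable of (hostname, port) pairs."""
    # Phase 1: resolve every known host before contacting any server.
    resolved = {}
    for hostname, port in hosts:
        try:
            resolved[(hostname, port)] = resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[(hostname, port)] = NO_RESOLVE

    # Phase 2: contact each distinct (ip, port) once and share the result
    # among all hostnames that resolved to it.
    signatures = {}
    rows = []
    for (hostname, port), ip in resolved.items():
        if ip == NO_RESOLVE:
            servertype = NO_RESOLVE
        else:
            endpoint = (ip, port)
            if endpoint not in signatures:
                try:
                    signatures[endpoint] = fetch(ip, port)
                except OSError:  # refused, unreachable, or timed out
                    signatures[endpoint] = NO_CONTACT
            servertype = signatures[endpoint]
        rows.append({"hostname": hostname, "port": port,
                     "ipaddr": ip, "servertype": servertype})
    return rows
```

The resolve and fetch parameters are injectable so that the grouping logic can be exercised without network access, for example by passing stub functions that return fixed addresses and signatures.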
1.7 Other Notes
Field ipaddr can have the special value NO_RESOLVE, implying that no name resolution could be done.
Field servertype can have the special value NO_RESOLVE, implying that no name resolution could be done, or NO_CONTACT if a server response could not be obtained when issuing an HTTP GET request.
At the start of each month, a fresh table with nil data values is created. At that point in time, new unique hosts not already in the table (newhosts) are added to the table, and web sites that were non-responsive for the previous 3 months (expired) are removed.
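The monthly table refresh described above could be sketched as follows. This is only an illustration under stated assumptions: the helper names (is_expired, start_month) and the per-host history list are hypothetical, and "non-responsive" is taken here to mean a servertype of NO_RESOLVE or NO_CONTACT, which the text does not state explicitly.

```python
# Assumption: a month counts as non-responsive when servertype holds one of
# the two special values.
NON_RESPONSIVE = {"NO_RESOLVE", "NO_CONTACT"}


def is_expired(history, months=3):
    """True if the last `months` monthly servertype values were all non-responsive."""
    recent = history[-months:]
    return len(recent) == months and all(s in NON_RESPONSIVE for s in recent)


def start_month(history_by_key, newhosts):
    """Build the fresh table for a new month.

    history_by_key maps (hostname, port) to that host's past servertype
    values, oldest first. newhosts is an iterable of (hostname, port) pairs.
    Every row starts with nil (None) data values.
    """
    table = {}
    for key, history in history_by_key.items():
        if not is_expired(history):  # drop hosts that have expired
            table[key] = {"ipaddr": None, "servertype": None}
    for key in newhosts:  # add only hosts not already in the table
        table.setdefault(key, {"ipaddr": None, "servertype": None})
    return table
```

Keying the table on (hostname, port) follows the unique key given in section 1.3, so a host submitted again under newhosts does not create a duplicate row.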
© 1998-2008 E-Soft Inc. All rights reserved.