Survey History
The survey began in May 1998 as a simple web server survey and SSL
server survey, and has grown over the years to include other
technology surveys (e.g. mail servers, real time streaming protocol
servers).
Raw data is collected each month and added to the database of
information we have collected over the years, which is then used
to produce the monthly reports and querying capabilities now available on-line.
Survey Methodology
1.1 Guiding Principles
Early on, several key design decisions were made regarding our surveys.
- Attempt to avoid bias whenever possible.
- Be consistent.
- Only use information that is free from any restrictions on re-use.
The following rules exclude techniques that would break one of the
above principles:
- Never perform domain zone transfers to discover additional hosts.
Zone transfers are unreliable. Many name servers block zone transfers, and
as web site security has become more important, fewer and fewer zones are
accessible. Because the ability to do zone transfers may be driven by
security concerns, performing transfers would introduce a bias into the
hostname discovery process, and the technique is thus rejected.
- Never guess host names. The top 100 host names account for 65% of all
known hostnames in use. So if we find a domain "yourdomain.com", the odds
are very good that one of several hostnames is in use in the domain, such
as "www", or "secure", or "mail". That makes guessing a particularly tempting
way to discover hosts. However, doing so overlays a bias
(the selection of a precanned set of hostnames)
onto an existing data set, and as such, is rejected.
- Never use top level domain name server data.[1]
We cannot get access to ALL top level domains, so acquiring TLD data
would introduce a bias based on which TLDs we include. Our data sets
already show that certain countries demonstrate clear trends towards
certain technologies. To represent market share fairly, the same
collection methodology must be used across all countries, languages, etc.
In addition, TLD data can come with licensing restrictions that
would taint our entire data set should we choose to use it. So even
if we could get access to all TLD data, it is unlikely we would choose
to use it, because it would restrict what we could do with the data we
have collected.
[1] It is highly suspected that another popular
set of surveys uses TLD data to acquire host names, based on the fact that
a number of domains purchased by us were queried by said surveying
organization before the domains were ever publicly announced.
The result is that we have chosen a methodology in which a host
is included in our survey if and only if we find the host referenced
by someone else on the web through a link of one form or another.
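The inclusion rule above can be illustrated with a minimal Python sketch. It is not the survey's actual crawler code; the names `LinkHostCollector` and `discover_hosts` are hypothetical, and the sketch assumes that only `<a href>` links with an http or https scheme count as references. A host enters the result only if a fetched page links to it, so nothing is guessed or taken from zone or TLD data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkHostCollector(HTMLParser):
    """Collect hostnames referenced by <a href> links on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                target = urlsplit(urljoin(self.base_url, value))
                # Assumption: only http/https links count as host references.
                if target.scheme in ("http", "https") and target.hostname:
                    self.hosts.add(target.hostname.lower())


def discover_hosts(page_url, html_text, known_hosts):
    """Return hosts linked from the page that are not already known."""
    collector = LinkHostCollector(page_url)
    collector.feed(html_text)
    return collector.hosts - set(known_hosts)
```

Calling `discover_hosts` with the URL and HTML of a crawled page, plus the set of hosts already in the database, yields only hosts that some page actually references.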
1.2 Crawlers and Polling
We have two distinct types of data collection activities that operate on
an ongoing basis.
- Crawlers operate non-stop, much as any web crawler does when
spidering a web site. Crawlers are responsible for finding links
to new web sites and services, making note of the types of embedded
content found in web pages (e.g. frames, applets, image types), and
recording other interesting attributes of pages and their headers (cookies,
privacy policies, etc.). Approximately 10% of the known web sites are
crawled each month.
- Polling processes are scheduled monthly and are responsible for
updating an entire data set each month. For example, every IP
known to host a web server will be visited to determine the type of
web server operating on it. Every domain will have several DNS queries
issued against it: resolving IP addresses of known hosts, determining
the location of name servers and mail servers, and querying for any TXT
records (e.g. for SPF usage).
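Two of the polling checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the survey's implementation: it assumes the web server type is read from the HTTP `Server` response header (the text does not say how the type is determined), and it assumes the raw response head and the TXT record strings have already been fetched by some other code. The function names `server_software` and `uses_spf` are hypothetical.

```python
def server_software(raw_response_head):
    """Return the Server header value from a raw HTTP response head, or None."""
    lines = raw_response_head.split("\r\n")
    for line in lines[1:]:  # lines[0] is the status line
        if not line:
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive.
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


def uses_spf(txt_records):
    """Return True if any TXT record string looks like an SPF policy."""
    for record in txt_records:
        text = record.strip().lower()
        # An SPF record is "v=spf1" alone or followed by a space and terms.
        if text == "v=spf1" or text.startswith("v=spf1 "):
            return True
    return False
```

Keeping both helpers free of network access means the parsing logic can be checked against recorded responses before a monthly polling run.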
All data used in our surveys comes exclusively from one of the
two activities above.
© 1998-2008 E-Soft Inc. All rights reserved.