Round robin DNS, reverse web proxies, and
some load balancing/failover products such as
connection-level firewalls hide a number of web servers behind a single IP address.
A limitation of the technique is that only a single “front” web server will
be counted in such cases. Additionally, with some of these products the operating system detected
is that of the “front” device rather than that of the web servers behind it.
There are a number of factors that create errors in this survey,
the main ones being:
Despite making multiple visits, there is still a low
probability that two computers will be considered the same because of chance
similarities in low-level TCP/IP protocol header fields;
this leads to under-counting.
Some IP addresses do not respond on enough visits for the
technique to be applied.
This is mainly due to computers or networks being down or
badly overloaded on several of the visits,
leaving some computers uncounted;
by extrapolating using the IP-address-to-computer ratio for sites
running similar software, we can roughly correct for this error.
If a system changes or upgrades its operating system during the course of
the survey, which takes over a month to run,
or because of certain other detailed issues,
a computer may be counted more than once;
this leads to over-counting, but the effect should generally be fairly small.
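The counting and extrapolation steps described above can be sketched roughly as follows. This is a hedged illustration only: the fingerprint fields, the visit threshold, and the correction ratio are all assumptions for the sake of example, not Netcraft's actual implementation.

```python
# Sketch: estimate distinct computers behind many IP addresses by
# comparing low-level TCP/IP header fields across repeated visits,
# then extrapolating for IP addresses that did not respond often
# enough.  Field names and the ratio value are illustrative.

MIN_VISITS = 3  # assumed number of responses needed to apply the technique


def fingerprint(visit):
    # A fingerprint drawn from low-level TCP/IP header fields; the
    # exact fields the survey compares are not specified, so these
    # are example choices.
    return (visit["ttl"], visit["window_size"], visit["options"])


def count_computers(visits_by_ip, ratio_for_similar_software=2.0):
    """Estimate the number of distinct computers behind the given IPs."""
    computers = set()
    uncounted_ips = 0
    for ip, visits in visits_by_ip.items():
        if len(visits) < MIN_VISITS:
            # Down or badly overloaded on too many visits: uncounted.
            uncounted_ips += 1
            continue
        # IPs whose visits share a fingerprint are attributed to one
        # computer; distinct fingerprints imply distinct computers.
        computers.add(fingerprint(visits[0]))
    # Roughly correct for unresponsive IPs using the IP-address-to-
    # computer ratio observed for similar software (assumed value).
    correction = uncounted_ips / ratio_for_similar_software
    return len(computers) + correction
```

For example, two IP addresses that always present identical header fields would be counted as one load-shared computer, while an IP address that answered on too few visits contributes only a fractional, extrapolated amount to the total.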
It is difficult to determine a reasonable error bracket for the
computer count numbers, especially as the two major errors are in
opposite directions and so cancel to some extent.
One useful piece of evidence suggesting there are not really large levels of error
is that the average ratio of sites to computers on hosting-company networks
is over 10,
whereas the ratio of self-hosted sites to computers is about 2.
Considering the technique in the abstract, we think that
error margins world-wide are on the order of ±10% for
IP addresses allocated to hosting companies, where the technique must make
the greatest number of successful comparisons,
and on the order of ±5% for self-hosted networks.
Note this is in addition to the limitation that we identify at most
one computer per load-balanced website; we cannot quantify the numerical
effect of this limitation, but we would expect only a minority of
web server computers world-wide to use load balancers at this time,
so it should not cause large-scale distortion of the results.
Netcraft has been performing this survey since February 1999,
generally four times a year.
The trends since then have been very smooth, suggesting there
is only a small amount of “random error” in this survey.
There could be significant “systematic error” affecting
particular groups of web servers more than others,
but there are no strong reasons to suppose this would affect particular
operating system groups or types of web server significantly more than others world-wide.
Studying the quarterly trend results in detail does give us
confidence that the error margins in the results are well within
the stated ± 10%.
Operating Systems used by Computers running public Internet Web Sites, March 2001