The June Netcraft Results are Out: Apache Gains Slightly in Market Share Page 2
Round robin DNS, reverse web proxies, some load balancing/failover products such as Cisco LocalDirector and BIG-IP, and some connection-level firewalls hide a number of web servers behind a single hostname. A limitation of the technique is that only a single "front" web server will be counted. Additionally, with some of these products the operating system detected is that of the "front" device rather than that of the web server behind it.
Error Margins
There are a number of factors that create errors in this survey, the main ones being:
- Despite making multiple visits, there is still a low probability that two computers will be considered the same because of chance similarities in low-level TCP/IP protocol header fields; this leads to under-counting.
- Some IP addresses do not respond on enough visits for the technique to be applied. This is mainly due to computers or networks being down or badly overloaded on several of the visits, in which case there are uncounted computers; by extrapolating using the IP address/computer ratio for sites running similar software we can roughly correct for this error.
- A computer may be counted more than once if it changes or upgrades its operating system during the course of the survey, which takes over a month to run, or because of some other detailed issues; this leads to over-counting, but the effect should generally be fairly small.
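The extrapolation mentioned in the second point above can be illustrated with a minimal sketch. It assumes the correction is a simple division of the unresolved IP addresses by the IP address/computer ratio observed for sites running similar software; the function name and all figures are hypothetical and are not taken from the survey.

```python
def estimate_uncounted_computers(unresolved_ips, resolved_ips, resolved_computers):
    """Estimate how many computers sit behind IP addresses that did not
    respond on enough visits, using the IP address/computer ratio observed
    on addresses (running similar software) that did respond.

    All inputs are hypothetical illustration values, not survey data.
    """
    if resolved_computers <= 0:
        raise ValueError("need at least one resolved computer to form a ratio")
    # Ratio of IP addresses to distinct computers in the comparable group.
    ips_per_computer = resolved_ips / resolved_computers
    # Assume the unresolved addresses follow the same ratio.
    return unresolved_ips / ips_per_computer


# Hypothetical group: 6000 responding IPs mapped to 1500 computers,
# plus 1200 IPs that did not respond often enough to be compared.
estimate = estimate_uncounted_computers(
    unresolved_ips=1200, resolved_ips=6000, resolved_computers=1500
)
print(f"estimated uncounted computers: {estimate:.0f}")
```

In practice such a correction would presumably be applied separately to each software group, since the ratio is said to depend on the software being run.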
It is difficult to determine a reasonable error bracket for the computer count numbers, especially as the two major errors are in opposite directions and so cancel to some extent. One useful piece of evidence suggesting that there are not really large levels of error is that the average ratio of sites to computers on hosting company networks is over 10, whereas the ratio of self-hosted sites to computers is about 2.
Considering the technique in the abstract, we think that error margins world-wide are in the order of ± 10% on IP addresses allocated to hosting companies, where the greatest number of successful comparisons needs to be made by the technique, and in the order of ± 5% on self-hosted networks. Note that this is in addition to the limitation that we identify at most one computer per load-balanced website; we cannot quantify the numerical effect of this limitation, but we would expect only a minority of web server computers world-wide to use load balancers at this time, so it should not cause large-scale distortion of the results.
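To make the stated margins concrete, the following sketch turns a count and a relative margin into a low/high bracket. The computer counts are hypothetical, and adding the two brackets together is a simple worst-case assumption made for illustration, not a method described in the survey.

```python
HOSTING_MARGIN = 0.10      # ± 10% on hosting company addresses (from the text)
SELF_HOSTED_MARGIN = 0.05  # ± 5% on self-hosted networks (from the text)


def error_bracket(count, margin):
    """Return the (low, high) bracket for a count with a relative margin."""
    delta = count * margin
    return count - delta, count + delta


# Hypothetical computer counts, chosen only for illustration.
hosting_low, hosting_high = error_bracket(10_000, HOSTING_MARGIN)
self_low, self_high = error_bracket(4_000, SELF_HOSTED_MARGIN)

# Worst-case combination (an assumption): both errors lean the same way.
total_low = hosting_low + self_low
total_high = hosting_high + self_high
print(f"hosting: {hosting_low:.0f} to {hosting_high:.0f}")
print(f"self-hosted: {self_low:.0f} to {self_high:.0f}")
print(f"combined: {total_low:.0f} to {total_high:.0f}")
```

Because the text notes that the two major errors point in opposite directions, the real combined bracket is likely to be narrower than this worst-case sum.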
Netcraft has been performing this survey since February 1999, generally four times a year. The trends since then have been very smooth, suggesting there is only a small amount of "random error" in this survey. There could be significant "systematic error" affecting particular groups of web servers more than others, but there are no strong reasons to suppose this would affect particular operating system groups or types of web server significantly more than others world-wide. Studying the quarterly trend results in detail does give us confidence that the error margins in the results are well within the stated ± 10%.