Apache Guide: Logging, Part 4 -- Log-File Analysis Page 2

HTTP is a stateless, anonymous protocol. This is by design, and is not, at least in my opinion, a shortcoming of the protocol. If you want to know more about your visitors, you have to be polite and actually ask them, and be prepared to get unreliable answers. This is amazingly frustrating for marketing types. They want to know the average income, number of kids, and hair color of their target demographic. Or something like that. And they don't like to be told that this information is not available in the log files. However, no amount of effort on your part will get it out of the log files, because it was never recorded there. Explain to them that HTTP is anonymous.

And even what the log files do tell you is occasionally suspect. For example, I have numerous entries in my log files indicating that a machine called cache-mtc-am05.proxy.aol.com visited my web site today. I can tell that this machine is on the AOL network. But because of the way that AOL works, this might be one person visiting my site many times, or it might be many people visiting my site one time each. AOL does something called proxying, and you can see from the machine's name that it is a proxy server. A proxy server is one that one or more people sit behind. They type an address into their browser, and the browser sends that request to the proxy server. The proxy server gets the page (generating the log file entry on my web site) and then passes it back to the requesting machine. This means that I never see the request from the originating machine, but only the request from the proxy.
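To make the proxy problem concrete, here is a minimal Python sketch that counts requests per remote host and flags host names that look like proxies. Only the AOL host name comes from my logs as described above; the other host, the timestamps, the paths, and the byte counts are made up for illustration, and the "name contains proxy or cache" test is a crude assumption, not a reliable way to detect proxies.

```python
from collections import Counter

# Hypothetical Common Log Format lines. Only the AOL host name is taken
# from the article; every other value here is invented for illustration.
SAMPLE_LOG = """\
cache-mtc-am05.proxy.aol.com - - [18/Sep/2000:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 2326
cache-mtc-am05.proxy.aol.com - - [18/Sep/2000:10:00:05 -0400] "GET /about.html HTTP/1.0" 200 1540
dialup-12.example.net - - [18/Sep/2000:10:01:00 -0400] "GET /index.html HTTP/1.0" 200 2326
"""


def hits_per_host(log_text):
    """Count requests per remote host (the first whitespace-separated field)."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.strip():
            counts[line.split()[0]] += 1
    return counts


def looks_like_proxy(host):
    """Crude heuristic (an assumption): the name contains 'proxy' or 'cache'."""
    name = host.lower()
    return "proxy" in name or "cache" in name


counts = hits_per_host(SAMPLE_LOG)
for host, n in counts.most_common():
    note = " (looks like a proxy: one person or many?)" if looks_like_proxy(host) else ""
    print(f"{host}: {n} requests{note}")
```

The count for the proxy host is accurate as a count of requests, but nothing in the log says how many people were behind it.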

Another implication of this is that if, 10 minutes later, someone else sitting behind that same proxy requests the same page, they don't generate a log file entry at all. They type in the address, and that request goes to the proxy server. The proxy sees the request and thinks "I already have that document in memory. There's no point asking the web site for it again." And so instead of asking my web site for the page, it gives the copy that it already has to the client. So, not only is the address field suspect, but the number of requests is also suspect.
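The caching behavior can be sketched as a toy simulation in Python. The class names here are hypothetical, and the proxy is deliberately simplified: it keeps every page forever, whereas a real proxy would also decide when a cached copy has become too old to reuse.

```python
class OriginServer:
    """Stands in for my web site; records one log entry per request it sees."""

    def __init__(self):
        self.access_log = []

    def get(self, client_host, path):
        self.access_log.append((client_host, path))
        return f"contents of {path}"


class CachingProxy:
    """Simplified proxy: fetches a page once, then answers from its cache."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            # Only a cache miss reaches the origin, and the origin sees
            # the proxy's name, not the name of the user's machine.
            self.cache[path] = self.origin.get(self.name, path)
        return self.cache[path]


origin = OriginServer()
proxy = CachingProxy("cache-mtc-am05.proxy.aol.com", origin)

proxy.get("/index.html")  # first person behind the proxy
proxy.get("/index.html")  # second person, 10 minutes later

print(len(origin.access_log), "log entry for 2 page views")
```

Two people saw the page, but the web site logged a single request, attributed to the proxy.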

So, Um, What Good Are These Logs?

It might sound like the data that you receive is so suspect as to be useless. This is in fact not the case. It should just be taken with a grain of salt. The number of hits that your site receives is almost certainly not the number of visitors that came to your site. But it's a good indication, and it still gives you some useful information. Just don't rely on it for exact numbers.

How Do I Get Useful Statistics?

So, to the real meat of all of this. How do you actually generate statistics from your Web-server logs?

There are two main approaches that you can take here. You can either do it yourself, or you can get one of the existing applications that is available to do it for you.

Unless you have custom log files that don't look anything like the Common Log Format, you should probably get one of the available apps out there. There are some excellent commercial products, and some really good free ones, so you just need to decide what features you are looking for.
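For a sense of what the do-it-yourself route involves, here is a minimal Python sketch that parses Common Log Format lines and produces a few basic numbers. The regular expression covers only the standard seven fields (host, identity, user, time, request, status, size), the sample lines are invented, and the function names are my own; treat it as a starting point under those assumptions rather than a finished analyzer.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Invented sample data; the last line is deliberately malformed.
SAMPLE_LINES = [
    'cache-mtc-am05.proxy.aol.com - - [18/Sep/2000:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 2326',
    'cache-mtc-am05.proxy.aol.com - - [18/Sep/2000:10:05:00 -0400] "GET /index.html HTTP/1.0" 304 -',
    'dialup-12.example.net - - [18/Sep/2000:10:06:00 -0400] "GET /logo.gif HTTP/1.0" 200 1024',
    'this line is not in Common Log Format',
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent; count it as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def summarize(lines):
    """Tally hits, distinct hosts, bytes sent, and the most requested paths."""
    hits = 0
    hosts = set()
    pages = Counter()
    total_bytes = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        hits += 1
        hosts.add(entry["host"])
        total_bytes += entry["size"]
        parts = entry["request"].split()
        if len(parts) >= 2:
            pages[parts[1]] += 1  # the request path, e.g. /index.html
    return {
        "hits": hits,
        "distinct_hosts": len(hosts),
        "bytes": total_bytes,
        "top_pages": pages.most_common(5),
    }


print(summarize(SAMPLE_LINES))
```

Note that "distinct_hosts" counts distinct addresses, not visitors, so it carries all the proxy caveats discussed earlier.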

So, without further ado, here are some of the great apps out there that can help you with this task.


This article was originally published on Sep 18, 2000
