Apache Guide: Logging, Part 4 -- Log-File Analysis
In the first sections of this series, I've talked about what goes into the standard log files, and how you can change the contents of those files.
This week, we're looking at how to get meaningful information back out of those log files.
The problem is that although there is an enormous amount of information in the log files, it's not much good to the people that pay your salary. They want to know how many people visited your site, what they looked at, how long they stayed, and where they found out about your site. All of that information is (or might be) in your log files.
They also want to know the names, addresses, and shoe sizes of those people, and, hopefully, their credit card numbers. That information is not in there, and you need to know how to explain to your employer that not only is it not in there, but the only way to get it is to explicitly ask your visitors for it, and be willing to be told 'no.'
There is a lot of information available to put in your log files, including the following:
- Address of the remote machine
- This is almost the same as "who is visiting my web
site," but not quite. More specifically, it tells
you where that visitor is from. This will be
- Time of visit
- When did this person
come to my web site? This can tell you something
about your visitors. If most of your visits come
between the hours of 9 a.m. and 4 p.m., then
you're probably getting visits from people at
work. If it's mostly 7 p.m. through midnight,
people are looking at your site from home.
Single records, of course, give you very little useful information, but across several thousand 'hits', you can start to gather useful statistics.
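As a rough illustration of gathering that kind of statistic, the following Python sketch counts hits by hour of day. It assumes the log is in the Common Log Format (host, ident, user, timestamp, request, status, size); the function name hits_per_hour and the sample records are made up for this example, and the addresses come from a range reserved for documentation.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def hits_per_hour(lines):
    """Count requests by the hour of day (0-23) recorded in each log line."""
    counts = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        # Timestamps look like 12/Mar/2001:09:15:02 -0500
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.hour] += 1
    return counts

# Made-up sample records, for illustration only.
sample = [
    '192.0.2.10 - - [12/Mar/2001:09:15:02 -0500] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [12/Mar/2001:09:47:30 -0500] "GET /about.html HTTP/1.0" 200 1042',
    '192.0.2.10 - - [12/Mar/2001:21:03:11 -0500] "GET /index.html HTTP/1.0" 200 2326',
]

for hour, count in sorted(hits_per_hour(sample).items()):
    print(f"{hour:02d}:00  {count}")
```

The hour is taken as written in the log, which is the server's local time. If your visitors are spread across time zones, "9 a.m." on the server may not be 9 a.m. for them.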
- What parts of
your site are most popular? Those are the parts
that you should expand. Which parts of the site
are completely neglected? Perhaps those parts of
the site are just really hard to get to. Or,
perhaps they are genuinely uninteresting, in
which case you should spice them up a little. Of
course, some parts of your site, such as your
legal statements, are boring and there's nothing
you can do about it, but they need to stay on the
site for the two or three people that want to see them.
- And, of course,
your logs tell you when things are not working as
they should be. Do you have broken links? Do
other sites have links to your site that are not
correct? Are some of your CGI programs
malfunctioning? Is a robot overwhelming your site
with thousands of requests per second? (Yes, this
has happened to me. In fact, it's the reason that
I did not get this article in on time last week!)
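The last two questions in the list (which parts of the site are popular, and what is broken) can be approached with a similar tally. The sketch below is only one possible way to do it, again assuming Common Log Format. The function name summarize, the choice to treat status 404 as a "broken link," and the sample records are all assumptions made for illustration.

```python
import re
from collections import Counter

# Common Log Format, with the request line split into method and path.
REQUEST_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Return (popular, broken, hosts) counters for the given log lines."""
    popular = Counter()  # successful requests, keyed by path
    broken = Counter()   # 404 responses, keyed by path
    hosts = Counter()    # all requests, keyed by remote address
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        hosts[match.group("host")] += 1
        status = match.group("status")
        if status == "404":
            broken[match.group("path")] += 1
        elif status.startswith("2"):
            popular[match.group("path")] += 1
    return popular, broken, hosts

# Made-up sample records, for illustration only.
sample = [
    '192.0.2.10 - - [12/Mar/2001:09:15:02 -0500] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [12/Mar/2001:09:47:30 -0500] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [12/Mar/2001:09:48:01 -0500] "GET /old-page.html HTTP/1.0" 404 208',
    '192.0.2.12 - - [12/Mar/2001:10:02:44 -0500] "GET /legal.html HTTP/1.0" 200 512',
]

popular, broken, hosts = summarize(sample)
print("Most requested:", popular.most_common(3))
print("Not found:", broken.most_common(3))
print("Busiest clients:", hosts.most_common(3))
```

A single address at the top of the "busiest clients" list with far more requests than anyone else is a hint, not proof, that a robot is at work; a proxy serving many real visitors can look the same.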