Apache Guide: Logging with Apache--Understanding Your access_log
Apache comes with built-in mechanisms for logging activity on your server. In this series of articles, I'll talk about the standard way that Apache writes log files, and some of the tricks for getting more useful information and statistics out of your server.
Apache keeps extensive track of your server usage via logfiles. In this article, Rich Bowen discusses logfiles and how you can get more useful information from them.
This week we'll talk about the information that appears in your transfer log, and what it all means.
If you have done a default installation of Apache, when you
run your server, two log files will get written. These files
are called access_log (access.log on Windows) and
error_log (error.log on Windows). These files can be
found (again, if you did a default installation) in
/usr/local/apache/logs. On Windows, the logs will be in
the logs subdirectory of wherever you installed Apache.
Some package managers put the log files in
other places, and you'll have to poke around to find them,
or check the configuration file for the configured location.
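If poking around gets tedious, the configuration file can be searched for the directives that name log files. The following is a minimal Python sketch, not an official tool. It assumes the logs are configured with the CustomLog, TransferLog, or ErrorLog directives, and the sample configuration text and the function name find_log_paths are made up for illustration.

```python
import re

# Directives that name a log file in an Apache configuration.
# Anchoring at the start of the line means commented-out lines do not match.
LOG_DIRECTIVE = re.compile(
    r'^\s*(CustomLog|TransferLog|ErrorLog)\s+("[^"]+"|\S+)',
    re.IGNORECASE,
)

def find_log_paths(config_text):
    """Return (directive, path) pairs found in the given configuration text."""
    found = []
    for line in config_text.splitlines():
        match = LOG_DIRECTIVE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip('"')))
    return found

# Hypothetical configuration fragment, for illustration only.
sample = """\
# ErrorLog logs/old_error_log
ErrorLog logs/error_log
CustomLog logs/access_log common
"""

for directive, path in find_log_paths(sample):
    print(directive, path)
```

To use this on a real server, read the text of your own configuration file and pass it to find_log_paths. A path that does not start with a slash is normally taken relative to the directory where Apache is installed.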
access_log is, as the name suggests, the log of all
accesses to your server. Typical entries in this file look like:
18.104.22.168 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654
This line contains 7 pieces of information. Actually, two of them are blank in this example, but there is space for 7 pieces of information.
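To make the seven slots concrete, here is a small Python sketch that splits the example entry into its parts. It assumes the entry is in the default Common Log Format. The field labels (host, ident, user, time, request, status, size) are my own names for this illustration, not something Apache writes into the log.

```python
import re

# One named group per piece of information in a Common Log Format entry.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_access_line(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()

entry = parse_access_line(
    '18.104.22.168 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
)
for name, value in entry.items():
    print(f"{name:8} {value}")
```

The two fields that come back as a bare hyphen are the blank ones mentioned above. A log written with a customized format would need a different pattern.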
The first piece of information is the address of the remote host.
That is, who is looking at your web site. In the example above,
the host visiting my web site is
18.104.22.168, which is,
incidentally, the IP address of the machine called
si3001.inktomi.com. (I figured that out by looking up the
address in DNS.) Inktomi is
a company that makes web searching software. (I looked at their
web site.) Since this same IP address requested the file
robots.txt just a few seconds earlier, I suspect that this
is a web searching spider that was indexing my web site. I'll
talk about spiders in another column. So, just based on that
first piece of information, and a glance back in the log file,
I've already found out quite a bit of information about my visitors.
By default, this address is just the IP address of the remote host. You can tell Apache to look up all the host names, and put those host names in the log instead of the IP address. This is probably not a good idea, since it greatly slows down the logging process, and so slows down your entire server. And there are various tools that will go through your log after the fact, and resolve all the IP addresses to host names, so there's no real advantage to doing this anyway.
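As a rough idea of what such an after-the-fact tool does, here is a minimal Python sketch. It is not one of the existing tools, just an illustration. It assumes the address is the first space-separated field on each line, and the function names are invented for this example. Each distinct address is looked up only once, since the same visitors tend to appear on many lines.

```python
import socket

def resolve_host(ip_address):
    """Reverse-resolve one IP address, falling back to the address itself."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return ip_address

def resolve_log_lines(lines, resolver=resolve_host):
    """Yield log lines with the leading address replaced by a host name."""
    cache = {}  # remember answers so each address is looked up only once
    for line in lines:
        host, sep, rest = line.partition(" ")
        if host not in cache:
            cache[host] = resolver(host)
        yield cache[host] + sep + rest

# Canned answer so the example runs without DNS access; a real run would
# omit the resolver argument and let resolve_host query DNS.
canned = {"18.104.22.168": "si3001.inktomi.com"}
lines = ['18.104.22.168 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654']
for resolved in resolve_log_lines(lines, resolver=lambda ip: canned.get(ip, ip)):
    print(resolved)
```

Because the lookups happen in a separate pass over the file, they can be as slow as they like without holding up the server.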
But, if you want to, you can tell Apache to do these lookups with the directive:
HostnameLookups on
Setting this to double, rather than on, will cause the logging process to do a reverse lookup on the name that it finds, to verify that it points back to the IP address that you started with. The value is set to