
Apache Guide: Logging with Apache–Understanding Your access_log




Apache comes with built-in mechanisms for logging activity on
your server. In this series of articles, I’ll talk about
the standard way that Apache writes log files, and some of
the tricks for getting more useful information and statistics
out of your server.

Apache keeps extensive track of your server usage via logfiles. In this article, Rich Bowen discusses logfiles and how you can get more useful information from them.

This week we’ll talk about the information that appears in
your transfer log, and what it all means.

The standard log files

If you have done a default installation of Apache, your server
will write two log files when it runs: access_log (access.log
on Windows) and error_log (error.log on Windows). With a default
installation, these files are in /usr/local/apache/logs. On
Windows, they are in the logs subdirectory of wherever you
installed Apache. Package managers often put the log files in
other places, so you may have to poke around to find them, or
check the configuration file for the configured location.
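If you are not sure where your server writes its logs, the main
directives to look for in the configuration file are ErrorLog,
TransferLog, and CustomLog. Here is a minimal Python sketch that
scans the text of a configuration file for them. The function name
find_log_locations is hypothetical, and the sketch does not follow
Include files or resolve relative paths against ServerRoot.

```python
import shlex

# Directives that name log files. Apache matches directive names
# case-insensitively, so we compare in lower case.
LOG_DIRECTIVES = {"errorlog", "transferlog", "customlog"}


def find_log_locations(config_text):
    """Return (directive, target) pairs for each log directive found.

    Hypothetical helper: it does not follow Include files and does not
    resolve relative paths against ServerRoot.
    """
    found = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        # Only tokenize lines whose first word is a log directive.
        if stripped.split(None, 1)[0].lower() not in LOG_DIRECTIVES:
            continue
        # posix=False keeps backslashes (Windows paths) intact; quotes
        # stay attached to the token, so strip them afterwards.
        parts = shlex.split(stripped, posix=False)
        if len(parts) > 1:
            # The target is a file name, or "|program" for a piped log.
            found.append((parts[0], parts[1].strip('"')))
    return found
```

For a default installation you might pass it the contents of
/usr/local/apache/conf/httpd.conf (an assumed path; adjust it for
your system). A relative result such as logs/access_log is relative
to the ServerRoot directory.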

access_log

access_log is, as the name suggests, the log of all
accesses to your server. A typical entry in this file looks like this:

        216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This line contains seven pieces of information. Two of them (the
hyphens) are blank in this example, but every entry has space for
all seven.
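These seven fields follow what Apache calls the Common Log Format.
As a sketch, here is a Python regular expression that splits an
entry like the one above into those fields. The field names (host,
ident, user, and so on) are illustrative labels, and the pattern
assumes the request line contains no embedded double quotes.

```python
import re

# Illustrative field labels for the Common Log Format
# (%h %l %u %t "%r" %>s %b).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) '          # remote host (an IP address by default)
    r'(?P<ident>\S+) '         # identity from identd, "-" if unavailable
    r'(?P<user>\S+) '          # authenticated user name, "-" if none
    r'\[(?P<time>[^\]]+)\] '   # time the request was received
    r'"(?P<request>[^"]*)" '   # request line sent by the client
    r'(?P<status>\d{3}) '      # HTTP status code returned
    r'(?P<bytes>\S+)'          # size of the response body, "-" if none
)


def parse_clf_line(line):
    """Split one access_log entry into its seven fields, or return None."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

Applied to the example entry, this yields a host of 216.35.116.91,
two "-" fields, the timestamp, the request GET / HTTP/1.0, a status
of 200, and a size of 654.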

The first piece of information is the address of the remote host.
That is, who is looking at your web site. In the example above,
the host visiting my web site is 216.35.116.91, which is,
incidentally, the IP address of the machine called
si3001.inktomi.com. (I figured that out by looking up the
address in DNS, with the nslookup utility.) inktomi.com is
a company that makes web searching software. (I looked at their
web site.) Since this same IP address requested the file
robots.txt just a few seconds earlier, I suspect that this
is a web searching spider that was indexing my web site. I’ll
talk about spiders in another column. So, just based on that
first piece of information, and a glance back in the log file,
I’ve already found out quite a bit of information about my visitors.
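The robots.txt check can be automated. This minimal Python sketch
lists the hosts that requested /robots.txt, the file that
well-behaved spiders fetch before indexing a site. The function
name hosts_requesting_robots is hypothetical, and it assumes
entries in the format shown earlier.

```python
def hosts_requesting_robots(lines):
    """Return, in order of first appearance, hosts that asked for /robots.txt."""
    hosts = []
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request line: not an entry we understand
        request = parts[1].split()      # e.g. ["GET", "/robots.txt", "HTTP/1.0"]
        host = line.split(None, 1)[0]   # first field: the remote host
        if len(request) >= 2 and request[1] == "/robots.txt" and host not in hosts:
            hosts.append(host)
    return hosts
```

Because a file object iterates line by line, you can pass an open
access_log to it directly and then compare the returned addresses
against the rest of the log.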

By default, this address is just the IP address of the remote
host. You can tell Apache to look up the host name for each
address and put that name in the log instead. This is probably
not a good idea, since a DNS lookup on every request greatly slows
down logging, and therefore your entire server. There are also
tools that will go through your log after the fact and resolve
all the IP addresses to host names, so there's no real advantage
to doing the lookups at request time anyway.
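Resolving addresses after the fact is simple to sketch: look up
each distinct address once, cache the answer, and rewrite the first
field of each line. The Python below does this with
socket.gethostbyaddr. The function name resolve_log_lines and its
resolver parameter (there so a stub can stand in for real DNS) are
assumptions for illustration. Apache itself ships a logresolve
utility for this job.

```python
import socket


def resolve_log_lines(lines, resolver=socket.gethostbyaddr):
    """Yield access_log lines with the leading IP replaced by a host name.

    Each distinct address is looked up only once and cached; addresses
    with no reverse DNS entry are left unchanged.
    """
    cache = {}
    for line in lines:
        host, sep, rest = line.partition(" ")
        if host not in cache:
            try:
                # gethostbyaddr returns (hostname, aliases, addresses).
                cache[host] = resolver(host)[0]
            except OSError:  # socket.herror and socket.gaierror subclass OSError
                cache[host] = host
        yield cache[host] + sep + rest
```

Caching matters because a busy log repeats the same few addresses
many times, and each uncached lookup is a network round trip.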

But, if you want to, you can tell Apache to do these lookups with
the directive:

        HostNameLookups on

Setting HostNameLookups to double, rather than on, causes Apache
to follow that reverse lookup (address to name) with a forward
lookup on the name it finds, to verify that the name points back
to the IP address you started with. The default value is off.
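The double check amounts to two lookups and a comparison. In this
Python sketch, socket.gethostbyaddr does the reverse lookup and
socket.gethostbyname_ex does the forward one, and a name is
accepted only if at least one address it resolves to matches the
original. The function name and the injectable lookup parameters
are hypothetical, it handles IPv4 only, and it illustrates the idea
rather than Apache's own implementation.

```python
import socket


def double_reverse_lookup(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return the host name for ip only if it maps back to ip, else None."""
    try:
        name = reverse(ip)[0]        # reverse lookup: address -> name
        addresses = forward(name)[2]  # forward lookup: name -> list of addresses
    except OSError:                   # either lookup failed
        return None
    # Accept the name only if the forward lookup includes the original address.
    return name if ip in addresses else None
```

A None result means the name could not be verified, which is the
case the double setting is designed to catch.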
