Apache Guide: Logging, Part 5: Advanced Logging Techniques and Tips


September 25, 2000

In this final article on logging, I'll attempt to touch on a few of the things that I've left out or skimped on. By now you're probably tired of hearing about logging, so we'll start on something new next week.

I'll start with a few additional comments about log-file parsing. Although I said that I was not trying to be comprehensive in my treatment of log-file-parsing software, and that I was aware of many other programs for this purpose, I received no fewer than 20 email messages from various users and software vendors either suggesting other packages or chastising me for not mentioning their favorite application.

There are dozens and dozens of software packages on the market for parsing HTTP server log files and generating useful statistics. I talked about the few that I have actually used and found to be useful, and about one other that had been highly recommended to me recently. I was not trying to suggest that these were the only ones available, or even that they are the best.

A quick search on Google for "apache log reporting", or something like that, will return hundreds of pages dedicated to this topic, and various vendors selling their particular solution to this rather simple problem. These will do everything from giving you a single number ("You had 12 visits to your web site.") all the way up to drawing detailed graphs analyzing your traffic based on domain names and how that particular company is doing on the stock market. ("20% of your traffic was from Fortune 500 companies. See the blue bar on graph 27.")

I tend to prefer something closer to the simple end, since I'm usually just trying to get the big picture anyway.

Logging to a process

You don't have to log to a file. You can log to a process. This is particularly useful if you want your logs to go to a database or to some process that will give some type of real-time statistics on your web site traffic.

Now, I need to be perfectly honest about this. I have never had any particular use for this ability. I have played with it from time to time, but have never found any actual practical use for it. Perhaps someone can tell me about some real-life situation where this has been of value.

Anyway, here's what you can do. With either the TransferLog or CustomLog directive, instead of specifying a file to which the log should be written, you can specify "|" followed by the name of a program that is to receive the logging information. Apache starts that program and writes each log entry to its standard input.

For example:


     CustomLog |/usr/bin/apachelog.pl common

where /usr/bin/apachelog.pl is some program that knows what to do with Apache log file entries. This may be as simple as a Perl program that processes the log entries in some fashion, or it may be something that writes entries to a database.
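As a rough illustration of what such a program might look like, here is a minimal sketch in Perl. It assumes the entries arrive in the Common Log Format, one per line on standard input, and it keeps a running count of responses per status code. The function name parse_common_line and the summary file path are placeholders invented for this example, not anything that ships with Apache.

```perl
#!/usr/bin/perl
# Hypothetical sketch of /usr/bin/apachelog.pl: reads Common Log Format
# entries on STDIN and keeps a running count of responses per status code.
use strict;
use warnings;

# Parse one Common Log Format line into a hash reference.
# Returns nothing (undef in scalar context) if the line does not match.
sub parse_common_line {
    my ($line) = @_;
    return unless $line =~
        m/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)/;
    return {
        host    => $1,
        ident   => $2,
        user    => $3,
        time    => $4,
        request => $5,
        status  => $6,
        bytes   => ($7 eq '-' ? 0 : $7),
    };
}

# Read entries until the pipe is closed, tallying status codes.
sub main {
    my %count;
    # Assumption: this summary file location is a placeholder; pick your own.
    my $summary = '/tmp/apache-status-counts.txt';
    while (my $line = <STDIN>) {
        my $entry = parse_common_line($line) or next;
        $count{ $entry->{status} }++;
        # Rewrite the summary after every entry so it is always current.
        open(my $out, '>', $summary) or next;
        print {$out} "$_ $count{$_}\n" for sort keys %count;
        close($out);
    }
}

# Run the loop only when executed as a script, not when loaded by other code.
main() unless caller;
```

Rewriting the summary file on every request is fine for a toy, but on a busy site you would probably want to write less often, or send the parsed fields to a database instead.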

The main thing to be cautious about if you're going to do this is security. Log files are opened with the permissions of the user that starts the server, which is usually root, and the same is true of a process that you log to. Make sure that the process to which you are logging is secure. If you log to an insecure process (one that some non-root user can tinker with) you run the risk of having that process replaced by another that does unsavory things. If, for example, /usr/bin/apachelog.pl is world-writable, any user could edit it to shut down your server, mail someone the password file, or delete important files, and all of this would be done with root permissions.

If you want to log to a process of some kind, you might be better advised to look for a module that already implements the functionality that you are looking for. Check out http://modules.apache.org/ for a list of some of the modules available to do all sorts of cool things with Apache.

Rotating Your Log Files

Log files get big. If you're not careful, and if you're logging to somewhere like /var, you can actually fill up the partition and bring your server to a grinding halt. Yes, I've done this.

The way around this is to move your log files to some other place before they get too big. This can be accomplished in a number of different ways. Some Unix variants come with a logrotate script that handles this for you. Red Hat, for example, comes preconfigured to rotate your logs for you every few days, based on either their size or their age.
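If your system does ship logrotate, a stanza along the following lines is a minimal sketch of what its configuration for an Apache log might look like. The log path and the location of apachectl are assumptions borrowed from the layout used elsewhere in this article, and the weekly schedule and count of five are just example values; check what your distribution already installs before adding anything.

```
# Assumed paths; adjust to match your installation.
/usr/local/apache/logs/access_log {
    weekly
    rotate 5
    compress
    missingok
    postrotate
        /usr/local/apache/bin/apachectl restart
    endscript
}
```

The postrotate command is there so that Apache reopens its log file after the old one has been moved aside.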

If you want to do this yourself, you can use a Perl module (freely available from CPAN) called Logfile::Rotate. The following code, run periodically (perhaps once a week?) by cron, will rotate out your logfile, keeping five previous log files at any given time. Each backup log file will be gzipped to conserve space.


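     use Logfile::Rotate;

     # Describe the log to rotate and how many old copies to keep.
     my $logfile = new Logfile::Rotate(
          File => '/usr/local/apache/logs/access_log',
          Count => 5,
          Gzip => '/bin/gzip',
          # Run after the rotation so that Apache reopens its log file.
          # (Some releases of the module call this option Post; check yours.)
          Signal => sub {
               system('/usr/local/apache/bin/apachectl', 'restart');
               }
          );

     # Perform the rotation.
     $logfile->rotate();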

This does not seem like much. The Perl module takes care of all the details. You'll end up with files called things like access_log.1.gz, access_log.2.gz, and so on. Each file will get bumped up one number each time, and the file that used to be access_log.5.gz will be deleted each time.

This keeps you from running out of space on your log drive, and keeps as much of an archive as you like.

Logging for Multiple Virtual Hosts

I had several people write to me asking about how to handle logging when you have more than one virtual host on the same machine. I assume that they are running all of their logs into one log file, and are then attempting to split that log file back out into its component parts in order to get meaningful reports per host.

The solution to this problem is not to log to one log file in the first place. I know that there are utilities out there that will take a mixed log file, and, based on your virtual host configurations, figure out what requests were for which virtual host, and generate reports appropriately. This all seems to be too much work, as far as I can tell.

In each of your VirtualHost sections, simply specify a log file for that host. You can then handle each log file separately when it comes time to run reports.
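As a sketch of what that might look like, the fragment below gives two virtual hosts their own access logs. The host names, document roots, and log file names are made up for illustration, and the address in the VirtualHost line (plus any NameVirtualHost line you may need) depends on how your virtual hosts are already set up.

```
# Hypothetical hosts and paths, for illustration only.
<VirtualHost *>
    ServerName www.example.com
    DocumentRoot /usr/local/apache/htdocs/example-com
    CustomLog /usr/local/apache/logs/example-com_access_log common
</VirtualHost>

<VirtualHost *>
    ServerName www.example.org
    DocumentRoot /usr/local/apache/htdocs/example-org
    CustomLog /usr/local/apache/logs/example-org_access_log common
</VirtualHost>
```

Each host's requests then land in their own file, and you can point your reporting tool at one file at a time.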

There are some concerns with available file handles. That is, if you are running hundreds of virtual hosts, and have a log file per host, you may encounter a situation where you run out of available file handles. This can cause system instability and can even cause your system to halt. However, this is primarily a concern on servers that are hosting a very large number of virtual hosts.

For those that asked this question, please let me know if I'm completely missing the point of your question.

Summary

In the last several weeks, we've talked about various aspects of logging with Apache. You should now be equipped to log whatever information you're interested in, and to get all sorts of useful statistics out of those log files.

If there are other topics that you'd like to see me cover in Apache Guide, please send me a note at ApacheToday@rcbowen.com