Apache Maintenance Basics
May 13, 2004
You've downloaded and configured your Apache server and are ready to move on to the next project. Can it really be left to fend for itself in a darkened room?
Yes. To some degree, anyway. With the exception of configuration testing, once Apache is up, you likely need never think about how the Web server is running.
On the other hand, completely ignoring your Apache installation would be foolhardy.
Regular checks and maintenance on your Apache installation help identify issues, usually before they become real problems, and help you stay up to date with the latest security and performance patches. This article covers the major steps and maintenance tasks that should be undertaken regularly while the Apache system is running.
The first step of regular Apache maintenance is to keep a close eye on what Apache is doing. Monitoring the logs really only tells you about the status of the Web serving, not the status of the Apache server itself at a given moment. For live monitoring, use mod_status, which provides a summary of the active processes and threads and their current activity.
The following screenshot is an example of a mod_status report on an intranet server.
What you get is a heap of information about the active processes and their current status, what they are doing, and how busy they have been. Just getting a response is a good sign that the server is running; the report from mod_status adds more detailed information. To enable mod_status, add, or uncomment, the following lines in your server config:
The Allow line must include the hosts, domains, or IP addresses to which you want to provide access to the information.
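As a minimal sketch of what such a block usually looks like, the fragment below uses the Order/Deny/Allow access syntax that the Allow line above implies. The `/server-status` path and the `192.168.0.0/24` network are assumptions for illustration; substitute the location and the hosts you actually want to admit.

```apache
# Serve the mod_status report at /server-status (path is an assumption)
<Location /server-status>
    SetHandler server-status
    # Refuse everyone by default, then admit only the trusted network
    Order deny,allow
    Deny from all
    # Hypothetical intranet range; replace with your own hosts or domains
    Allow from 192.168.0.0/24
</Location>
```

Keeping the Deny line in place means the report is closed to everyone except the addresses named on the Allow line.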
Also, although the display does not need to be open continuously, if you suspect something is wrong, it is a good starting point.
Checking logs is the best way to find out what is going on. Apache 2.0 introduced the generation of a separate error log for the Apache process itself. Checking it often is the best way to find out whether something needs attention, as examining the logs makes it easy to catch a faulty or missing module or a bad process. Consider this sample fragment:
All log entries are marked with a particular class in much the same way as entries in the system log are marked under the various Unix variants. Log levels include 'notice', which is for notification information only; 'info', which relates to running or log information; 'debug', which is output when debugging on a module is enabled; and 'warn', which notifies of a serious problem.
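To get a quick feel for how the levels are distributed in a log, a tally along the following lines may help. It is only a sketch: the function name `count_log_levels` is made up for illustration, and it assumes the classic `[timestamp] [level] message` layout with a lowercase level name in the second bracketed field.

```shell
# Tally error log entries by level, most frequent first.
# Assumes lines of the form: [timestamp] [level] message
count_log_levels() {
    awk '
        # Match the leading "[timestamp] [level]" pair
        match($0, /^\[[^]]*\] \[[a-z]+\]/) {
            level = substr($0, RSTART, RLENGTH)
            sub(/^\[[^]]*\] \[/, "", level)   # drop the timestamp and the opening bracket
            sub(/\]$/, "", level)             # drop the closing bracket
            count[level]++
        }
        END { for (level in count) print count[level], level }
    ' "$1" | sort -rn
}
```

Calling `count_log_levels /path/to/error_log` prints one line per level with its count. Lines that do not match the assumed layout are simply skipped.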
Checking Web host logs should also be a regular activity. These highlight problems with missing files, errors in CGI scripts, and users trying to access files and directories that no longer exist. Note, though, that some errors from a site are to be expected, even if everything tests out okay elsewhere.
The important things to look out for are unexpected items, rather than missteps you might repeatedly be making. For example, say you frequently forget to add a 'favicon' to your sites; you would then get many errors from browsers looking for a file that will never exist. Errors in a CGI script or other component, however, are something you would want to know about.
If you are on a Unix system, and using the standard error log format, the following command generates a unique list of errors from a log file:
pulls out the most recent 100.
searches the entire file.
To gauge how significant each error is, add '-c' to the uniq command, which prefixes each error with the number of times it occurred.
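One way to assemble such a pipeline is sketched below. The function name `summarize_errors` is hypothetical, and the sed expressions assume the standard `[timestamp] [level] [client address] message` layout; they strip the timestamp, level, and client fields so that identical messages collapse together.

```shell
# Print each distinct error message with its count, most frequent first.
# $1 is the log file; $2 (optional) limits the scan to the last N lines.
summarize_errors() {
    log_file=$1
    last_n=${2:-}
    if [ -n "$last_n" ]; then
        tail -n "$last_n" "$log_file"    # only the most recent entries
    else
        cat "$log_file"                  # the entire file
    fi |
        # Remove "[timestamp] [level] " and then an optional "[client ...] "
        sed -e 's/^\[[^]]*] \[[^]]*] //' -e 's/^\[client [^]]*] //' |
        sort | uniq -c | sort -rn
}
```

For example, `summarize_errors /path/to/error_log 100` looks only at the last 100 entries, while omitting the second argument scans the whole file.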
For obvious reasons, it's a good idea to keep logs for a while to track and trace problems. You'll also probably want to keep access logs for a long time to perform the necessary analysis. Error logs can be disposed of every three months once you've gone through the steps above to check out any errors or potential problems.
The most effective way of doing this in the standard Apache release (without any clever configuration tricks) is to:
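Typically that means stopping the server, moving the log files aside, and starting the server again. The sketch below shows that sequence under a few assumptions: the function name `rotate_logs_offline`, the `access_log` and `error_log` file names, and the date-stamped archive names are all illustrative, and `apachectl` is assumed to be on the PATH (override it through the `APACHECTL` variable if it is not).

```shell
# Command used to control the server; override if apachectl is not on the PATH.
APACHECTL=${APACHECTL:-apachectl}

# Stop Apache, move the logs into an archive directory, then start Apache again.
# $1 is the live log directory; $2 is the archive directory.
rotate_logs_offline() {
    log_dir=$1
    archive_dir=$2
    stamp=$(date '+%Y%m%d')

    mkdir -p "$archive_dir" || return 1
    "$APACHECTL" stop || return 1
    for log in "$log_dir"/access_log "$log_dir"/error_log; do
        if [ -f "$log" ]; then
            # Archive under a date-stamped name, e.g. access_log.20040513
            mv "$log" "$archive_dir/$(basename "$log").$stamp"
        fi
    done
    "$APACHECTL" start
}
```

Apache creates fresh log files when it starts, so nothing needs to be put back in the live log directory.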
This obviously shuts down Apache for a period, which you may not want to do if the server is busy. To get around this, use the piped log system, which outputs log information through an external command that can automatically rotate and archive the information. Apache, in fact, provides the rotatelogs application as part of the standard kit to do this. Rotatelogs accepts the name of the log, and the interval for rotation (in seconds).
To enable rotatelogs, change the configuration file to use the pipe system for each log file:
The number 86400 is the number of seconds in a day. On a busy site, it is preferable to decrease that value so the rotation is performed every six hours (21600), or even every hour (3600). On less busy sites, rotating every week or month would work.
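A pair of piped-log directives might look like the fragment below. The paths under `/usr/local/apache2` are assumptions, as is the `common` log format nickname, which must already be defined by a LogFormat line in your configuration.

```apache
# Pipe each log through rotatelogs; 86400 seconds = one rotation per day.
# All paths are assumptions; adjust them to your installation.
CustomLog "|/usr/local/apache2/bin/rotatelogs /usr/local/apache2/logs/access_log 86400" common
ErrorLog "|/usr/local/apache2/bin/rotatelogs /usr/local/apache2/logs/error_log 86400"
```

With this in place, rotatelogs starts a new file at each interval without Apache having to be stopped.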
We recommend writing all logs into a custom MySQL database. This makes it easier to extract information from both error and access logs. The fields in the SQL table match those in the log output, and an extra field records the name of the Web site.
Many Web servers rarely have their configuration files modified and updated; others regularly add new configurations, virtual hosts, and other elements. Two ways to ensure the configuration is up to date and working are 1) checking the configuration and 2) tracking configuration changes.
Checking the configuration periodically highlights any problems (including any disparity between the currently running Apache and the current configuration file). Sometimes, it will even highlight changes made to the Apache configuration of which you may not have been aware.
Checking the configuration can be handled with the apachectl command with the configtest argument:
The 'Syntax OK' line at the end is the key piece.
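Because the 'Syntax OK' line is what matters, the check lends itself to a small wrapper that can run from cron. This is a sketch only: the function name `check_apache_config` is made up, and `apachectl` is assumed to be on the PATH (set `APACHECTL` to a full path otherwise).

```shell
# Command used to control the server; override if apachectl is not on the PATH.
APACHECTL=${APACHECTL:-apachectl}

# Run the configuration test and succeed only if "Syntax OK" is reported.
check_apache_config() {
    # configtest reports on stderr, so capture both streams; a failed test
    # is detected from the captured text below rather than the exit status.
    output=$("$APACHECTL" configtest 2>&1) || true
    case $output in
        *"Syntax OK"*)
            echo "configuration OK"
            return 0
            ;;
        *)
            # Pass the original diagnostics on so the problem can be located
            printf '%s\n' "$output" >&2
            return 1
            ;;
    esac
}
```

A cron job could call `check_apache_config` and send mail only when it returns a non-zero status.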
Configuration management is about keeping a history of the changes you've made to your configuration file. The easiest way to do this is to use RCS or CVS to check in any configuration changes made. They not only track the changes and differences between versions, but also enable you to record a log of the changes made and recover previous versions.
If you're worried you will forget to log the changes, you can run a script each night to automatically check in the latest version of the configuration file, along with a suitable description to identify the automatic changes:
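With RCS, a nightly check-in could be sketched as follows. The function name `nightly_checkin`, the log message wording, and the `RCS_CI` override variable are assumptions for illustration; the `ci` options used are `-l` (check in and lock again, which keeps the working file in place), `-m` (log message), `-t-` (description used on the first check-in), and `-q` (quiet).

```shell
# RCS check-in command; override if ci lives somewhere unusual.
RCS_CI=${RCS_CI:-ci}

# Check the given configuration file into RCS with a dated, automatic message.
nightly_checkin() {
    conf_file=$1
    stamp=$(date '+%Y-%m-%d')
    # -l keeps a locked working copy in place so Apache can still read it
    "$RCS_CI" -q -l \
        -m"Automatic nightly check-in, $stamp" \
        -t-"Apache configuration" \
        "$conf_file"
}
```

Run from cron with the path to your configuration file, for example `nightly_checkin /path/to/httpd.conf`. The fixed message prefix makes the automatic revisions easy to tell apart from manual ones in the revision log.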
Security and Passwords
User life cycle management may not be one of your first thoughts during the lifetime of your Web server, but it's a critical part of keeping the system running and ensuring the security of the environment. It doesn't matter whether you are using standard HTTP authentication, an authentication system mapped to a MySQL or other database, or your own internal system; you need to keep track of the people to whom you have granted access.
The key part of user life cycle management is to ensure that each user's ID and access are granted only while that user is actually allowed to access the system. This means regularly checking your user list and HTTP authentication systems to ensure only those users who should have access to your servers do.
One way to do this is to keep a separate log of the users added to the system and when (both manually and automatically). The moment a user leaves, remove him or her from the authentication system. Periodically, you should also go through Web site access logs to check who has been using the system recently.
There are two reasons for this. First, it highlights any errors. Second, it enables the removal of users from the authentication system if they've been inactive for a reasonable period. On secure sites, such as an intranet, checks should be performed every month; on unsecured sites, checks should be run every three months. Remove anyone who hasn't used the system in that time period. They can always be granted access at a later stage if need be.
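For standard HTTP authentication, the inactivity check can be partly automated by comparing the password file against the access log. The sketch below assumes an htpasswd-style file (`user:hash` per line) and an access log in the common log format, where the authenticated user name is the third field; the function name `inactive_users` is hypothetical.

```shell
# List users present in the password file who never appear in the access log.
# $1 is an htpasswd-style file; $2 is an access log in common log format.
inactive_users() {
    htpasswd_file=$1
    access_log=$2
    known=$(mktemp) && seen=$(mktemp) || return 1

    # Every user who has been granted access
    cut -d: -f1 "$htpasswd_file" | sort -u > "$known"
    # Every user who actually authenticated ("-" means no user was supplied)
    awk '$3 != "-" { print $3 }' "$access_log" | sort -u > "$seen"

    # Lines only in the first list: granted access but never seen
    comm -23 "$known" "$seen"
    rm -f "$known" "$seen"
}
```

Feeding it the logs that cover the review period (one month or three, as above) yields the candidates for removal. The list is best reviewed by hand before anyone is actually removed.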
Keeping Apache Up to Date
The final component of server maintenance is monitoring the process of updating your Apache installation to the latest version. We can't emphasize enough: Never install a version of Apache on a production or live system without first testing it. To test, we keep a copy of the latest site, configuration, and other information on a VMware or VirtualPC machine; install a copy of Apache; and test the effects.
For these machines, as well as developmental and other non-critical machines, we use a directory structure to hold Apache sources during building. We keep a separate directory for each instance of an Apache server, not just each version. For machines running multiple Apache instances, it's vital to have a separate installation directory and source structure for each one.
Here is a sample structure from a main Web server that holds three instances for development, staging, and production Web sites:
There is a separate build directory for each version of Apache within each instance. One of the issues with Apache is that you must configure and then build the system using the correct options; not specifying a dynamic structure or the additional modules you want can cause problems. So, within each instance directory sits a script that holds the configuration command line used to configure the Apache server instance.
For example, the script might contain:
This works with any version of Apache to configure the sources for the correct instance. If a new version comes out, just re-run the script in the new source directory. Whatever configuration options were active with the current version will then apply to the new one; there is no need to remember the configuration and command line options used months ago when the previous edition came out.
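A per-instance script of this kind can be as small as the sketch below. The function name `configure_instance` and the `/usr/local/apache-staging` prefix are hypothetical; `--prefix`, `--enable-so`, and `--enable-mods-shared=most` are standard options of the Apache 2.x configure script, chosen here only as an example of an instance built with dynamically loadable modules.

```shell
# Configure the Apache sources in the given directory for one instance.
# $1 is the unpacked source directory for whichever version is being built.
configure_instance() {
    source_dir=$1
    # Run in a subshell so the caller's working directory is left untouched
    (
        cd "$source_dir" &&
            "${CONFIGURE:-./configure}" \
                --prefix=/usr/local/apache-staging \
                --enable-so \
                --enable-mods-shared=most
    )
}
```

Because the options live in one place, pointing the function at a newly unpacked source directory reproduces the same build settings for the new version.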
Other Systems and Extensions
It's unlikely Apache is your entire Web serving platform; there are probably additional modules, languages, scripting environments, and other components to maintain and keep up to date (e.g., the latest versions of Perl, Perl modules, PHP, and MySQL). Keeping these up to date is not a full-time job, but they should be checked every one to three months to see what needs updating.
Some of this can be quite easy. For example, if you are using the CPAN module within Perl, you can update all of the installed modules on your system using the command:
This forces CPAN to produce a list of all of the outdated modules and install them. Other systems must then be handled manually.
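The one-liner documented with the CPAN module for this purpose passes the list of outdated modules (`CPAN::Shell->r`) straight to the installer. The wrapper below is a sketch: the function name `update_perl_modules` and the `PERL` override variable are assumptions, and CPAN is assumed to be configured already for the user running it.

```shell
# Install newer versions of every outdated Perl module via the CPAN module.
# Set PERL to a specific interpreter if more than one is installed.
update_perl_modules() {
    # CPAN::Shell->r lists outdated modules; install() then updates each one
    "${PERL:-perl}" -MCPAN -e 'CPAN::Shell->install(CPAN::Shell->r)'
}
```

On a server with several Perl installations, setting `PERL` makes it explicit which one is being updated.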
Some items in this article should be checked weekly, some monthly, and some annually. Each enterprise must determine which schedule is right for each item based on the environment, how busy the server is, and how well-used the system is. Most likely, the treatment that sites and virtual hosts receive will vary.
Just don't ignore maintenance in the hope that it will go away; it won't. A few simple steps could save you hours, and even days, in the long run.