Advanced Logging Techniques With Apache
Many people take their logs and, using various techniques, reformat the information into a more useful layout. There are lots of ways of doing this, and a number of tools, such as analog, simplify the process.
Logging Directly to a Database or Application
For large installations with a high number of servers or sites, it can be more practical to write the information into a database, which is then used to report on the information directly. Running an SQL query to pick out the number of hits for a given URI is quicker than parsing a 20 MB text file and picking out the information.
There are three ways to achieve this: pipes, third-party modules, and post-processing. The first pipes the output of a log directive to a script, which parses each entry and inserts it into a database. For example, the following line would send each log entry, in the common log format, to an SQL database through a script called apachetosql:
CustomLog "|/usr/local/apachetosql" common
The script would work just like a post-processor, reading each line Apache sends to it, extracting the fields, and then writing a suitable INSERT query to add the line to the database. If you are going to use this method, consider adding the %v format string to the log format so that the virtual host name is written into each entry and accesses to a specific site are trackable.
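As a rough illustration of the read, extract, and INSERT loop described above, here is a minimal Python sketch of what a script in the role of apachetosql might look like. Everything in it is an assumption rather than the article's own script: the log format is taken to be the common format with %v added at the front (`%v %h %l %u %t \"%r\" %>s %b`), SQLite from the Python standard library stands in for the SQL server so the example is self-contained, and the table name, column names, and database-path argument are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for apachetosql: reads log lines from stdin."""
import re
import sqlite3
import sys

# Assumed log format (not from the article): %v %h %l %u %t \"%r\" %>s %b
LINE_RE = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request is "METHOD URI PROTOCOL"; malformed requests get uri=None.
    parts = fields["request"].split()
    fields["uri"] = parts[1] if len(parts) >= 2 else None
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


def open_db(path):
    """Open the database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log ("
        "vhost TEXT, host TEXT, remote_user TEXT, time TEXT, "
        "uri TEXT, status INTEGER, size INTEGER)"
    )
    return conn


def store(conn, fields):
    """Insert one parsed entry using bound parameters, never string pasting."""
    conn.execute(
        "INSERT INTO access_log "
        "(vhost, host, remote_user, time, uri, status, size) "
        "VALUES (:vhost, :host, :user, :time, :uri, :status, :size)",
        fields,
    )


def main(stream, db_path):
    conn = open_db(db_path)
    for line in stream:
        fields = parse_line(line)
        if fields is None:
            continue  # skip unparseable lines instead of crashing the logger
        store(conn, fields)
        conn.commit()  # commit per entry: simple, but costs one write each
    conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.stdin, sys.argv[1])
```

In this sketch the database path is passed as the script's first argument, which is a choice made for the example and not something the CustomLog line above does. Bound parameters are used for the INSERT because the request line is supplied by the client and should not be pasted into SQL text.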
The main issues with the piped method are security and the additional overhead required to process the contents. The script or application used is another load on your server, and it needs to be error proof. The script will also be executed as the user who started Apache, which is root if the server was started by root, so it must be kept secure.
A more extensive solution is provided by modules such as mod_log_sql. This module automatically inserts the log information into a suitable table within a MySQL database. Unlike the pipe method, it uses a direct connection to the MySQL socket, reducing the overhead required to process the information. Because it runs inside Apache, an extra process is not required, nor do we need to worry about the security of a separate script.
With the two direct methods, the main issue is the availability of the SQL server itself. Any interruption in its availability runs the risk of log information being lost. Even with the module method, additional processing is required to log the information. That means either additional local processing or network bandwidth to the log server.
The post-processing method allows continued logging to standard text files. You can later import the log data into MySQL or another database without worrying about the concurrent processing overhead or connectivity issues. Then, if anything goes wrong, the text files are there to fall back on.
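A post-processing import can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not a finished importer: it expects lines in the common log format, uses SQLite from the standard library in place of MySQL, and the `hits` table and both function names are hypothetical. It loads a batch of lines in a single transaction and then runs the kind of per-URI hit count mentioned earlier.

```python
import sqlite3


def import_log(conn, lines):
    """Bulk-load common-log-format lines; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (host TEXT, uri TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        # Layout: host ident user [time] "METHOD URI PROTOCOL" status bytes
        try:
            head, request, tail = line.split('"', 2)
            host = head.split()[0]
            uri = request.split()[1]
            status = int(tail.split()[0])
        except (ValueError, IndexError):
            continue  # malformed line: it stays in the text file, skip it here
        rows.append((host, uri, status))
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    return len(rows)


def hits_for(conn, uri):
    """Count the stored requests for one URI."""
    cursor = conn.execute("SELECT COUNT(*) FROM hits WHERE uri = ?", (uri,))
    return cursor.fetchone()[0]
```

A typical use would be to open the rotated log with `open(path)` and pass the file object as `lines`. Because the import runs after the fact, a failed load can simply be repeated from the text file.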
Speeding up the Logging Process
Although logging is a critical part of the monitoring process for most users, it can put a burden on machines serving a large number of sites and virtual hosts, as well as on particularly busy Web servers. In theory, the logging process shouldn't cause too much of a problem, but for those worried about its effects on the server, a few tricks are available.
- Make sure you are tracking only those files on which you will later want to report. This limits the number of lines and data reported. Get your file selections right, because you can't go back and get the information at a later date!
- Switch off hostname lookups in log data. With lookups switched off, Apache will record only the IP address, and these are easily resolved into hostnames at a later stage. To disable, use the HostnameLookups directive with the option Off.
- Unless you absolutely need it, leave the IdentityCheck directive off. When it is on, Apache queries the identd service on the client machine (RFC 1413) for the remote user name on each request and records the result in the log. The lookup is time consuming, so switching it off (or leaving it off, since this is the default) should help to reduce the load.
- Use a single log file for all virtual hosts. This limits the number of open files within Apache used for logging. You can split up the files later using the split-logfile program supplied with Apache. To enable a single central log file, omit logging directives from within the VirtualHost directives and specify a custom access log format starting with the %v pattern, which inserts the virtual host name into each log line.
- Unless you really need the information, create only an access log and an error log. Referrer logs, cookie logs, and other such logs are generally of little use on a production server. On a test server, have as many logs as you want, provided you are not testing performance!
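To make the single-log-file tip more concrete, the Python sketch below shows the general idea of splitting a combined log by its leading %v field after the fact. It is an illustration only and not the split-logfile program itself: the function name, the output file naming, and the choice to drop the virtual host field from each written line are all assumptions made for the example.

```python
import re
from pathlib import Path

# Accept only plain host names so a log field can never pick an arbitrary path.
SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]*$")


def split_by_vhost(lines, out_dir):
    """Append each line, minus its leading vhost field, to <out_dir>/<vhost>.log.

    Returns a dict mapping each virtual host to the number of lines written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    counts = {}
    try:
        for line in lines:
            vhost, sep, rest = line.partition(" ")
            vhost = vhost.lower()
            if not sep or not SAFE_NAME.match(vhost):
                continue  # no vhost field, or a name unsafe to use as a file
            if vhost not in handles:
                path = out_dir / f"{vhost}.log"
                handles[vhost] = open(path, "a", encoding="utf-8")
            handles[vhost].write(rest if rest.endswith("\n") else rest + "\n")
            counts[vhost] = counts.get(vhost, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Note that this version keeps one open file per virtual host while it runs, which is acceptable for an offline job but is exactly the cost the tip avoids inside Apache.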
While these techniques will not guarantee a massive improvement in performance, they will make a difference, and it's always important to strike the right balance between the amount of information needed and the effect that collecting it has on performance. Log everything, and you'll waste time and disk space; don't record enough, and you may run afoul of your marketing department and enterprise regulatory requirements.