Logging Directly to a Database or Application
Many people take their logs and, using various techniques,
reformat the information into a more useful layout. There are lots
of ways of doing this, and a number of tools, such as Analog, simplify the process.
For large installations with many servers
or sites, it can be more practical to write the information into
a database and report on it directly from there. Running an SQL query to count the hits for a
given URI is quicker than parsing a 20 MB text file to pick
out the same information.
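As an illustration of the kind of query meant here, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for MySQL. The access_log table, its columns, and the sample rows are hypothetical; the point is that an index on uri lets the database count hits without reading every entry the way a text-file parser must.

```python
import sqlite3

# Hypothetical schema: one row per logged request.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (vhost TEXT, uri TEXT, status INTEGER, bytes INTEGER)"
)
# The index is what lets per-URI counts avoid a full scan.
conn.execute("CREATE INDEX idx_access_log_uri ON access_log (uri)")

# Sample rows standing in for imported log entries (made up for illustration).
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?, ?)",
    [
        ("www.example.com", "/index.html", 200, 5120),
        ("www.example.com", "/about.html", 200, 2048),
        ("www.example.com", "/index.html", 304, 0),
    ],
)


def hits_for_uri(conn, uri):
    """Return the number of logged requests for one URI."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM access_log WHERE uri = ?", (uri,)
    ).fetchone()
    return count


print(hits_for_uri(conn, "/index.html"))  # → 2
```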
There are three ways you can achieve this: pipes, third-party
modules, and post-processing. The first method pipes the output of a
log directive to a script, which parses the information and inserts
it directly into a database. For example, the following line would
write each log entry in the common log format to an SQL database
through a script called apachetosql:
CustomLog "|/usr/local/apachetosql" common
The script works just like a post-processor: it reads each line
Apache sends to it, extracts the fields, and builds a
suitable INSERT query to add the line to the database. If you
use this method, consider adding the %v format specifier to
write the virtual host name into each log entry, so that accesses to a specific site can be tracked.
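The following is a minimal sketch of what a script like apachetosql might look like, written in Python. It is not an existing tool: the table layout and column names are hypothetical, and SQLite stands in for MySQL so that the example is self-contained. It accepts the common format with or without a leading %v field, uses parameterized INSERT statements because log lines contain client-supplied text, and skips malformed lines rather than exiting.

```python
#!/usr/bin/env python3
"""Hypothetical apachetosql: read log lines from Apache on stdin, store them in SQL."""
import re
import sqlite3
import sys

# Common Log Format, optionally preceded by a %v (virtual host) field.
LOG_RE = re.compile(
    r'^(?:(?P<vhost>\S+) )?(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a tuple of column values, or None if the line does not match."""
    m = LOG_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    parts = m.group("request").split()
    uri = parts[1] if len(parts) >= 2 else None
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return (m.group("vhost"), m.group("host"), m.group("user"), m.group("time"),
            uri, int(m.group("status")), size)


def open_db(path):
    """Open (or create) the database and the hypothetical access_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log (vhost TEXT, host TEXT, "
        "remote_user TEXT, request_time TEXT, uri TEXT, status INTEGER, bytes INTEGER)"
    )
    return conn


def run(stream, conn):
    """Insert one row per well-formed line; skip malformed lines instead of exiting."""
    for line in stream:
        row = parse_line(line)
        if row is None:
            continue  # exiting here would break the pipe and risk losing log lines
        # Parameterized query: log lines carry client-supplied text, so never
        # format them into the SQL string directly.
        conn.execute("INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        conn.commit()  # commit per line so little is lost if the process dies


if __name__ == "__main__" and len(sys.argv) == 2:
    # Expects the database path as its only argument.
    run(sys.stdin, open_db(sys.argv[1]))
```

Because this sketch takes the database path as an argument, the CustomLog line would need to pass one, for example CustomLog "|/usr/local/apachetosql /var/log/apache/access.db" common, where the path is only a placeholder. Committing after every line limits data loss if the process dies, at the cost of a disk write per request; batching commits trades some of that durability for speed.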
The drawbacks of the piped method are security and the additional overhead of processing the log contents. The script or application is another load on your server, and it needs to
be error proof. The script will also be executed with the privileges of the user that started Apache, which is usually root, so it must be written and installed with care.
A more extensive solution is provided by modules such as mod_log_sql,
which automatically inserts the log information into a
suitable table within a MySQL database. Unlike the pipe method, it
uses a direct connection to the MySQL socket, reducing the overhead
required to process the information. Because it runs inside Apache,
no extra process is required, and there is no separate script whose
security you need to worry about.
With the two direct methods, the main issue is the availability
of the SQL server itself. If the SQL server becomes unavailable,
log information may be lost. Even with the module method, additional
processing is required to log the information, which means either
additional local processing or additional network traffic to a
separate log server.
The post-processing method allows continued logging to
standard text files. You can later import the log data into your
MySQL or other database, without worrying about the concurrent
processing overhead or connectivity issues. Then, if anything goes wrong,
the text files are there to fall back on.
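A post-processing import can be sketched the same way. This hypothetical Python helper parses a finished log file in the common log format and loads it inside a single transaction, so a failed import leaves the database unchanged and can be rerun from the text file. SQLite again stands in for MySQL, and the schema is an assumption.

```python
import re
import sqlite3

# Common Log Format: host ident user [time] "request" status bytes
CLF_RE = re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$')


def parse(line):
    """Turn one log line into a row tuple, or None if it is malformed."""
    m = CLF_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    host, remote_user, request_time, request, status, size = m.groups()
    parts = request.split()
    uri = parts[1] if len(parts) >= 2 else None
    return (host, remote_user, request_time, uri, int(status),
            0 if size == "-" else int(size))


def import_log(lines, conn):
    """Load a finished log file in one transaction; return the rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log (host TEXT, remote_user TEXT, "
        "request_time TEXT, uri TEXT, status INTEGER, bytes INTEGER)"
    )
    rows = [row for row in map(parse, lines) if row is not None]
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def top_uris(conn, limit=10):
    """Return (uri, hits) pairs for the most requested URIs."""
    return conn.execute(
        "SELECT uri, COUNT(*) AS hits FROM access_log "
        "GROUP BY uri ORDER BY hits DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

For example, opening a rotated log such as access_log.1 and passing the file object to import_log() along with sqlite3.connect("logs.db") would load it (both file names are placeholders), after which top_uris() reports the busiest URIs. Because the text file remains the source of truth, the import can run during quiet hours without affecting the live server.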
Speeding up the Logging Process
Although for most users logging is a critical part of the monitoring process, it can place a burden on machines serving a
large number of sites and virtual hosts, and on particularly busy Web servers. In theory, the logging
process shouldn't cause much of a problem, but if you are
worried about its effect on the server, a
few tricks are available.
- Make sure you are tracking only those files on which you
  will later want to report. This limits the number of lines and
  the amount of data logged. Get your file selections right, because
  you can't go back and get the information at a later date!
- Switch off hostname lookups in log data. With lookups
  switched off, Apache records only the IP address, which can easily
  be resolved into a hostname at a later stage. To disable lookups,
  set the HostnameLookups directive to Off.
- Unless you absolutely need it, leave the IdentityCheck
  directive off. When enabled, it makes Apache query the client's
  identd service (RFC 1413) for the remote user's identity on every
  request. The lookup is time consuming and the result is rarely
  reliable, so switching it off (or leaving it off, since this is the
  default) should help to reduce the load.
- Use a single log file for all virtual hosts. This
  limits the number of files Apache holds open for
  logging. You can split up the file later using the
  split-logfile program supplied with Apache. To enable a
  single central log file, omit logging directives from
  within the VirtualHost directives and specify a custom
  access log format starting with the %v pattern, which
  inserts the virtual host name at the start of each log line.
- Unless you really need the information, create only an
  access log and an error log. Referrer logs, cookie logs, and
  other extra logs are generally unnecessary on a production
  server. On a test server, have as many logs as you want,
  provided you are not testing performance!
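To show what the single-log approach involves, here is a rough Python approximation of the splitting step. It is not the source of split-logfile; it assumes each line begins with the %v field, as produced by a format such as LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common, and writes the rest of each line to a per-host file named after the lowercased virtual host. Sending suspicious host names to a file called access.log is a choice made for this sketch.

```python
import os
from collections import defaultdict


def split_by_vhost(lines):
    """Group '%v ...' log lines by virtual host, dropping the %v field itself."""
    groups = defaultdict(list)
    for line in lines:
        vhost, sep, rest = line.partition(" ")
        if not sep or not vhost:
            continue  # no virtual host field: ignored in this sketch
        name = vhost.lower()
        # Refuse names that could point outside the output directory.
        if "/" in name or "\\" in name or name.startswith("."):
            name = "access"
        groups[name].append(rest)
    return dict(groups)


def write_logs(groups, directory):
    """Append each host's lines to <directory>/<host>.log."""
    for name, host_lines in groups.items():
        with open(os.path.join(directory, name + ".log"), "a") as f:
            f.writelines(host_lines)
```

Running split_by_vhost() over the combined log and passing the result to write_logs() with an output directory yields one file per site in the ordinary common format, so existing per-site reporting tools can read them unchanged.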
While these techniques will not guarantee a massive improvement in performance, they will make a difference. It's always important to strike the right balance between the amount of
information you need and the effect on performance of collecting it.
Log everything, and you'll waste time and disk space; don't
record enough, and you may run foul of your marketing department and your organization's regulatory requirements.