Log Analysis Basics Page 2
Converting Logs Into Useful Information
Tracking Rather than Analysis
The first step in analyzing your log files is picking out the real data from the log. To do this, you must understand the format. With text files, the information is normally laid out in defined fields, separated either by a single-character delimiter such as a space or a colon, or by fixed-width columns. Individual fields may also be delimited or formatted according to their content. The block below is an example from an Apache Web server:
192.168.1.59 - - [11/Feb/2004:12:21:57 +0000] "GET / HTTP/1.1" 200 11669
192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /mcslp.css HTTP/1.1" 200 4828
192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /weather/images/3.gif HTTP/1.1" 200 566
192.168.1.58 - - [11/Feb/2004:12:22:21 +0000] "GET /mail/index.cgi?m=v&mbox=com-mcslp-lbt&id=2532 HTTP/1.1" 200 20656
192.168.1.58 - - [11/Feb/2004:12:22:22 +0000] "GET /mcslp.css HTTP/1.1" 304 0
This example mixes two kinds of delimiter: spaces separate the fields from one another, while square brackets and double quotes enclose the date/time and request fields, which themselves contain spaces. Here's another example, this time from syslog:
May 16 18:14:30 twinsol sm-mta: [ID 801593 mail.info] i4GHEQxG022012: from=<firstname.lastname@example.org>, size=20913, class=-30, nrcpts=1, msgid=<200405161600.i4GG06xf025868@shetland.sys-con.com>, proto=ESMTP, daemon=MTA, email@example.com [22.214.171.124]
May 16 18:14:30 twinsol sm-mta: [ID 801593 mail.info] i4GHEQxG022012: to=<firstname.lastname@example.org>, delay=00:00:01, xdelay=00:00:00, mailer=cyrusv2, pri=194913, relay=localhost, dsn=2.0.0, stat=Sent
Being able to read and understand these logs helps focus your approach and provides a basis to analyze the data.
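To make the field layout concrete, here is a minimal Python sketch that splits one of the Apache lines above into named fields. The regular expression assumes the layout seen in the sample (host, two placeholder fields, a bracketed timestamp, a quoted request, a status code, and a size); the names CLF_PATTERN and parse_clf_line are illustrative only, and a server configured with a different log format would need a different pattern.

```python
import re

# Assumed layout: host ident user [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The quoted request is itself space-delimited: method, URL, protocol.
    parts = fields["request"].split()
    fields["method"], fields["url"], fields["protocol"] = (parts + [None] * 3)[:3]
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = '192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /mcslp.css HTTP/1.1" 200 4828'
entry = parse_clf_line(sample)
print(entry["host"], entry["time"], entry["url"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets the caller decide whether to skip or report malformed entries.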
Most log analysis tools provide a range of information, but the most commonly reported are basic statistics about the log contents. For example, from a Web log you can obtain a list of the URLs visited and a count of how many times each was accessed. This tells you how popular a particular page or area of your site is.
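As a rough illustration of this kind of statistic, the following Python sketch counts how often each URL appears in the Apache sample lines shown earlier. It assumes the request is the first double-quoted field on each line; count_urls is a hypothetical helper name, not part of any log analysis tool.

```python
from collections import Counter

def count_urls(lines):
    """Count requests per URL, skipping lines that do not look like access-log entries."""
    counts = Counter()
    for line in lines:
        try:
            # The request is the first double-quoted field: "METHOD URL PROTOCOL"
            request = line.split('"')[1]
            url = request.split()[1]
        except IndexError:
            continue
        counts[url] += 1
    return counts

log_lines = [
    '192.168.1.59 - - [11/Feb/2004:12:21:57 +0000] "GET / HTTP/1.1" 200 11669',
    '192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /mcslp.css HTTP/1.1" 200 4828',
    '192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /weather/images/3.gif HTTP/1.1" 200 566',
    '192.168.1.58 - - [11/Feb/2004:12:22:21 +0000] "GET /mail/index.cgi?m=v&mbox=com-mcslp-lbt&id=2532 HTTP/1.1" 200 20656',
    '192.168.1.58 - - [11/Feb/2004:12:22:22 +0000] "GET /mcslp.css HTTP/1.1" 304 0',
]

url_counts = count_urls(log_lines)
# List the URLs from most to least requested.
for url, hits in url_counts.most_common():
    print(hits, url)
```

In this sample the stylesheet /mcslp.css is the only URL requested more than once, so it heads the list.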
If your logs record additional details, such as the date and time of each entry, you can use them to generate statistics over time. You can, for example, monitor access to a particular page or area of your site over a period of time, perhaps to determine the most popular times for visiting different pages. Over the longer term, you can use this information to get usage statistics for the site, watching how access grows or how different parts of the site gain popularity.
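One way to turn the timestamps into time-based statistics is to group entries into hourly buckets. The Python sketch below assumes the bracketed timestamp format seen in the Apache sample (day/month/year:hour:minute:second plus a UTC offset) and an English month abbreviation; hits_per_hour is an illustrative name only.

```python
from collections import Counter
from datetime import datetime

def hits_per_hour(lines):
    """Count log entries per hour, keyed by 'YYYY-MM-DD HH:00'."""
    buckets = Counter()
    for line in lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # no bracketed timestamp on this line
        try:
            # %b expects an English month abbreviation under the default locale.
            stamp = datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # timestamp is not in the assumed format
        buckets[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

log_lines = [
    '192.168.1.59 - - [11/Feb/2004:12:21:57 +0000] "GET / HTTP/1.1" 200 11669',
    '192.168.1.59 - - [11/Feb/2004:12:21:59 +0000] "GET /mcslp.css HTTP/1.1" 200 4828',
    '192.168.1.58 - - [11/Feb/2004:12:22:21 +0000] "GET /mcslp.css HTTP/1.1" 304 0',
]

hourly = hits_per_hour(log_lines)
for hour, hits in sorted(hourly.items()):
    print(hour, hits)
```

Changing the strftime pattern to a date alone, or to a year and month, gives daily or monthly totals for watching longer-term trends.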
Other logs provide alternative types of information and statistics. For example, I use a log processor on my syslog to generate a list of e-mail messages transferred through the machine, recording the date/time, source, and destination address. Here I'm less concerned with actual statistics than with extracting the salient information from the log.
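The processor itself is not shown here, but the idea can be sketched in Python. The example below is an assumption-laden illustration, not the author's tool: it assumes mail log lines shaped like the syslog sample earlier (abbreviated here), pairs the from= and to= entries by the queue ID that precedes them, and uses hypothetical names (LINE_PATTERN, ADDRESS_PATTERN, track_messages).

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host sm-mta: [ID ...] QUEUEID: key=value, ..."
LINE_PATTERN = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ sm-mta: '
    r'(?:\[ID [^\]]*\] )?(?P<qid>\w+): (?P<rest>.*)$'
)
# Matches from=<...> and to=<...> but not other keys such as msgid=<...>.
ADDRESS_PATTERN = re.compile(r'\b(from|to)=<([^>]*)>')

def track_messages(lines):
    """Group sender and recipient addresses by queue ID."""
    messages = {}
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue
        # The first line seen for a queue ID supplies the recorded date/time.
        record = messages.setdefault(
            match.group("qid"),
            {"time": match.group("stamp"), "from": None, "to": []},
        )
        for key, address in ADDRESS_PATTERN.findall(match.group("rest")):
            if key == "from":
                record["from"] = address
            else:
                record["to"].append(address)
    return messages

# Abbreviated versions of the two syslog lines shown earlier.
sample_lines = [
    "May 16 18:14:30 twinsol sm-mta: [ID 801593 mail.info] i4GHEQxG022012: "
    "from=<firstname.lastname@example.org>, size=20913, class=-30, nrcpts=1",
    "May 16 18:14:30 twinsol sm-mta: [ID 801593 mail.info] i4GHEQxG022012: "
    "to=<firstname.lastname@example.org>, delay=00:00:01, mailer=cyrusv2, stat=Sent",
]

tracked = track_messages(sample_lines)
for qid, record in tracked.items():
    print(record["time"], qid, record["from"], "->", ", ".join(record["to"]))
```

Keying on the queue ID matters because the sender and each recipient are logged on separate lines, which may be interleaved with entries for other messages.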