How Log Files Work
Every time a file is retrieved from a Web site, the server software keeps a record of it (assuming that logging is turned on). The server stores this information in text files (usually with a .txt or .log extension) called the Access Log, Error Log and Referrer Log. The log files contain not only a record of which pages were requested at which times, but also a good bit of information about the people (or other entities) that requested them.
As you can imagine, log files can get huge very quickly and take up an enormous amount of expensive hard drive space at your hosting service. Therefore, most Web servers are set up to "rotate" or "cycle" the log files in some way, to make sure that all the data gets saved but doesn't hang around on the server indefinitely. A simple way to do this is to have the server automatically email a copy of the log files to somebody periodically. This lucky individual transfers them to some permanent storage location, and the server automatically purges the original log files after a certain amount of time.
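To make the rotate-and-purge idea concrete, here is a minimal Python sketch of one way it could work. Everything in it is an assumption for illustration: the function names, the archive directory, the date-stamped naming scheme (access.log becomes access-YYYY-MM-DD.log.gz) and the 30-day retention period are all hypothetical, and it archives locally instead of emailing the files. In practice you'd usually rely on your server's own rotation setup or a tool such as logrotate on Unix.

```python
import gzip
import re
import shutil
from datetime import date, timedelta
from pathlib import Path
from typing import List

# Hypothetical naming convention: access.log -> access-YYYY-MM-DD.log.gz
DATE_IN_NAME = re.compile(r"(\d{4}-\d{2}-\d{2})")


def rotate_log(log_path: Path, archive_dir: Path, today: date) -> Path:
    """Copy the live log into a dated, gzip-compressed archive, then empty the live file."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / f"{log_path.stem}-{today.isoformat()}{log_path.suffix}.gz"
    with open(log_path, "rb") as src, gzip.open(archived, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Truncate in place rather than deleting (similar in spirit to logrotate's
    # copytruncate option); any lines written between the copy and the truncate are lost.
    log_path.write_bytes(b"")
    return archived


def purge_old_archives(archive_dir: Path, today: date, keep_days: int = 30) -> List[Path]:
    """Delete archives whose date stamp is older than keep_days; return what was removed."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for f in sorted(archive_dir.glob("*.gz")):
        m = DATE_IN_NAME.search(f.name)
        if m and date.fromisoformat(m.group(1)) < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```

Purging only makes sense once the archives have been copied to that permanent storage location, so in a real setup the purge step would run after the transfer has been confirmed.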
If you want to have decent stats for your site, be careful to keep your log files organized. It's a pain in the neck, but worth it: any gap in your data can screw up your reports, and once data is lost, it's lost.
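One cheap way to catch gaps early is to check your archive for missing days. The sketch below assumes (hypothetically) that you keep one archived log per day with the date in its file name, such as access-1999-07-19.log.gz; the function name and naming scheme are illustrative, not part of any particular tool.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

# Assumed naming convention: one archive per day, date stamp as YYYY-MM-DD in the name.
DATE_IN_NAME = re.compile(r"(\d{4}-\d{2}-\d{2})")


def missing_days(filenames: Iterable[str]) -> List[date]:
    """Return every calendar day between the first and last archived log that has no file."""
    days = set()
    for name in filenames:
        m = DATE_IN_NAME.search(name)
        if m:
            days.add(date.fromisoformat(m.group(1)))
    if not days:
        return []
    gaps = []
    day, last = min(days), max(days)
    while day <= last:
        if day not in days:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

Running a check like this right after each transfer means a missing file is noticed while the originals may still be on the server, before they've been purged.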
The wealth of data in the log files is not readily mined with the naked eye. A raw log file entry looks something like this:
18.104.22.168 - - [19/Jul/1999:00:00:04 -0600] "GET /studio/drives.html HTTP/1.1" 200 20607 "http://www.webdevelopersjournal.com/studio/hard.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
As you can see, this entry shows the IP address the request came from, what page was requested, when it was requested, whether the server delivered it (status code 200 means success) and how many bytes it sent, which page the visitor came from, and even what browser and OS they were running. As I'm sure you can also see, you won't learn much of interest just by looking at the raw log files: there's page after page of this stuff.
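The entry above appears to follow what's commonly called the NCSA/Apache "combined" log format: host, identity, user, timestamp, request line, status code, bytes sent, referrer and user agent. If you wanted to pull those fields apart yourself, a minimal Python sketch might look like this. The regular expression and field names are illustrative choices, not something defined by a particular server, and lines in any other format simply come back as None.

```python
import re
from datetime import datetime
from typing import Optional

# One named group per field of a combined-format log line.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into a dict of fields, or return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # e.g. "19/Jul/1999:00:00:04 -0600" -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Servers write "-" when no body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

Applied to the sample entry, this yields fields such as path "/studio/drives.html", status 200 and bytes 20607.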
To get the most out of the data, you need to be able to see totals for the whole site and compare the figures over time. That's where a log analysis software package comes in. These handy tools range from Getstats (a free Unix program that can run on your Web server) through various cheap shareware options to industrial-strength packages like Marketwave Hit List Pro 4.0 ($395 list) or WebTrends Log Analyzer 4.52 ($399 list).
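Real log analyzers do far more than this, but the core idea of totals for the whole site, compared over time, can be sketched in a few lines of Python. The hypothetical summarize function below counts successful (2xx) requests per day and per page from combined-format lines like the sample entry. It counts every request, images included, so the numbers are raw "hits" rather than page views.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Tuple

# Pull out only what this summary needs: the day, the requested path and the status code.
REQUEST = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "\S+ (\S+)[^"]*" (\d{3})')


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (requests per day, requests per page), counting only 2xx responses."""
    per_day: Counter = Counter()
    per_page: Counter = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m is None or not m.group(3).startswith("2"):
            continue  # skip unparseable lines and errors such as 404
        day = datetime.strptime(m.group(1), "%d/%b/%Y").date()
        per_day[day] += 1
        per_page[m.group(2)] += 1
    return per_day, per_page
```

Sorting `per_day.items()` gives a day-by-day trend, and `per_page.most_common(10)` gives a top-ten list, which is roughly the kind of plain-text report a basic tool produces.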
Basic tools like Getstats can give you almost as much information as the pricey packages, but their customization options are limited and they present results in plain text. If you want pretty pictures and graphs for the marketing department, you'll need something like Hit List or WebTrends. For a comparison of these two packages, see the review in Web Developers Journal.
Web Developers Journal, an internet.com Web site.