How Log Files Work
Every Web site has a different set of goals, but there’s one thing we all have in common: We want more traffic! Although a sure-fire way to build Web site traffic quickly remains as elusive as a sure-fire way to predict stock prices, there are some tried-and-true methods that can help you build your Web site traffic slowly but surely. The ambitious site owner will use various promotional tactics on an ongoing basis, but this article is not about any one traffic-building technique.
Every time a file is retrieved from a Web site, the server software keeps a record of it (assuming that logging is turned on). The server stores this information in text files (usually with a .txt or .log extension) called the Access Log, Error Log and Referrer Log. The log files contain not only a record of which pages were requested at which times, but also a good bit of information about the people (or other entities) that requested them.
As you can imagine, log files can get huge very quickly, and take up an enormous amount of expensive hard drive space at your hosting service. Therefore, most Web servers are set up to “rotate” or “cycle” the log files in some way, to make sure that all the files get saved, but that they don’t hang around on the server. A simple way to do this is to have the server automatically email a copy of the log files to somebody periodically. This lucky individual transfers them to some permanent storage location, and the server automatically purges the original log files after a certain amount of time.
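To make the rotation idea concrete, here is a minimal Python sketch. It swaps the email step for a simpler one: it compresses old log files into an archive directory and then purges the originals. The function name `archive_old_logs`, the `*.log` naming pattern and the seven-day default are assumptions made for illustration, not settings of any particular Web server.

```python
import gzip
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def archive_old_logs(log_dir, archive_dir, max_age_days=7, now=None):
    """Compress *.log files older than max_age_days, then delete the originals.

    Returns the list of archive files that were created.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # Skip files that were modified recently enough to stay on the server.
        if log_file.stat().st_mtime >= cutoff:
            continue
        target = archive_dir / (log_file.name + ".gz")
        # Write the compressed copy first, so nothing is purged before it is saved.
        with open(log_file, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log_file.unlink()
        archived.append(target)
    return archived
```

A scheduled job could call `archive_old_logs("logs", "archive")` once a day. The important design point is the order of operations: save the copy, then purge.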
If you want to have decent stats for your site, be careful about keeping your log files organized. It’s a pain in the neck, but worth it – any gap in your data can screw up your reports, and once it’s lost it’s lost.
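One way to catch a gap before it ruins a report is to check which calendar days your saved logs actually cover. The sketch below assumes you have already collected the dates that have log data (from file names or from the entries themselves); `find_missing_days` is a hypothetical helper, not part of any log analysis package.

```python
from datetime import date, timedelta


def find_missing_days(days_with_data):
    """Return the days between the first and last covered day that have no data."""
    covered = set(days_with_data)
    if not covered:
        return []
    current, last = min(covered), max(covered)
    missing = []
    while current <= last:
        if current not in covered:
            missing.append(current)
        current += timedelta(days=1)
    return missing


# Example: logs exist for July 19, 20 and 23, so July 21 and 22 are reported.
gaps = find_missing_days([date(1999, 7, 19), date(1999, 7, 20), date(1999, 7, 23)])
```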
The wealth of data in the log files is not readily mined with the naked eye. A raw log file entry looks something like this:
206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] "GET /studio/drives.html HTTP/1.1" 200 20607 "http://www.webdevelopersjournal.com/studio/hard.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
As you can see, this entry shows what page was requested, when it was requested, where the visitor came from, and even what browser and OS they were running. As I’m sure you can also see, you won’t learn much of interest just by looking at the raw log files. There’s page after page of this stuff.
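To show how those fields can be pulled apart, here is a rough Python sketch that splits one entry into named pieces. It assumes the entry is in the common "combined" layout shown above, written with the plain hyphens and straight quotes a server actually uses. The field names and the `parse_entry` function are my own labels, not anything the server defines.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the text fields that are really a timestamp and numbers.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = (
    '206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] '
    '"GET /studio/drives.html HTTP/1.1" 200 20607 '
    '"http://www.webdevelopersjournal.com/studio/hard.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"'
)

entry = parse_entry(SAMPLE)
```

With the sample entry, `entry["path"]` holds the requested page, `entry["referrer"]` the page the visitor came from, and `entry["agent"]` the browser and OS string.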
To get the most out of the data, you need to be able to see totals for the whole site, and compare the figures over time. That’s where a log analysis software package comes in. These handy tools range from Getstats (a free Unix program that can run on your Web server) to various cheap shareware options, to industrial-strength packages like Marketwave Hit List Pro 4.0 ($395 list) or WebTrends Log Analyzer 4.52 ($399 list).
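For a feel of what such a package does under the hood, here is a small Python sketch that totals successful requests per page and per day. It is an illustration only, not how Getstats, Hit List or WebTrends actually work; the `summarize` function and its regular expression are assumptions based on the common log layout.

```python
import re
from collections import Counter

# Picks out the day, the requested path and the status code from one entry.
REQUEST_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def summarize(lines):
    """Count successful (status 200) requests per page and per day."""
    hits_per_page = Counter()
    hits_per_day = Counter()
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if match is None or match.group("status") != "200":
            continue  # skip malformed lines, errors and redirects
        hits_per_page[match.group("path")] += 1
        hits_per_day[match.group("day")] += 1
    return hits_per_page, hits_per_day
```

Calling `hits_per_page.most_common(10)` on the result gives a top-ten page list, and comparing `hits_per_day` across weeks or months gives the figures-over-time view.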
Basic tools like Getstats can give you almost as much information as the pricey packages, but customization options are limited, and results are presented in plain text format. If you want pretty pictures and graphs for the marketing department, you’ll need something like Hit List or WebTrends. For a comparative review of these two packages, see a review from Web Developers Journal.