Web Automation: Dynamic Directory Indexing

By Matthew Keller
Posted Jun 2, 2000


If you're like me, you probably loathe updating directory index pages. You add a new file or folder to your Web site, and then you have to find every page that should link to it and update each one--not to mention the toil of updating all of those pages again if the page's name or location changes!

On a daily basis, updating directory index pages is one of the most tiresome tasks there is. But fear not: in this column, Matthew Keller explains how a Perl script can automate this task for you--and keep those pages current when a page's name or location changes.

I solve this problem, quite simply, by creating directory index scripts using Perl. The largest example of this class of scripts indexes a directory on my private Web server whose folders contain pages about my projects. My entire Web site is logically organized (logical to me, anyway) using directories to house and nest information, and my "projects" page is no different.

From a filesystem structure standpoint, every directory in my projects directory contains a different project. Every project directory has an index HTML file. Every index HTML file has a title. Keeping these rules in mind, it is easy to write a short Perl script that makes one's life much easier.

Configuring Apache

This script resides in the root of the Projects folder and is called index.pl. For Apache to treat index.pl as the directory index script, we have to configure the httpd.conf file to include index.pl as a valid directory index file. You may choose index.cgi instead of index.pl if you want. My DirectoryIndex statement is shown below. Apache tries these entries one at a time, from left to right, and serves the first one it finds. You will probably want to have index.html placed ahead of index.pl if the majority of your directory index pages are HTML pages and not these handy scripts:

DirectoryIndex index.pl index.html index.php index.cgi index.htm

Regardless of what you call these scripts, make sure you let Apache know how to handle them, by using the AddHandler directive in your config file. Below is an excerpt of mine:

AddHandler cgi-script .pl .cgi
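
Once Apache hands index.pl to the CGI handler, the script has to print an HTTP header block and a blank line before any HTML. The following is a minimal sketch of that contract, not the script developed in this column; the helper name cgi_response is hypothetical. It also assumes the file is executable and that CGI execution (the ExecCGI option) is enabled for the directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response, which is
# a header line, a blank line, and then the HTML body.
sub cgi_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

# Placeholder body, just enough to confirm that Apache runs the script.
print cgi_response("<html><body><p>index.pl is running</p></body></html>\n");
```

If the browser shows the script's source instead of this message, the AddHandler line is most likely not being applied to that directory.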

Thinking About the Problem

Recall the environment I mentioned earlier:

  1. Every directory in my projects directory contains a different project
  2. Every project directory has an index.html file
  3. Every index.html file has a title

Given this organizational structure, our little script has to do only four things:

  1. Obtain a list of directories
  2. For every directory, open the index.html file if it exists
  3. For every index.html file, extract the title of the page
  4. For every pilfered title, print it back to the user as a link to the given page
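
Steps 3 and 4 above can be sketched in isolation before touching the filesystem. This is only an illustration under assumptions: the helper names extract_title and make_link, the sample page, and the /projects/robot/ URL are all hypothetical, and the regular expression handles only a simple, well-formed title element.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside <title>...</title>,
# or undef when the page has no title element.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Hypothetical helper: format one list item linking to a project page.
sub make_link {
    my ($url, $title) = @_;
    return qq{<li><a href="$url">$title</a></li>};
}

# Made-up sample page standing in for a project's index.html.
my $page  = "<html><head><title>My Robot Project</title></head><body></body></html>";
my $title = extract_title($page);
print make_link("/projects/robot/", $title), "\n" if defined $title;
```

Keeping the title extraction separate from the link formatting makes it easy to skip a directory whose index.html has no title.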

Step 1: Obtain a list of directories
A clumsy but easy way to acquire a list of directories is to read the entire contents of the root directory we want to index (the projects directory, in our example) into an array:

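# Filesystem path and public URL of the directory being indexed
# (the variable names $basedir and $baseurl are illustrative)
my $basedir = "/usr/local/apache/htdocs/projects/";
my $baseurl = "http://mattwork.potsdam.edu/projects/";

# Read every entry (files, ".", and ".." included) into @dirs
opendir(PRJD, $basedir) or die "Cannot open $basedir: $!";
my @dirs = readdir PRJD;
closedir(PRJD);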
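
Because readdir returns every entry, the resulting array also holds plain files along with "." and "..". One way to narrow it to real subdirectories is sketched below, under assumptions: the helper name list_subdirs is hypothetical, and the sketch uses a lexical directory handle instead of the PRJD handle from the listing above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of the subdirectories
# of $base, skipping ".", "..", and anything that is not a directory.
sub list_subdirs {
    my ($base) = @_;
    opendir(my $dh, $base) or die "Cannot open $base: $!";
    my @subdirs = grep { $_ ne '.' && $_ ne '..' && -d "$base/$_" } readdir $dh;
    closedir($dh);
    @subdirs = sort @subdirs;
    return @subdirs;
}

# Usage: pass a directory on the command line, or default to the current one.
my $base = @ARGV ? $ARGV[0] : ".";
print "$_\n" for list_subdirs($base);
```

The -d test needs the full path, since readdir returns bare names relative to the directory that was opened.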


