Web Automation: Generating Dynamic Tables of Contents Page 2

Line 1 starts the Get_Dirs function. Line 2 stores the argument (the path of the "base" directory) in a scalar variable. Line 3 opens the base directory and assigns the GD handle to it. Line 4 creates an array called @DIRS for later use. Line 5 starts looping through the contents of the base directory. Line 6 assigns the current item to a scalar variable. Line 7 essentially says, "if this item begins with a dot, skip it." Line 8 checks whether the item in question is a directory. Line 9 adds the item to the @DIRS array if it is a directory (as determined in the previous line). Line 10 ends the if statement. Line 11 ends the for loop. Line 12 closes the directory handle. Line 13 returns the contents of the @DIRS array.
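The listing for Get_Dirs appeared on the previous page. For reference, here is a minimal sketch reconstructed from the line-by-line description above; it is not the original code. The variable names $base and $item, the lexical directory handle standing in for GD, and the choice to push full paths ("$base/$item") onto @DIRS are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of Get_Dirs from the description above.
# Assumptions: the names $base and $item, a lexical handle in place of GD,
# and storing "$base/$item" (a full path) in @DIRS.
sub Get_Dirs {
    my $base = shift;                       # path of the "base" directory
    opendir(my $gd, $base) or return ();    # give up quietly if unreadable
    my @DIRS;                               # subdirectories found so far
    foreach my $item (readdir $gd) {        # every entry in the base dir
        next if $item =~ /^\./;             # skip ".", "..", and dot-files
        if (-d "$base/$item") {             # keep only directories
            push @DIRS, "$base/$item";
        }
    }
    closedir $gd;
    return @DIRS;
}
```

Because readdir returns entries in no guaranteed order, a caller building a table of contents would probably want to sort the returned list before printing it.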

Function 2: Given the path to an HTML page, extract and return its title

Here we have the same Get_Title function from the last script. This function takes an HTML filename as an argument and returns the title if one is found:

1: sub Get_Title {
2: my $filename=shift;
3: unless(-f "$filename") { return("NO INDEX"); }
4: open(HTML,"<$filename") or return "NO INDEX";
5: while(<HTML>){
6: if($_=~ /<title>(.*?)<\/title>/i) {
7: close HTML;
8: return "$1";
9: }
10: }
11: close HTML;
12: return "Untitled";
13: }

Don't let this snippet scare you; it's actually quite logical once dissected. Line 1 declares the function Get_Title. Line 2 takes the parameter passed to the function (the name of the HTML file) and shifts it into the scalar variable $filename. Line 3 says, "unless this is a file, return the text 'NO INDEX'." Line 4 opens the file for reading and assigns the handle HTML to it. Line 5 begins a while loop over every line of the open file; each line triggers a new iteration, and its contents are stored in the special variable $_. Line 6 says, "if this line contains a <title> and a </title>, put the text in between into the special variable $1 and continue inside the braces." Line 7, inside the if statement, closes the HTML file. Line 8 returns the text of the title and exits the function. Line 9 ends the if statement. Line 10 ends the while loop. Line 11 closes the HTML file if no title has been found, and Line 12 returns the word Untitled in that case. Line 13 ends the function. The function packs a lot into a few lines, but I like how it demonstrates Perl's power and flexibility. The regular expression in line 6 is case-insensitive (note the i after the final /), so <TITLE>, <Title>, and <title> all look the same to the if. Because the file is read one line at a time, a title whose opening and closing tags fall on different lines is not matched, and the function falls back to Untitled.
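To see the possible return values side by side, here is a small self-contained demonstration. It assumes a lightly modernized copy of Get_Title (a lexical filehandle and three-argument open in place of the bareword HTML handle, plus a non-greedy capture) and writes throwaway pages with hypothetical names into a temporary directory created with the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A lightly modernized copy of Get_Title: lexical filehandle, three-argument
# open, and a non-greedy capture. Same three return values as the listing.
sub Get_Title {
    my $filename = shift;
    return "NO INDEX" unless -f $filename;          # no such file
    open(my $html, '<', $filename) or return "NO INDEX";
    while (my $line = <$html>) {                     # one line at a time
        if ($line =~ /<title>(.*?)<\/title>/i) {     # /i: <TITLE> matches too
            my $title = $1;                          # save before closing
            close $html;
            return $title;
        }
    }
    close $html;
    return "Untitled";                               # no one-line <title>
}

# Throwaway pages with hypothetical names in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my %pages = (
    'upper.html'   => "<HTML><HEAD><TITLE>Site Map</TITLE></HEAD></HTML>\n",
    'notitle.html' => "<html><body>No head here</body></html>\n",
    'split.html'   => "<title>Split\nTitle</title>\n",
);
for my $name (keys %pages) {
    open(my $out, '>', "$dir/$name") or die "Cannot write $name: $!";
    print {$out} $pages{$name};
    close $out;
}

print Get_Title("$dir/upper.html"), "\n";    # Site Map
print Get_Title("$dir/notitle.html"), "\n";  # Untitled
print Get_Title("$dir/split.html"), "\n";    # Untitled (title spans two lines)
print Get_Title("$dir/missing.html"), "\n";  # NO INDEX
```

The split.html case shows the line-by-line limitation in action: neither line contains both tags, so the page is reported as Untitled.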

Function 3: Given any path, judge its "depth"

For the purposes of this example, I'm defining "depth" as the number of forward slashes in a path. This function takes a path and returns the number of forward slashes it contains:

1: sub Get_Depth {
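2: local $_ = shift;
3: return tr/\///;
4: }

Line 1 begins the function Get_Depth. Line 2 stores the passed argument (the path in question) in the special variable $_; the local keyword saves the caller's $_ and restores it when the function returns, so calling Get_Depth from inside a loop that also relies on $_ won't clobber that loop's value. Line 3 uses the transliteration operator tr/// to count the forward slashes: because the replacement list is empty, tr leaves the string unchanged and returns the number of characters it matched, and that count becomes the function's return value. Line 4 ends the function.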
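One way to use the count is as a difference between two paths. The sketch below assumes a hypothetical site rooted at /home/user/www and subtracts the base directory's depth to get each directory's level below the base. Its copy of Get_Depth uses local $_, so the final lines also check that the caller's $_ survives the call.

```perl
use strict;
use warnings;

# Get_Depth with local: the caller's $_ is saved and restored automatically.
sub Get_Depth {
    local $_ = shift;
    return tr/\///;    # count (without changing) every forward slash
}

# Hypothetical site layout: level below the base is the difference
# between the two slash counts.
my $base = '/home/user/www';
for my $path ('/home/user/www', '/home/user/www/docs', '/home/user/www/docs/perl') {
    my $depth = Get_Depth($path);
    printf "%-26s depth %d, level %d\n", $path, $depth, $depth - Get_Depth($base);
}

# The caller's $_ is untouched by the call.
$_ = 'caller value';
Get_Depth('/a/b/c');
print "$_\n";    # caller value
```

Note that a trailing slash counts as well (/home/user/www/ has depth 4), so paths should be written consistently before their depths are compared.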

This article was originally published on Jun 21, 2000
