Finally, I want to cover a pitfall many people have fallen into. Let's
talk about regular expressions used under mod_perl.
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not change
during the execution of the program, a standard optimization technique is to
add the /o modifier to the regex pattern. This directs the
compiler to build the internal table once, for the entire lifetime of the
script, rather than every time the pattern is executed. Consider:
    my $pat = '^foo$'; # likely to be input from an HTML form field
    foreach( @list ) {
        print if /$pat/o;
    }
This is usually a big win in loops over lists, or when using the
grep() or map() operators.
In long-lived mod_perl scripts, however, the variable may change with each
invocation and this can pose a problem. The first invocation of a fresh httpd
child will compile the regex and perform the search correctly. However, all
subsequent uses by that child will continue to match the original pattern,
regardless of the current contents of the Perl variables the pattern is
supposed to depend on. Your script will appear to be broken.
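
The effect can be reproduced outside mod_perl with a minimal sketch. The sub name grep_with_o and the word list are hypothetical; the sub stands in for a handler that the same long-lived process calls once per request with a different pattern each time:

```perl
use strict;
use warnings;

# Hypothetical stand-in for code that runs once per request
# inside the same long-lived interpreter.
sub grep_with_o {
    my ($pat, @list) = @_;
    # /o interpolates and compiles $pat only the first time
    # this match operator is executed.
    return grep { /$pat/o } @list;
}

my @words  = qw(foo bar baz);
my @first  = grep_with_o('^foo$', @words);   # pattern compiled here
my @second = grep_with_o('^bar$', @words);   # still matches ^foo$

print "first:  @first\n";
print "second: @second\n";
```

Both calls return only foo: the second pattern is silently ignored, which is the behaviour a reused httpd child would show on its second request.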
There are two solutions to this problem:
The first is to use eval q//, to force the code to be
evaluated each time. Just make sure that the eval block covers the entire loop
of processing, and not just the pattern match itself.
The above code fragment would be rewritten as:
    my $pat = '^foo$';
    eval q{
        foreach( @list ) {
            print if /$pat/o;
        }
    };

Just saying:

    foreach( @list ) {
        eval q{ print if /$pat/o; };
    }

means that I recompile the regex for every element in the list even though
the regex doesn't change.

You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one operator
(be it an m// or s///), you can rely on the property
of the null pattern, which reuses the last successfully matched pattern. This
leads to the second solution, which also eliminates the use of eval.

The above code fragment becomes:

    my $pat = '^foo$';
    "foo" =~ /$pat/; # dummy match (MUST NOT FAIL!)
    foreach( @list ) {
        print if //;
    }

The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not be
cached, and the // will match everything. If you can't count on
fixed text to ensure the match succeeds, you have two possibilities.

If you can guarantee that the pattern variable contains no meta-characters
(things like *, +, ^, $...), you can use the dummy match:

    $pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters, you
should search for the pattern or the non-searchable \377 character as follows:

    "\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present

Another approach:
It depends on the complexity of the regex to which you apply this
technique. One common usage where a compiled regex is usually more efficient is
to "match any one of a group of patterns" over and over again.

Maybe with a helper routine, it's easier to remember. Here is one slightly
modified from Jeffrey Friedl's example in his book Mastering Regular
Expressions: