Monday, September 14, 2009

Perltweak: fast and easy text matching with index()

The best tool in Perl for finding an exact string inside another string (scalar) is not the match operator m//, but the index() function, which is typically faster. Use it whenever the text you are looking for is plain, literal text. Whenever you don't need additional metanotation such as "at the beginning of the string" or "any character", use index():

my $index = index($T, $P);  # $T is the text, $P is the pattern

The returned $index is the position where the first occurrence of $P in $T starts. The first character of $T is at position 0. If $P cannot be found, index() returns -1.
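A minimal runnable sketch of the two-argument form; the sample string is made up for illustration. Note the last check: a match at position 0 is false in Perl's boolean context, so test the result against -1 rather than for truth.

```perl
use strict;
use warnings;

# Hypothetical sample data: $T is the text, $P is the pattern.
my $T = "the quick brown fox jumps over the lazy dog";
my $P = "the";

my $first   = index($T, $P);       # first "the" starts the string
my $fox     = index($T, "fox");    # "fox" starts at position 16
my $missing = index($T, "cat");    # "cat" does not occur at all

print "first '$P' at $first\n";    # prints: first 'the' at 0
print "'fox' at $fox\n";           # prints: 'fox' at 16
print "'cat' at $missing\n";       # prints: 'cat' at -1

# Compare against -1: "if (index($T, $P))" would wrongly fail
# here, because the match is at position 0, which is false.
if (index($T, $P) != -1) {
    print "found '$P'\n";
}
```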

If you want to skip early occurrences of $P and start searching at a later position in $T, use the three-argument version:

$index = index($T, $P, $start_index);
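The three-argument form is what makes a "find every occurrence" loop possible. The sketch below uses a hypothetical helper, all_positions(), that restarts the search one character past each hit, so overlapping matches are reported too; step by length($P) instead if you want only non-overlapping ones. The empty-pattern guard matters: with an empty pattern index() returns a position rather than -1, so the loop would never end.

```perl
use strict;
use warnings;

# all_positions() is a hypothetical helper name for this sketch.
# Returns every start position of $P in $T, in ascending order.
sub all_positions {
    my ($T, $P) = @_;
    return () if $P eq '';               # empty pattern would loop forever
    my @hits;
    my $pos = index($T, $P);             # first occurrence, or -1
    while ($pos != -1) {
        push @hits, $pos;
        $pos = index($T, $P, $pos + 1);  # resume just after this hit
    }
    return @hits;
}

my @positions = all_positions("the quick brown fox jumps over the lazy dog", "the");
print "@positions\n";                    # prints: 0 31

my @overlapping = all_positions("aaaa", "aa");
print "@overlapping\n";                  # prints: 0 1 2
```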

If you need to find the last occurrence of $P, use rindex(), which searches from the end of the string leftward. It also accepts an optional third argument: the position at or before which the match must start.
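A short sketch of rindex() that splits a made-up file path into directory and file name. The sample path is hypothetical, and the code assumes it contains at least two "/" characters. The three-argument call steps back to the previous "/" by starting the search just before the last one.

```perl
use strict;
use warnings;

my $path = "/usr/local/lib/perl5/Foo/Bar.pm";   # hypothetical sample path

# The last "/" marks where the file name starts.
my $slash = rindex($path, "/");                 # 24
my $file  = substr($path, $slash + 1);          # "Bar.pm"
my $dir   = substr($path, 0, $slash);           # "/usr/local/lib/perl5/Foo"

# Third argument: only consider matches starting at or before
# $slash - 1, which finds the second-to-last "/".
my $prev   = rindex($path, "/", $slash - 1);    # 20
my $parent = substr($path, $prev + 1, $slash - $prev - 1);   # "Foo"

print "dir=$dir file=$file parent=$parent\n";
```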

If you do need to specify anything beyond the literal text itself, such as anchors or wildcards, use a regular expression.
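To check the speed claim on your own data, the core Benchmark module's cmpthese() can time both approaches. This is a rough sketch with made-up test data: \Q...\E quotes any regex metacharacters so that m// looks for $P as literal text, which is exactly what index() does. Perl's regex engine has its own optimizations for fixed substrings, so the size of the gap depends on your Perl version and your data; treat the numbers as a local measurement, not a general rule.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a long string with the literal near the end.
my $T = ("x" x 10_000) . "needle" . ("y" x 100);
my $P = "needle";

# Both forms answer "does $T contain $P?".
my $by_index = index($T, $P) != -1 ? 1 : 0;
my $by_regex = $T =~ /\Q$P\E/     ? 1 : 0;
print "index: $by_index, regex: $by_regex\n";

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    index => sub { my $found = index($T, $P) != -1 },
    regex => sub { my $found = $T =~ /\Q$P\E/ },
});
```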

Why am I telling you this?
Large parts of Serversniff use Perl for the backend, be it the site-analyzer or the domain-database. Like most of us, I never really learned Perl: I was thrown right into a project using eperl and a bulletin-board system based on Perl, and had to maintain and evolve the project's code out of nothing. Although I have been using Perl for more than 10 years now, I still find simple tweaks that make my Perl life easier almost every week.
Thanks to O'Reilly's "Mastering Algorithms with Perl" for this one.

tom