Perl Cookbook

Chapter 6: Pattern Matching

6.21. Program: urlify

This program wraps HTML links around URLs in files. It doesn't handle every possible URL, only those that begin with one of the common schemes listed in its $urls variable (http, telnet, gopher, file, wais, ftp). It tries hard to keep end-of-sentence punctuation out of the marked-up URL.

It is a typical Perl filter, so it can be used by piping input to it:

% gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

% urlify ~/mail/*.inbox > ~/allmail.urlified
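Both invocations work because of the filter idiom built on Perl's <> operator, which reads the files named in @ARGV or, when @ARGV is empty, standard input. The following is a minimal sketch of that idiom, not part of urlify itself. The temporary file, its contents, and the uppercase transform are placeholders invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file standing in for a real mailbox.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "line one\nline two\n";
close $fh;

# With file names in @ARGV, <> reads those files in order;
# with an empty @ARGV it would read standard input instead.
local @ARGV = ($path);

my @seen;
while (my $line = <>) {
    chomp $line;
    push @seen, uc $line;    # placeholder transform; urlify substitutes here
}
print "$_\n" for @seen;
```

A real filter would skip the temporary file and simply loop over <>, leaving the choice between files and standard input to whoever runs it.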

The program is shown in Example 6.13.

Example 6.13: urlify

#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{<A HREF="$1">$1</A>}igox;
    print;
}
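To see how the minimal quantifier and the look-ahead keep sentence punctuation out of the link, the substitution can be tried on a few strings directly. The sketch below reuses the character classes and pattern from Example 6.13 but wraps them in a helper named urlify_text, which is a hypothetical name introduced here, and adds use strict and use warnings. The sample sentences and the ftp.example.org host are made up. Comments inside the pattern avoid the $ character because variables are interpolated even inside /x comments.

```perl
use strict;
use warnings;

# Same building blocks as Example 6.13.
my $urls = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# Hypothetical helper: returns a marked-up copy of one line of text.
sub urlify_text {
    my ($text) = @_;
    $text =~ s{
        \b                  # start at word boundary
        (                   # first capture: the whole URL
         $urls     :        # scheme and a colon
         [$any] +?          # shortest run of URL characters that works
        )
        (?=                 # look ahead without consuming
         [$punc]*           # optional trailing punctuation
         [^$any]            #   followed by a non-URL character
         |                  # or else
         $                  #   the end of the string
        )
    }{<A HREF="$1">$1</A>}igox;
    return $text;
}

# Lines end in a newline, as they do when read with <>.
print urlify_text("See http://www.perl.com/CPAN/. It is mirrored.\n");
print urlify_text("Is it at ftp://ftp.example.org/pub?\n");
```

In both sample lines the period or question mark after the URL stays outside the generated link, because the look-ahead accepts punctuation only when a non-URL character (here a space or the newline) follows it. The sketch assumes newline-terminated input. For a string that ends directly in punctuation with no newline, only the end-of-string alternative can succeed, so that punctuation ends up inside the link.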

Copyright © 2001 O'Reilly & Associates. All rights reserved.