Recipe 20.14. Program: htmlsub (Perl Cookbook)

20.14. Program: htmlsub

This program makes substitutions in HTML files so that the changes happen only in normal text, leaving tags and attribute values alone. Suppose you had the file scooby.html containing:

<HTML><HEAD><TITLE>Hi!</TITLE></HEAD><BODY>
<H1>Welcome to Scooby World!</H1>
I have <A HREF="pictures.html">pictures</A> of the crazy dog
himself.  Here's one!<P>
<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK>  I would like to meet him some day,
and get my picture taken with him.<P>
P.S. I am deathly ill.  <A HREF="shergold.html">Please send
cards</A>.
</BODY></HTML>

You can use htmlsub to change every occurrence of the word "picture" in the document text to read "photo". It prints the new document on STDOUT:

% htmlsub picture photo scooby.html
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD><BODY>
<H1>Welcome to Scooby World!</H1>
I have <A HREF="pictures.html">photos</A> of the crazy dog
himself.  Here's one!<P>
<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK>  I would like to meet him some day,
and get my photo taken with him.<P>
P.S. I am deathly ill.  <A HREF="shergold.html">Please send
cards</A>.
</BODY></HTML>
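
Note that in the output the HREF attribute still reads pictures.html: only document text changed. For contrast, here is a minimal sketch of what a naive substitution over the raw markup would do to one line of the sample. The variable names are illustrative and not part of htmlsub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input: one line taken from the sample document.
my $html = 'I have <A HREF="pictures.html">pictures</A> of the crazy dog';

# Naive approach: substitute everywhere, markup included.
(my $naive = $html) =~ s/picture/photo/g;

# The link target is rewritten along with the text: HREF="photos.html"
print "$naive\n";
```

The link now points at a file that probably does not exist, which is the kind of damage htmlsub is meant to avoid.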

The program is shown in Example 20.11.

Example 20.11: htmlsub

#!/usr/bin/perl -w
# htmlsub - make substitutions in normal text of HTML files
# from Gisle Aas

sub usage { die "Usage: $0 <from> <to> <file>...\n" }

my $from = shift or usage;
my $to   = shift or usage;
usage unless @ARGV;

# Build the HTML::Filter subclass to do the substituting.

package MyFilter;
require HTML::Filter;
@ISA=qw(HTML::Filter);
use HTML::Entities qw(decode_entities encode_entities);

sub text {
   my $self = shift;
   my $text = decode_entities($_[0]);
   $text =~ s/\Q$from/$to/go;       # most important line
   $self->SUPER::text(encode_entities($text));
}

# Now use the class.

package main;
foreach (@ARGV) {
    MyFilter->new->parse_file($_);
}
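
The idea behind the text method, substituting only in the pieces of the document that are not tags, can be approximated without any non-core modules. The sketch below is not how htmlsub works: it swaps the HTML::Filter parser for a simple regular-expression split, and the helper name sub_in_text is hypothetical. It assumes there is no ">" inside attribute values, no comments or script blocks, and it does not decode or re-encode entities the way htmlsub does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of htmlsub: replace $from with $to
# only in the text between tags. Assumes tags contain no '>' inside
# attribute values; comments, scripts, and entities are not handled.
sub sub_in_text {
    my ($html, $from, $to) = @_;

    # A capturing group in split keeps the tags in the result list.
    my @pieces = split /(<[^>]*>)/, $html;

    for my $piece (@pieces) {
        next if $piece =~ /^</;          # leave tags untouched
        $piece =~ s/\Q$from\E/$to/g;     # literal match, as in htmlsub
    }
    return join '', @pieces;
}

print sub_in_text('I have <A HREF="pictures.html">pictures</A>.',
                  'picture', 'photo'), "\n";
```

For anything beyond simple, well-formed input, a real parser such as the one used in Example 20.11 is the safer choice, since a regular expression cannot reliably tell markup from text.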

