You'd like your programs to work on files with funny formats, such as compressed files or remote web documents specified with a URL, but your program only knows how to access regular text in local files.
Take advantage of Perl's easy pipe handling by changing your input files' names to pipes before opening them.
To autoprocess gzipped or compressed files by decompressing them with gzip, use:
    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;
    while (<>) {
        # .......
    }
To fetch URLs before processing them, use the GET program from LWP (see Chapter 20, Web Automation):
    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;
    while (<>) {
        # .......
    }
You might prefer to fetch just the text, of course, not the HTML. That just means using a different command, perhaps lynx -dump.
As shown in Recipe 16.1, Perl's built-in open function is magical: you don't have to do anything special to get Perl to open a pipe instead of a file. (That's why it's sometimes called magic open and, when applied to implicit ARGV processing, magic ARGV.) If it looks like a pipe, Perl will open it like a pipe. We take advantage of this by rewriting certain filenames to include a decompression or other preprocessing stage. For example, the file "09tails.gz" becomes "gzip -dc 09tails.gz |".
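The filename rewriting can be tried on its own, without opening anything. The following is a minimal sketch that folds the two map expressions from the Solution into one helper; the name magic_name and the sample filenames are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a filename into a pipe command when it looks
# like a compressed file or a URL; otherwise return it unchanged.
sub magic_name {
    my ($name) = @_;
    return "gzip -dc $name |" if $name =~ /\.(?:gz|Z)$/;   # compressed file
    return "GET $name |"      if $name =~ m#^\w+://#;      # URL, fetched with LWP's GET
    return $name;                                          # ordinary local file
}

# Sample names, chosen only for illustration.
my @rewritten = map { magic_name($_) }
    ('09tails.gz', 'notes.txt', 'http://www.perl.com/');

print "$_\n" for @rewritten;
```

Assigning the rewritten list to @ARGV before a while (<>) loop would give the behavior described above. Because the compression test runs first in this sketch, a URL ending in .gz would be handed to gzip rather than GET; reorder the tests if that matters.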
This technique has further applications. Suppose you wanted to read /etc/passwd if the machine isn't using NIS, and the output of ypcat passwd if it is. You'd use the output of the domainname program to decide if you're running NIS, and then set the filename to open to be either "< /etc/passwd" or "ypcat passwd |":
    $pwdinfo = `domainname` =~ /^(\(none\))?$/
                    ? '< /etc/passwd'
                    : 'ypcat passwd |';

    open(PWD, $pwdinfo) or die "can't open $pwdinfo: $!";
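The pattern in that test treats two answers from domainname as meaning "no NIS": an empty line, or the literal string (none). As a rough illustration, the sketch below wraps the same test in a hypothetical helper, passwd_source, and feeds it sample strings instead of running domainname, so it can be tried on any machine. The domain name example.org is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text printed by domainname, pick what to open.
# An empty answer or "(none)" is taken to mean NIS is not in use.
sub passwd_source {
    my ($domain) = @_;
    return $domain =~ /^(\(none\))?$/
        ? '< /etc/passwd'     # plain local file, opened for reading
        : 'ypcat passwd |';   # read the output of ypcat instead
}

# Sample domainname outputs, each with the trailing newline backticks would return.
for my $answer ("(none)\n", "\n", "example.org\n") {
    (my $shown = $answer) =~ s/\n\z//;
    printf "%-12s => %s\n", "'$shown'", passwd_source($answer);
}
```

The trailing newline does no harm here, because $ in a Perl pattern also matches just before a final newline.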
The wonderful thing is that even if you didn't think to build such processing into your program, Perl already did it for you. Imagine a snippet of code like this:
    print "File, please? ";
    chomp($file = <>);
    open (FH, $file) or die "can't open $file: $!";
The user can enter a regular filename, or something like "webget http://www.perl.com |" instead, and your program would suddenly be reading from the output of some webget program. They could even enter -, a lone minus sign, which, when opened for reading, means standard input instead.
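To watch magic open at work, here is a minimal sketch, assuming a Unix-like system with an echo command on the PATH. The string in $file stands in for what a user might type at the prompt; it is an illustrative value, not part of the original snippet.

```perl
use strict;
use warnings;

# Stand-in for user input: the trailing "|" marks it as a command to read from.
my $file = 'echo hello from a pipe |';

# Two-argument open inspects the string and starts the command itself.
open(my $fh, $file) or die "can't open $file: $!";
my $line = <$fh>;     # first line of the command's output
close($fh);

print $line;
```

Nothing in the open call mentions pipes; the behavior comes entirely from the contents of $file.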
This also comes in handy with the automatic ARGV processing we saw in Recipe 7.7.
Copyright © 2001 O'Reilly & Associates. All rights reserved.