You want to write a program that takes a list of filenames on the command line and reads from STDIN if no filenames were given. You'd like the user to be able to give the file "-" to indicate STDIN or "someprogram |" to indicate the output of another program. You might want your program to modify the files in place or to produce output based on its input.
Discussion
When you say:

    while (<>) {
        # ...
    }
Perl translates this into:[4]

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift @ARGV) {
        unless (open(ARGV, $ARGV)) {
            warn "Can't open $ARGV: $!\n";
            next;
        }
        while (defined($_ = <ARGV>)) {
            # ...
        }
    }

[4] Except that the code written here won't work, because ARGV has internal magic.
You can access ARGV and $ARGV inside the loop to read more from the filehandle or to find the filename currently being processed. Let's look at how this works.
If the user supplies no arguments, Perl sets @ARGV to a single string, "-". This is shorthand for STDIN when opened for reading and STDOUT when opened for writing. It's also what lets the user of your program specify "-" as a filename on the command line to read from STDIN.
Next, the file-processing loop removes one argument at a time from @ARGV and copies the filename into the global variable $ARGV. If the file cannot be opened, Perl issues a warning and goes on to the next one. Otherwise, it processes the file a line at a time. When the file runs out, the loop goes back and opens the next one, repeating the process until @ARGV is exhausted.
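As a small illustration of $ARGV inside the loop, here is a sketch that emits a header each time a new file starts. The subroutine name with_headers and the "==> name <==" header format are made up for the example; assigning to local @ARGV simply lets the sub drive <> over an explicit list of files instead of the real command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read every file in @files through <>, inserting
# a "==> name <==" header each time $ARGV changes to a new file.
sub with_headers {
    my @files = @_;
    local @ARGV = @files;          # <> now iterates over @files
    my $current = '';
    my @out;
    while (<>) {
        if ($ARGV ne $current) {   # $ARGV names the file being read
            push @out, "==> $ARGV <==\n";
            $current = $ARGV;
        }
        push @out, $_;
    }
    return join '', @out;
}

# Usage: two throwaway files (File::Temp deletes them at exit).
my ($fh1, $f1) = tempfile(UNLINK => 1);
print $fh1 "alpha\nbeta\n";
close $fh1;
my ($fh2, $f2) = tempfile(UNLINK => 1);
print $fh2 "gamma\n";
close $fh2;

my $report = with_headers($f1, $f2);
print $report;
```

An empty file produces no header at all, since the loop body never runs for it.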
The open statement didn't say open(ARGV, "< $ARGV"). There's no extra less-than symbol supplied. This allows for interesting effects, like passing the string "gzip -dc file.gz |" as an argument, to make your program read the output of the command "gzip -dc file.gz". See Recipe 16.6 for more about this use of magic open.
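Because each argument goes through magic open, you can rewrite @ARGV before the loop to decompress files on the fly. The following is a sketch of that idea, assuming gzip is on your PATH and that filenames contain no spaces or shell metacharacters, since each rewritten string is handed to the shell as-is. The helper name gunzip_args is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any *.gz argument into a magic-open pipe
# string so that <> reads the decompressed text instead.
# Assumes gzip is installed and names need no shell quoting.
sub gunzip_args {
    return map { /\.gz\z/ ? "gzip -dc < $_ |" : $_ } @_;
}

# Typically applied to the real command line before the loop:
#   @ARGV = gunzip_args(@ARGV);
#   while (<>) { ... }
my @args = gunzip_args('notes.txt', 'old.log.gz', '-');
print "$_\n" for @args;
```

Plain files and "-" pass through untouched, so the usual STDIN convention still works.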
You can change @ARGV before or inside the loop. Let's say you don't want the default behavior of reading from STDIN if there aren't any arguments; you want it to default to all the C or C++ source and header files. Insert this line before you start processing <ARGV>:

    @ARGV = glob("*.[Cch]") unless @ARGV;
Process options before the loop, either with one of the Getopt libraries described in Chapter 15, User Interfaces, or manually:

    # arg demo 1: Process optional -c flag
    if (@ARGV && $ARGV[0] eq '-c') {
        $chop_first++;
        shift;
    }

    # arg demo 2: Process optional -NUMBER flag
    if (@ARGV && $ARGV[0] =~ /^-(\d+)$/) {
        $columns = $1;
        shift;
    }

    # arg demo 3: Process clustering -a, -i, -n, or -u flags
    while (@ARGV && $ARGV[0] =~ /^-(.+)/ && (shift, ($_ = $1), 1)) {
        next if /^$/;
        s/a// && (++$append,      redo);
        s/i// && (++$ignore_ints, redo);
        s/n// && (++$nostdout,    redo);
        s/u// && (++$unbuffer,    redo);
        die "usage: $0 [-ainu] [filenames] ...\n";
    }
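For comparison, here is a sketch of arg demo 3 done with the standard Getopt::Std module instead of by hand. The wrapper parse_flags is hypothetical. getopts('ainu', \%opt) accepts clustered flags such as -ai, removes them from @ARGV, stops at the first argument that isn't a switch, and returns false (after a warning) when it meets an unknown one.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Hypothetical wrapper: parse -a, -i, -n, -u (clustering allowed)
# from an explicit list instead of the real command line.
sub parse_flags {
    local @ARGV = @_;
    my %opt;
    getopts('ainu', \%opt)
        or die "usage: $0 [-ainu] [filenames] ...\n";
    return (\%opt, [@ARGV]);   # flags seen, plus the remaining filenames
}

my ($opt, $files) = parse_flags('-ai', '-u', 'data.txt', '-n');
print join(',', sort keys %$opt), " | @$files\n";
```

Note that the trailing -n is left alone: option parsing ends at data.txt, the first non-switch argument.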
Other than its implicit looping over command-line arguments, <> is not special. The special variables controlling I/O still apply; see Chapter 8 for more on them. You can set $/ to change the input record separator, and $. contains the current line (record) number. If you undefine $/, you don't get the concatenated contents of all files at once; you get one complete file each time:

    undef $/;
    while (<>) {
        # $_ now has the complete contents of
        # the file whose name is in $ARGV
    }
If you localize $/, the old value is automatically restored when the enclosing block exits:

    {                       # create block for local
        local $/;           # record separator now undef
        while (<>) {
            # do something; called functions still have
            # undeffed version of $/
        }
    }                       # $/ restored here
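The two ideas combine naturally. Here is a sketch that reads one complete file per iteration, with $/ undefined only inside a block so that the rest of the program keeps the normal newline separator. The name slurp_each and the hash it returns are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return { filename => whole contents } by reading
# one complete file per <> iteration with the record separator undefined.
sub slurp_each {
    my %contents;
    {
        local $/;                    # undef only inside this block
        local @ARGV = @_;
        while (<>) {
            $contents{$ARGV} = $_;   # $_ is an entire file here
        }
    }                                # $/ is "\n" again from here on
    return \%contents;
}

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "line one\nline two\n";
close $fh;

my $files = slurp_each($file);
print length($files->{$file}), " bytes read from $file\n";
```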
Because processing <ARGV> never explicitly closes filehandles, the record number in $. is not reset. If you don't like that, you can explicitly close the file yourself to reset $.:

    while (<>) {
        print "$ARGV:$.:$_";
        close ARGV if eof;
    }
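Here is the same technique wrapped in a sketch that collects "file:line" tags instead of printing them, so you can see $. start over at 1 for each file. The sub name tag_lines is hypothetical, and the temporary files stand in for real command-line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: tag each input line as "file:lineno".
sub tag_lines {
    local @ARGV = @_;
    my @tags;
    while (<>) {
        push @tags, "$ARGV:$.";
        close ARGV if eof;   # explicit close resets $. for the next file
    }
    return @tags;
}

my @names;
for my $text ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close $fh;
    push @names, $name;
}
my @tags = tag_lines(@names);
print "$_\n" for @tags;
```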
The eof function defaults to checking the end-of-file status of the last file read. Since the last handle read was ARGV, eof reports whether we're at the end of the current file. If so, we close it and reset the $. variable. On the other hand, the special notation eof() with parentheses but no argument checks if we've reached the end of all files in the <ARGV> processing.
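The difference between the two forms shows up clearly when both are used in one loop. In this sketch (mark_ends is a hypothetical name), eof is tested before eof(), because eof() may open the next file to see whether any input remains, which changes $ARGV.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return each input line (chomped) with markers
# showing where eof and eof() became true.
sub mark_ends {
    local @ARGV = @_;
    my @marks;
    while (<>) {
        chomp;
        my $mark = $_;
        $mark .= ' <end of file>'  if eof;    # end of the current file
        $mark .= ' <end of input>' if eof();  # end of the last file
        push @marks, $mark;
    }
    return @marks;
}

my @names;
for my $text ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close $fh;
    push @names, $name;
}
my @marks = mark_ends(@names);
print "$_\n" for @marks;
```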
Perl has command-line options, -n, -p, and -i, to make writing filters and one-liners easier.
The -n option adds the while (<>) loop around your program text. It's normally used for filters like grep or programs that summarize the data they read. A simple grep-like filter, written with an explicit loop, is shown in Example 7.1.
Example 7.1: findlogin1

    #!/usr/bin/perl
    # findlogin1 - print all lines containing the string "login"
    while (<>) {            # loop over files on command line
        print if /login/;
    }
The program in Example 7.1 could be written as shown in Example 7.2.

Example 7.2: findlogin2

    #!/usr/bin/perl -n
    # findlogin2 - print all lines containing the string "login"
    print if /login/;

You can combine the -n and -e options to run Perl code from the command line:

    % perl -ne 'print if /login/'
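To watch -n at work without leaving a Perl program, this sketch runs the one-liner in a child perl and captures its output. It assumes a Unix-like system where the list form of a piped open is available; $^X is the path of the perl currently running, and run_filter is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run "perl -ne CODE FILES..." and return its
# output lines. The list form of open bypasses the shell.
sub run_filter {
    my ($code, @files) = @_;
    open(my $kid, '-|', $^X, '-ne', $code, @files)
        or die "can't run $^X: $!";
    my @lines = <$kid>;
    close $kid;
    return @lines;
}

my ($fh, $log) = tempfile(UNLINK => 1);
print $fh "login: root\nlogout\nlast login failed\n";
close $fh;

my @hits = run_filter('print if /login/', $log);
print @hits;
```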
The -p option is like -n, but it adds a print at the end of the loop. It's normally used for programs that translate their input. A program that lowercases its input is shown in Example 7.3.
Example 7.3: lowercase

    #!/usr/bin/perl
    # lowercase - turn all lines into lowercase
    use locale;
    while (<>) {                    # loop over lines on command line
        s/([^\W0-9_])/\l$1/g;       # change all letters to lowercase
        print;
    }

The program in Example 7.3 could be written as shown in Example 7.4.

Example 7.4: lowercase

    #!/usr/bin/perl -p
    # lowercase - turn all lines into lowercase
    use locale;
    s/([^\W0-9_])/\l$1/g;   # change all letters to lowercase

Or written from the command line as:

    % perl -Mlocale -pe 's/([^\W0-9_])/\l$1/g'
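You can see the loop that -p wraps around your code by asking the core B::Deparse backend (loaded with -MO=Deparse) to print the compiled program. This sketch assumes a Unix-like system for the list-form piped open. Deparse's exact formatting varies between Perl versions, but the output should show the LINE: label and a continue block that does the print.

```perl
use strict;
use warnings;

# Run "perl -MO=Deparse -pe CODE" and capture the deparsed program.
# Deparse writes "-e syntax OK" to STDERR, so only code is captured.
open(my $kid, '-|', $^X, '-MO=Deparse', '-pe', 's/([^\W0-9_])/\l$1/g')
    or die "can't run $^X: $!";
my $deparsed = do { local $/; <$kid> };
close $kid;
print $deparsed;
```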
When you use -n or -p for implicit input looping, the special label LINE: is silently created for the whole input loop. That means that from an inner loop, you can go on to the next input record by using next LINE (this is like awk's next), or go on to the next file by closing ARGV (this is like awk's nextfile). This is shown in Example 7.5.
Example 7.5: countchunks

    #!/usr/bin/perl -n
    # countchunks - count how many chunks (whitespace-separated words) are used.
    # skip comments, and bail on file if __END__
    # or __DATA__ seen.
    for (split) {               # split on whitespace so a leading "#" survives
        next LINE if /^#/;
        close ARGV if /__(DATA|END)__/;
        $chunks++;
    }
    END { print "Found $chunks chunks\n" }
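If you'd rather not depend on -n, the loop it supplies can be written out by hand, which makes the LINE label visible. This sketch (count_chunks is a hypothetical name) splits on whitespace, so a word beginning with "#" is seen intact. It also jumps straight to the next record after closing ARGV, so nothing following the marker on that line is counted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: the countchunks logic with an explicit LINE loop.
sub count_chunks {
    local @ARGV = @_;
    my $chunks = 0;
    LINE: while (<>) {
        for (split) {                  # split $_ on whitespace
            next LINE if /^#/;         # rest of the line is a comment
            if (/__(DATA|END)__/) {    # bail on the rest of this file
                close ARGV;
                next LINE;
            }
            $chunks++;
        }
    }
    return $chunks;
}

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "one two  # not counted\nthree\n__END__\nfour five\n";
close $fh;

my $found = count_chunks($file);
print "Found $found chunks\n";
```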
The tcsh keeps a .history file in a format such that every other line contains a commented-out timestamp in Epoch seconds:

    #+0894382237
    less /etc/motd
    #+0894382239
    vi ~/.exrc
    #+0894382242
    date
    #+0894382242
    who
    #+0894382288
    telnet home
A simple one-liner can render that legible:

    % perl -pe 's/^#\+(\d+)\n/localtime($1) . " "/e'
    Tue May  5 09:30:37 1998 less /etc/motd
    Tue May  5 09:30:39 1998 vi ~/.exrc
    Tue May  5 09:30:42 1998 date
    Tue May  5 09:30:42 1998 who
    Tue May  5 09:31:28 1998 telnet home
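The substitution itself doesn't depend on -p; here it is applied to a string in a sketch (render_history is a hypothetical name). The /m and /g flags stand in for the line-at-a-time loop that -p would supply: each "#+SECONDS" line, newline included, becomes localtime() plus a space, which joins it to the command on the next line. The rendered times depend on your local time zone.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite tcsh history text held in a string.
sub render_history {
    my ($text) = @_;
    # Concatenation puts localtime() in scalar context, so it returns
    # a ctime-style date string rather than a list of fields.
    $text =~ s/^#\+(\d+)\n/localtime($1) . " "/gme;
    return $text;
}

my $history = "#+0894382237\nless /etc/motd\n#+0894382239\nvi ~/.exrc\n";
my $pretty  = render_history($history);
print $pretty;
```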
The -i option changes each file on the command line in place. It is described in Recipe 7.9, and is normally used in conjunction with -p.
See Also

perlrun(1), and the "Switches" section of Chapter 6 of Programming Perl; Recipe 7.9; Recipe 16.6