You want to read a file whose records have a fixed length.
Use read and unpack:

    # $RECORDSIZE is the length of a record, in bytes.
    # $TEMPLATE is the unpack template for the record
    # FILE is the file to read from
    # @FIELDS is an array, one element per field

    until ( eof(FILE) ) {
        read(FILE, $record, $RECORDSIZE) == $RECORDSIZE
            or die "short read\n";
        @FIELDS = unpack($TEMPLATE, $record);
    }
Because the file in question is not a text file, you can't use <FH> or the IO:: modules' getline method to read in records. Instead, you must simply read a particular number of bytes into a buffer. This buffer then contains one record's data, which you decode using unpack with the right format.
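The read-and-unpack loop can be tried end to end with a small self-contained sketch. The record layout here (an 8-byte name, a 32-bit signed integer, and a 16-bit unsigned integer) and the helper name read_records are assumptions made up for illustration; a real file would dictate its own template.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical record layout: 8-byte name, 32-bit signed int, 16-bit unsigned int.
my $TEMPLATE   = "A8 l S";
my $RECORDSIZE = length pack($TEMPLATE);    # 8 + 4 + 2 = 14 bytes

# Read every record in the file and return a list of array references,
# one per record, each holding that record's decoded fields.
sub read_records {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "can't open $path: $!";
    binmode($fh);                           # binary data: no newline translation
    my @records;
    until (eof($fh)) {
        my $got = read($fh, my $record, $RECORDSIZE);
        defined $got            or die "read failed: $!";
        $got == $RECORDSIZE     or die "short read: got $got of $RECORDSIZE bytes\n";
        push @records, [ unpack($TEMPLATE, $record) ];
    }
    close($fh);
    return @records;
}

# Write a small sample file, then read it back.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} pack($TEMPLATE, @$_) for ["alice", 1001, 7], ["bob", -5, 42];
close($out) or die "can't close $path: $!";

printf "%-8s %6d %3d\n", @$_ for read_records($path);
```

The sketch uses a lexical filehandle and binmode rather than the bareword FILE of the solution; the logic is otherwise the same.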
For binary data, the catch is often determining the right format. If you're reading data written by a C program, this can mean peeking at C include files or manpages describing the structure layout, and this requires knowledge of C. It also requires that you become unnaturally chummy with your C compiler, because otherwise it's hard to predict field padding and alignment (such as the x2 in the format used in Recipe 8.18). If you're lucky enough to be on a Berkeley Unix system or a system supporting gcc, then you may be able to use the c2ph tool distributed with Perl to cajole your C compiler into helping you with this.
The tailwtmp program at the end of this chapter uses the format described in utmp(5) under Linux and works on its /var/log/wtmp and /var/run/utmp files. Once you commit to working in a binary format, machine dependencies creep in fast. The program probably won't work unaltered on your system, but the procedure is still illustrative. Here is the relevant layout from the C include file on Linux:
    #define UT_LINESIZE           12
    #define UT_NAMESIZE           8
    #define UT_HOSTSIZE           16

    struct utmp {                       /* here are the pack template codes */
        short ut_type;                  /* s for short, must be padded      */
        pid_t ut_pid;                   /* i for integer                    */
        char ut_line[UT_LINESIZE];      /* A12 for 12-char string           */
        char ut_id[2];                  /* A2, but need x2 for alignment    */
        time_t ut_time;                 /* l for long                       */
        char ut_user[UT_NAMESIZE];      /* A8 for 8-char string             */
        char ut_host[UT_HOSTSIZE];      /* A16 for 16-char string           */
        long ut_addr;                   /* l for long                       */
    };
Once you figure out the binary layout, feed that (in this case, "s x2 i A12 A2 x2 l A8 A16 l") to pack with an empty field list to determine the record's size. Remember to check the return value of read when you read in your record to make sure you got back the number of bytes you asked for.
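As a rough sketch of both points, the following computes the record size from the template and wraps read in a check that rejects partial records. The helper name read_record is hypothetical. On platforms where a native int is 4 bytes, the template should come to 56 bytes; elsewhere the result may differ, which is exactly the machine dependency mentioned above.

```perl
use strict;
use warnings;

# Template from the discussion above; the size of "i" depends on the platform.
my $TEMPLATE = "s x2 i A12 A2 x2 l A8 A16 l";

# Packing with no values fills each field with zeros or padding,
# so the length of the result is the size of one record.
my $RECORDSIZE = length pack($TEMPLATE);

# Hypothetical helper: return a reference to the decoded fields of the next
# record, nothing at end of file, and die on a partial record.
sub read_record {
    my ($fh) = @_;
    my $got = read($fh, my $record, $RECORDSIZE);
    defined $got        or die "read failed: $!";
    return if $got == 0;                    # clean end of file
    $got == $RECORDSIZE or die "short read: got $got of $RECORDSIZE bytes\n";
    return [ unpack($TEMPLATE, $record) ];
}

print "record size: $RECORDSIZE bytes\n";
```

Distinguishing a return value of 0 (end of file) from a smaller-than-requested count (truncated record) and from undef (I/O error) makes a damaged file show up as an error instead of as garbage fields.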
If your records are text strings, use the "a" or "A" unpack templates.
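The two templates differ in how they treat padding, which a short sketch makes visible. The 8-byte field width is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# A hypothetical 8-byte text field holding the string "perl".
my $padded = pack("A8", "perl");            # "A" pads with spaces when packing
my $nulled = pack("a8", "perl");            # "a" pads with NUL bytes when packing

my ($stripped) = unpack("A8", $padded);     # "A" strips trailing spaces and NULs
my ($raw)      = unpack("a8", $padded);     # "a" returns the bytes exactly as stored

printf "A8: '%s' (%d chars)\n", $stripped, length($stripped);
printf "a8: '%s' (%d chars)\n", $raw,      length($raw);
```

"A" is usually the convenient choice for space-padded text fields; "a" is the safer one when trailing spaces or NULs are meaningful data.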
Fixed-length records are useful in that the nth record begins at byte offset SIZE * (n-1) in the file, where SIZE is the size of a single record. See the indexing code in Recipe 8.8 for an example of this.
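The offset arithmetic can be sketched with seek, which jumps straight to a record without reading the ones before it. The layout and the helper name nth_record are assumptions for illustration only, and this is not the code from Recipe 8.8.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

my $TEMPLATE = "A8 l";                      # hypothetical layout: name, 32-bit int
my $SIZE     = length pack($TEMPLATE);      # 8 + 4 = 12 bytes

# Fetch record number $n (counting from 1) by seeking to SIZE * (n-1).
sub nth_record {
    my ($fh, $n) = @_;
    seek($fh, $SIZE * ($n - 1), SEEK_SET) or die "seek failed: $!";
    my $got = read($fh, my $record, $SIZE);
    defined($got) && $got == $SIZE or die "no complete record $n\n";
    return unpack($TEMPLATE, $record);
}

# Build a sample file of five records, then pull out the third.
my ($fh, $path) = tempfile(UNLINK => 1);    # opened for both reading and writing
binmode($fh);
print {$fh} pack($TEMPLATE, "rec$_", $_ * 10) for 1 .. 5;

my ($name, $value) = nth_record($fh, 3);
print "$name $value\n";                     # prints "rec3 30"
```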
See also: the unpack, pack, and read functions in perlfunc(1) and in Chapter 3 of Programming Perl; Recipe 1.1