Contents:
Introduction
Getting and Setting Timestamps
Deleting a File
Copying or Moving a File
Recognizing Two Names for the Same File
Processing All Files in a Directory
Globbing, or Getting a List of Filenames Matching a Pattern
Processing All Files in a Directory Recursively
Removing a Directory and Its Contents
Renaming Files
Splitting a Filename into Its Component Parts
Program: symirror
Program: lst
Unix has its weak points but its file system is not one of them.
- Chris Torek
To fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted towards the Unix filesystem, for whose system calls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.
A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.
A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:
Name | Inode |
---|---|
bc | |
du | |
nvi | |
pine | |
vi | |
Every directory is like this, even the root directory ( / ). To read the file /usr/bin/vi , the operating system reads the inode for / , reads its data blocks to find the entry for /usr , reads /usr 's inode, reads its data block to find /usr/bin , reads /usr/bin 's inode, reads its data block to find /usr/bin/vi , reads /usr/bin/vi 's inode, and then reads the data from its data block.
The name in a directory entry isn't fully qualified. The file /usr/bin/vi has an entry with the name vi in the /usr/bin directory. If you open the directory /usr/bin and read entries one by one, you get filenames like patch , rlogin , and vi instead of fully qualified names like /usr/bin/patch , /usr/bin/rlogin , and /usr/bin/vi .
The inode has more than a pointer to the data blocks. Each inode also contains the type of thing it represents (directory, plain file, etc.), the size of the thing, a set of permissions bits, owner and group information, the time the thing was last modified, the number of directory entries that point to this inode, and so on.
Some operations on files change the contents of the file's data blocks; some change just the inode. For instance, appending to or truncating a file updates its inode by changing the size field. Other operations change the directory entry that points to the file's inode. Changing a file's name changes only the directory entry; it updates neither the file's data nor its inode.
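A small experiment can make the last point concrete. The sketch below assumes a Unix-like system and uses Perl's built-in stat function to look at the inode number and size before and after a rename within one directory. The filenames and the scratch directory are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (hypothetical names), removed when the program exits.
my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/before.txt";
my $new = "$dir/after.txt";

open(my $fh, ">", $old) or die "Couldn't create $old: $!";
print $fh "some data\n";
close($fh) or die "Couldn't close $old: $!";

# Elements 1 and 7 of stat's return list are the inode number and the size.
my ($ino_before, $size_before) = (stat($old))[1, 7];

# Renaming rewrites the directory entry only.
rename($old, $new) or die "Couldn't rename $old to $new: $!";

my ($ino_after, $size_after) = (stat($new))[1, 7];

print "inode before: $ino_before, after: $ino_after\n";
print "size before: $size_before, after: $size_after\n";
print "the old name is gone\n" unless -e $old;
```

Both names refer to the same inode number, so the file itself was never touched; only the name that leads to it changed.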
Three fields in the inode structure contain the last access, change, and modification times: atime, ctime, and mtime. The atime field is updated each time the pointer to the file's data blocks is followed and the file's data is read. The mtime field is updated each time the file's data changes. The ctime field is updated each time the file's inode changes. The ctime is not creation time; there is no way under standard Unix to find a file's creation time.
Reading a file changes its atime only. Changing a file's name doesn't change atime, ctime, or mtime because it was only the directory entry that changed (it does change the atime and mtime of the directory the file is in, though). Truncating a file doesn't change its atime (because we haven't read, we've just changed the size field in its inode), but it does change its ctime because we changed its size field and its mtime because we changed its contents (even though we didn't follow the pointer to do so).
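The truncation case can be checked with a short script. This is only a sketch, assuming a Unix-like filesystem; the filename is hypothetical, and the built-in utime function is used to push the file's atime and mtime an hour into the past so that any later update stands out.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";     # hypothetical filename

open(my $fh, ">", $path) or die "Couldn't create $path: $!";
print $fh "a few bytes of data\n";
close($fh) or die "Couldn't close $path: $!";

# Push atime and mtime into the past so that any later update is obvious.
my $past = time() - 3600;
utime($past, $past, $path) or die "Couldn't set times on $path: $!";

# Truncate by name: the contents change, but nothing is read.
truncate($path, 0) or die "Couldn't truncate $path: $!";

# Elements 7 through 10 of stat's list: size, atime, mtime, ctime.
my ($size, $atime, $mtime, $ctime) = (stat($path))[7 .. 10];

print "size is now $size\n";
print "atime ", ($atime == $past ? "was left alone" : "changed"), "\n";
print "mtime ", ($mtime >  $past ? "moved forward"  : "did not move"), "\n";
print "ctime ", ($ctime >  $past ? "moved forward"  : "did not move"), "\n";
```

Note that utime itself modifies the inode, so ctime is already recent before the truncation; the script can show that ctime is newer than the old mtime, not which call moved it.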
We can access a file or directory's inode by calling the built-in function stat on its name. For instance, to get the inode for /usr/bin/vi, say:
@entry = stat("/usr/bin/vi") or die "Couldn't stat /usr/bin/vi : $!";
To get the inode for the directory /usr/bin , say:
@entry = stat("/usr/bin") or die "Couldn't stat /usr/bin : $!";
You can stat filehandles, too:
@entry = stat(INFILE) or die "Couldn't stat INFILE : $!";
The stat function returns a list of the values of the fields in the inode. If it couldn't get this information (for instance, if the file doesn't exist), it returns an empty list. It's this empty list we test for with the or die construct. Be careful of using || die because that throws the expression into scalar context, in which case stat only reports whether it worked. It doesn't return the list of values. The _ cache referred to below will still be updated, though.
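The difference between the two forms is easy to see by counting what comes back. The following sketch assumes a Unix-like system, where the root directory always exists; the nonexistent path is a made-up name.

```perl
use strict;
use warnings;

my $path = "/";    # the root directory exists on any Unix system

# List assignment with low-precedence "or": @entry receives all the fields.
my @entry = stat($path) or die "Couldn't stat $path: $!";
my $count = @entry;
print "list context returned $count values\n";

# With "||", stat is evaluated in scalar context and yields only true/false.
my @flag = stat($path) || die "Couldn't stat $path: $!";
print "scalar context returned ", scalar(@flag), " value\n";

# A failed stat returns an empty list, which is what "or die" tests for.
my @missing = stat("/no/such/file/we/hope");   # hypothetical path
print "nonexistent file returned ", scalar(@missing), " values\n";
```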
The values returned by stat are listed in the following table.
Element | Abbreviation | Description |
---|---|---|
0 | dev | Device number of filesystem |
1 | ino | Inode number (the "pointer" field) |
2 | mode | File mode (type and permissions) |
3 | nlink | Number of (hard) links to the file |
4 | uid | Numeric user ID of file's owner |
5 | gid | Numeric group ID of file's owner |
6 | rdev | The device identifier (special files only) |
7 | size | Total size of file, in bytes |
8 | atime | Last access time, in seconds, since the Epoch |
9 | mtime | Last modify time, in seconds, since the Epoch |
10 | ctime | Inode change time, in seconds, since the Epoch |
11 | blksize | Preferred block size for filesystem I/O |
12 | blocks | Actual number of blocks allocated |
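One common way to use this list is to assign it to thirteen named scalars in table order. The sketch below assumes a Unix-like system and stats the root directory; the variable names simply mirror the abbreviations in the table.

```perl
use strict;
use warnings;

my $file = "/";    # any existing path will do

# Unpack the thirteen elements in the order given in the table.
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
    $size, $atime, $mtime, $ctime, $blksize, $blocks)
    = stat($file) or die "Couldn't stat $file: $!";

# The low twelve bits of the mode hold the permissions; show them in octal.
my $perms = sprintf("%04o", $mode & 07777);

printf "%s: inode %d, %d link(s), owner uid %d, permissions %s\n",
    $file, $ino, $nlink, $uid, $perms;
print "last modified: ", scalar(localtime($mtime)), "\n";
```

When only one or two fields are needed, a list slice such as (stat($file))[7, 9] avoids declaring the rest.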
The standard File::stat module provides a named interface to these values. It overrides the stat function, so instead of returning the preceding array, it returns an object with a method for each attribute:
use File::stat;

$inode = stat("/usr/bin/vi");
$ctime = $inode->ctime;
$size  = $inode->size;
In addition, Perl provides a set of operators that call stat and return one value only. These are collectively referred to as the -X operators because they all take the form of a dash followed by a single character. They're modelled on the shell's test operators:
-X | Stat field | Meaning |
---|---|---|
-r | mode | File is readable by effective UID/GID |
-w | mode | File is writable by effective UID/GID |
-x | mode | File is executable by effective UID/GID |
-o | mode | File is owned by effective UID |
-R | mode | File is readable by real UID/GID |
-W | mode | File is writable by real UID/GID |
-X | mode | File is executable by real UID/GID |
-O | mode | File is owned by real UID |
-e | | File exists |
-z | size | File has zero size |
-s | size | File has nonzero size (returns size) |
-f | mode,rdev | File is a plain file |
-d | mode,rdev | File is a directory |
-l | mode | File is a symbolic link |
-p | mode | File is a named pipe (FIFO) |
-S | mode | File is a socket |
-b | rdev | File is a block special file |
-c | rdev | File is a character special file |
-t | rdev | Filehandle is opened to a tty |
-u | mode | File has setuid bit set |
-g | mode | File has setgid bit set |
-k | mode | File has sticky bit set |
-T | N/A | File is a text file |
-B | N/A | File is a binary file (opposite of -T) |
-M | mtime | Age of file in days when script started |
-A | atime | Same for access time |
-C | ctime | Same for inode change time (not creation) |
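A few of these operators in action, as a minimal sketch. It assumes a Unix-like system; the scratch directory and the filenames in it are hypothetical and exist only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $empty = "$dir/empty.txt";     # hypothetical filenames
my $full  = "$dir/full.txt";

open(my $fh, ">", $empty) or die "Couldn't create $empty: $!";
close($fh) or die "Couldn't close $empty: $!";

open($fh, ">", $full) or die "Couldn't create $full: $!";
print $fh "hello, world\n";
close($fh) or die "Couldn't close $full: $!";

print "$dir is a directory\n"    if -d $dir;
print "$full is a plain file\n"  if -f $full;
print "$empty has zero size\n"   if -z $empty;

# -s returns the size in bytes, which is true whenever it is nonzero.
my $bytes = -s $full;
print "$full holds $bytes bytes\n" if $bytes;

print "$dir/missing does not exist\n" unless -e "$dir/missing";

# -M is measured from script start, so a file created afterward can be negative.
printf "%s was modified %.5f days before the script started\n", $full, -M $full;
```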
The stat function and the -X operators cache the values that the stat(2) system call returned. If you then call stat or a -X operator with the special filehandle _ (a single underscore), it won't call stat again but will instead return information from its cache. This lets you test many properties of a single file without calling stat(2) many times or introducing a race condition:
open(F, "< $filename") or die "Opening $filename: $!\n";
unless (-s F && -T _) {
    die "$filename doesn't have text in it.\n";
}
The stat call just returns the information in one inode, though. How do we get a list of the contents of a directory? For that, Perl provides opendir, readdir, and closedir:
opendir(DIRHANDLE, "/usr/bin") or die "couldn't open /usr/bin : $!";
while ( defined ($filename = readdir(DIRHANDLE)) ) {
    print "Inside /usr/bin is something called $filename\n";
}
closedir(DIRHANDLE);
These directory reading functions are designed to look like the file open and close functions. Where open takes a filehandle, though, opendir takes a directory handle. They look the same (a bare word) but they are different: you can open(BIN, "/a/file") and opendir(BIN, "/a/dir") and Perl won't get confused. You might, but Perl won't. Because filehandles and directory handles are different, you can't use the < > operator to read from a directory handle.
The filenames in a directory aren't necessarily stored alphabetically. If you want to get an alphabetical list of files, you'll have to read all the entries and sort them yourself.
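Reading and sorting can be done in one pass, as in the sketch below. It assumes a Unix-like system and uses a lexical directory handle in place of a bareword one, which is purely a style choice; the three filenames are hypothetical stand-ins. Because readdir returns bare names, the directory is prepended to build usable paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Populate a scratch directory with a few hypothetical, empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(vi patch rlogin)) {
    open(my $fh, ">", "$dir/$name") or die "Couldn't create $dir/$name: $!";
    close($fh) or die "Couldn't close $dir/$name: $!";
}

# In list context readdir returns every remaining entry, including "." and "..".
opendir(my $dh, $dir) or die "Couldn't open $dir: $!";
my @names = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
closedir($dh);

# The entries are bare names; prepend the directory to get usable paths.
my @paths = map { "$dir/$_" } @names;
print "$_\n" for @paths;
```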
The separation of directory information from inode information can create some odd situations. Operations that change a directory entry require write permission only on the directory, not on the file. Most operations that change information in the file's data require write permission to the file. Operations that alter the permissions of the file require that the caller be the file's owner or the superuser. This can lead to the interesting situation of being able to delete a file you can't read, or write to a file you can't remove.
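The first of those situations can be reproduced in a few lines. This is a sketch for a Unix-like system with a hypothetical filename; when run as the superuser the read will succeed anyway, since root bypasses the permission check, so the script reports either outcome.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/secret.txt";     # hypothetical filename

open(my $fh, ">", $file) or die "Couldn't create $file: $!";
print $fh "hidden\n";
close($fh) or die "Couldn't close $file: $!";

# Take away every permission on the file itself.
chmod(0000, $file) or die "Couldn't chmod $file: $!";

# Reading needs permission on the file (the superuser bypasses this check).
if (open(my $in, "<", $file)) {
    print "read allowed (probably running as the superuser)\n";
    close($in);
} else {
    print "can't read $file: $!\n";
}

# Removing the name needs write permission on the directory only.
my $removed = unlink($file);
print $removed ? "but unlink succeeded\n" : "unlink failed: $!\n";
```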
Although these situations make the filesystem structure seem odd at first, they're actually the source of much of Unix's power. Links, two filenames that refer to the same file, are now extremely simple. The two directory entries just list the same inode number. The inode structure includes a count of the number of directory entries referring to the file (nlink in the values returned by stat), but it lets the operating system store and maintain only one copy of the modification times, size, and other file attributes. When one directory entry is unlinked, data blocks are only deleted if the directory entry was the last one that referred to the file's inode - and no processes still have the file open. You can unlink an open file, but its disk space won't be released until the last close.
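The link count can be watched as names come and go. The sketch below assumes a Unix-like filesystem that supports hard links and uses Perl's built-in link and unlink; the two filenames are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $first  = "$dir/nvi";          # hypothetical names
my $second = "$dir/vi";

open(my $fh, ">", $first) or die "Couldn't create $first: $!";
print $fh "editor\n";
close($fh) or die "Couldn't close $first: $!";

# Add a second directory entry that points at the same inode.
link($first, $second) or die "Couldn't link $first to $second: $!";

# Elements 1 and 3 of stat's list are the inode number and the link count.
my ($ino1, $nlink1) = (stat($first))[1, 3];
my ($ino2, $nlink2) = (stat($second))[1, 3];
print "inodes: $ino1 and $ino2, link count: $nlink1\n";

# Removing one name just decrements the count; the data survives.
unlink($first) or die "Couldn't unlink $first: $!";
my $nlink_after = (stat($second))[3];

open(my $in, "<", $second) or die "Couldn't open $second: $!";
my $line = <$in>;
close($in);
print "link count now $nlink_after, contents still: $line";
```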
Links come in two forms. The kind described above, where two directory entries list the same inode number (like vi and nvi in the earlier table), is called a hard link. The operating system cannot tell the first directory entry of a file (the one created when the file was created) from any subsequent hard links to it. The other kind, the soft or symbolic link, is very different. A soft link is a special type of file whose data block stores the filename the file is linked to. Soft links have a different mode value, indicating they're not regular files. The operating system, when asked to open a soft link, instead opens the filename contained in the data block.
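The contrast with hard links shows up when the link is examined with lstat, which looks at the link itself, and with stat, which follows it. The sketch below assumes a system that supports symbolic links; the filenames are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/real.txt";     # hypothetical names
my $link   = "$dir/alias.txt";

open(my $fh, ">", $target) or die "Couldn't create $target: $!";
print $fh "contents\n";
close($fh) or die "Couldn't close $target: $!";

symlink($target, $link) or die "Couldn't symlink $link to $target: $!";

# The link's own data is just the filename it points to.
my $points_to = readlink($link);
my $is_link   = -l $link ? 1 : 0;

# stat follows the link to the target; lstat describes the link itself.
my $ino_target = (stat($target))[1];
my $ino_stat   = (stat($link))[1];
my $ino_lstat  = (lstat($link))[1];

# Opening the link opens the file it names.
open(my $in, "<", $link) or die "Couldn't open $link: $!";
my $line = <$in>;
close($in);

print "$link points to $points_to\n";
print "stat sees inode $ino_stat, lstat sees inode $ino_lstat\n";
print "read through the link: $line";
```

Unlike a hard link, the soft link has an inode of its own, so its link count and times are tracked separately from those of the target.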
Filenames are kept in a directory, separate from the size, protections, and other metadata kept in an inode.
The stat function returns the inode information (metadata). opendir, readdir, and friends provide access to filenames in a directory through a directory handle.
Directory handles look like filehandles, but they are not the same. In particular, you can't use < > on directory handles.
The permissions on a directory determine whether you can read and write the list of filenames. The permissions on a file determine whether you can change the file's metadata or contents.
Three different times are stored in an inode. None of them is the file's creation time.