In older versions of Perl, a user could call dbmopen to tie a hash to a UNIX DBM file. Whenever the hash was accessed, the database file on disk (really just a hash, not a full relational database) would be magically[ 16 ] read from or written to. In modern versions of Perl, you can bind any ordinary variable (scalar, array, or hash) to an implementation class by using tie . (The class may or may not implement a DBM file.) You can break this association with untie .
[16] In this case, magically means "transparently doing something very complicated". You know the old saying - any technology sufficiently advanced is indistinguishable from a Perl script.
The tie function creates the association by creating an object internally to represent the variable to the class. If you have a tied variable, but want to get at the underlying object, there are two ways to do it. First, the tie function returns a reference to the object. But if you didn't bother to store that object reference anywhere, you could still retrieve it using the tied function.
$object = tieVARIABLE
,CLASSNAME
,LIST
untieVARIABLE
$object = tiedVARIABLE
The
tie
function binds the variable to the class package that provides the methods for that variable. Once this magic has been performed, accessing a tied variable automatically triggers method calls in the proper class. All the complexity of the class is hidden behind magic method calls. The method names are predetermined, since they're called implicitly from within the innards of Perl. These names are in
ALL CAPS
, which is a convention in Perl culture that indicates that the routines are called implicitly rather than explicitly - just like
BEGIN
,
END
, and
DESTROY
. And
AUTOLOAD
too, for that matter.
You can almost think of
tie
as a funny kind of
bless
, except that it blesses a bare variable instead of a thingy reference, and takes extra parameters, like a constructor. That's because it actually does call a constructor internally. (That's one of the magic methods we mentioned.) This constructor is passed the
CLASSNAME
you specified, as well as any additional arguments you supply in the
LIST
. It is not passed the
VARIABLE
, however. The only way the constructor can tell which kind of
VARIABLE
is being tied is by knowing its own method name. This is not the customary constructor name,
new
, but rather one of
TIESCALAR
,
TIEARRAY
, or
TIEHASH
. (You can likely figure out which name goes with which variable type.) The constructor just returns an object reference in the normal fashion, and doesn't worry about whether it was called from
tie
- which it may not have been, since you can call these methods directly if you like. (Indeed, if you've tied your variable to a class that provides other methods not accessible through the variable, you
must
call the other methods directly yourself, via the object reference. These extra methods might provide services like file locking or other forms of transaction protection.)
As in any constructor, these constructors must bless a reference to a thingy and return it as the implementation object. The thingy inside the implementation object doesn't have to be of the same type as the variable you're tying to. It does have to be a properly blessed object, though. See the example below on tied arrays, which uses a hash object to hold information about an array.
The tie function will not use or require a module for you - you must do that explicitly yourself. (On the other hand, the dbmopen emulator function will, for backward compatibility, attempt to use one or another DBM implementation. But you can preempt its selection with an explicit use , provided the module you use is one of the modules in dbmopen 's list of modules to try. See the AnyDBM_File module in Chapter 7 for a fuller explanation.)
A class implementing a tied scalar must define the following methods:
TIESCALAR
,
FETCH
,
STORE
, and possibly
DESTROY
. These routines will be invoked implicitly when you
tie
a variable (
TIESCALAR
), read a tied variable (
FETCH
), or assign a value to a tied variable (
STORE
). The
DESTROY
method is called (as always) when the last reference to the object disappears. (This may or may not happen when you call
untie
, which destroys the reference used by the tie, but doesn't destroy any outstanding references you may have squirreled away elsewhere.) The
FETCH
and
STORE
methods are triggered when you access the variable that's been tied, not the object it's been tied to. If you have a handle on the object (either returned by the initial
tie
or retrieved later via
tied
), you can access the underlying object yourself without automatically triggering its
FETCH
or
STORE
methods.
Let's look at each of these methods in turn, using as our example an imaginary class called
Nice
.[
17
] Variables tied to this class are scalars containing process priorities, and each such variable is implicitly associated with an object that contains a particular process ID, such the ID of the currently running process or of the parent process. (Presumably you'd name your variables to remind you which process you're referring to.) Variables are tied to the class this way:
[17] UNIX priorities are associated with the word "nice" because they're inverted from what you'd expect. Higher priorities run slower, hence are "nicer" to other processes. A more portable module might prefer a less UNIX-centric name like
Priority
. But if we were writing this class for the Perl library, we'd probably call itTie::Priority
or some such, to fit the library's hierarchical naming scheme. Not everything can be a top-level class, or things will get rather confused. Not to mention people.
use Nice; # load the Nice.pm module tie $his_speed, 'Nice', getppid(); tie $my_speed, 'Nice', $$;
Once the variables have been tied, their previous contents are no longer accessible. The internally forged connection between the variable and the object takes precedence over ordinary variable semantics.
For example, let's say you copy a variable that's been tied:
$speed = $his_speed;
Instead of reading the value in the ordinary fashion from the
$his_speed
scalar variable, Perl implicitly calls the
FETCH
method on the associated underlying object. It's as though you'd written this:
$speed = (tied $his_speed)->FETCH():
Or if you'd captured the object returned by the tie , you could simply use that reference instead of the tied function, as in the following sample code.
$myobj = tie $my_speed, 'Nice', $$; $speed = $my_speed; # through the implicit interface $speed = $myobj->FETCH(); # same thing, explicitly
You can use
$myobj
to call methods other than the implicit ones, such as those provided by the
DB_File
class (see
Chapter 7
). However, one normally minds one's own business and leaves the underlying object alone, which is why you often see the return value from
tie
ignored. You can still get at it if you need it later.
That's the external view of it. For our implementation, we'll use the
BSD::Resource
class (found in CPAN, but not included with Perl) to access the
PRIO_PROCESS
,
PRIO_MIN
, and
PRIO_MAX
constants from your system. Here's the preamble of our class, which we will put into a file named
Nice.pm
:
package Nice; use Carp; # Propagates error messages nicely. use BSD::Resource; # Use these hooks into the OS. use strict; # Enforce some discipline on ourselves, use vars '$DEBUG'; # but exempt $DEBUG from discipline.
The Carp module provides methods
carp()
,
croak()
, and
confess()
, which we'll use in various spots below. As usual, see
Chapter 7
for more about Carp.
The
use strict
would ordinarily disallow the use of unqualified package variables like
$DEBUG
, but we then declared the global with
use vars
, so it's exempt. Otherwise we'd have to say
$Nice::DEBUG
everywhere. But it is a global, and other modules can turn on debugging in our module by setting
$Nice::DEBUG
to some other value before using our module.
TIESCALAR
CLASSNAME
,
LIST
The
TIESCALAR
method of the class (that is, the class package, but we're going to stop reminding you of that) is implicitly invoked whenever
tie
is called on a scalar variable. The
LIST
contains any optional parameters needed to properly initialize an object of the given class. (In our example, there is only one parameter, the process ID.) The method is expected to return an object, which may or may not contain an anonymous scalar as its blessed thingy. In our example, it does.
sub TIESCALAR { my $class = shift; my $pid = shift; $pid ||= $$; # arg of 0 defaults to my process if ($pid =~ /\D/) { carp "Nice::TIESCALAR got non-numeric pid $pid" if $^W; return undef; } unless (kill 0, $pid) { # EPERM or ERSCH, no doubt carp "Nice::TIESCALAR got bad pid $pid: $!" if $^W; return undef; } return bless \$pid, $class; }
Recall that the statement with the
||=
operator is just shorthand for
$pid = $pid || $$; # set if not set
We say the object contains an anonymous scalar, but it doesn't really become anonymous until
my $pid
goes out of scope, since that's the variable we're generating a reference to when we bestow the blessing. When returning a reference to an array or hash, one could use the same approach by employing a lexically scoped array or hash variable, but usually people just use the anonymous array or hash composers,
[]
and
{}
. There is no similar composer for anonymous scalars.
On the subject of subterfuge, the kill isn't really killing the process. On most UNIX systems, a signal 0 merely checks to see whether the process is there.
This particular tie class has chosen to return an error value rather than raise an exception if its constructor fails. Other classes may not wish to be so forgiving. (In any event, the tie itself will throw an exception when the constructor fails to return an object. But you get more error messages this way, which many folks seem to prefer.) This routine checks the global variable $^W (which reflects Perl's -w flag) to see whether to emit its extra bit of noise.
But for all that, it's an ordinary constructor, and doesn't know it's being called from tie . It just suspects it strongly.
FETCH
THIS
This method is triggered every time the tied variable is accessed (that is, read). It takes no arguments beyond a reference to the object that is tied to the variable. (The
FETCH
methods for arrays and hashes do, though.) Since in this case we're just using a scalar thingy as the tied object, a simple scalar dereference,
$$self
, allows the method to get at the real value stored in its object. In the example below, that real value is the process ID to which we've tied our variable.
sub FETCH { my $self = shift; # ref to scalar confess "wrong type" unless ref $self; croak "too many arguments" if @_; my $nicety; local $! = 0; # preserve errno $nicety = getpriority(PRIO_PROCESS, $$self); if ($!) { croak "getpriority failed: $!" } return $nicety; }
This time we've decided to blow up (raise an exception) if the getpriority function fails - there's no place for us to return an error otherwise, and it's probably the right thing to do.
Note the absence of a
$
on
PRIO_PROCESS
. That's really a subroutine call into
BSD::Resource
that returns the appropriate constant to feed back into
getpriority
. The
PRIO_PROCESS
declaration was imported by the
use
declaration. And that's why there's no
$
on the front of it - it's not a variable. (If you had put a
$
, the
use strict
would have caught it for you as an unqualified global.)
STORE
THIS, VALUE
This method is triggered every time the tied variable is set (assigned). The first argument,
THIS
, is again a reference to the object associated with the variable, and
VALUE
is the value the user is assigning to the variable.
sub STORE { my $self = shift; my $new_nicety = shift; confess "wrong type" unless ref $self; croak "too many arguments" if @_; if ($new_nicety < PRIO_MIN) { carp sprintf "WARNING: priority %d less than minimum system priority %d", $new_nicety, PRIO_MIN if $^W; $new_nicety = PRIO_MIN; } if ($new_nicety > PRIO_MAX) { carp sprintf "WARNING: priority %d greater than maximum system priority %d", $new_nicety, PRIO_MAX if $^W; $new_nicety = PRIO_MAX; } unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) { confess "setpriority failed: $!"; } return $new_nicety; }
There doesn't appear to be anything worth explaining there, except maybe that we return the new value because that's what an assignment returns.
DESTROY
THIS
This method is triggered when the object associated with the tied variable needs to be destructed (usually only when it goes out of scope). As with other object classes, such a method is seldom necessary, since Perl deallocates the moribund object's memory for you automatically. Here, we'll use a
DESTROY
method for debugging purposes only.
sub DESTROY { my $self = shift; confess "wrong type" unless ref $self; carp "[ Nice::DESTROY pid $$self ]" if $DEBUG; }
That's about all there is to it. Actually, it's more than all there is to it, since we've done a few nice things here for the sake of completeness, robustness, and general aesthetics (or lack thereof). Simpler
TIESCALAR
classes are certainly possible.
A class implementing a tied ordinary array must define the following methods:
TIEARRAY
,
FETCH
,
STORE
, and perhaps
DESTROY
.
Tied arrays are incomplete. There are, as yet, no defined methods to deal with
$#ARRAY
access (which is hard, since it's an lvalue), nor with the other obvious array functions, like
push
,
pop
,
shift
,
unshift
, and
splice
. This means that a tied array doesn't behave like an untied one. You can't even determine the length of the array. But if you use the tied arrays only for simple read and write access you'll be OK. These restrictions will be removed in a future release.
For the purpose of this discussion, we will implement an array whose indices are fixed at its creation. If you try to access anything beyond those bounds, you will cause an exception.
require Bounded_Array; tie @ary, 'Bounded_Array', 2; # maximum allowable subscript is 2 $| = 1; for $i (0 .. 10) { print "setting index $i: "; $ary[$i] = 10 * $i; # should raise exception on 3 print "value of element $i now $ary[$i]\n"; }
The preamble code for the class is as follows:
package Bounded_Array; use Carp; use strict;
TIEARRAY
CLASSNAME
,
LIST
This is the constructor for the class. That means it is expected to return a blessed reference through which the new array (probably an anonymous array reference) will be accessed.
In our example, just to demonstrate that you don't really have to use an array thingy, we'll choose a hash thingy to represent our object. A hash works out well as a generic record type: the
{BOUND}
field will store the maximum bound allowed, and the
{ARRAY}
field will hold the true array reference. Anyone outside the class who tries to dereference the object returned (doubtless thinking it an array reference), will blow up. This just goes to show that you should respect an object's privacy (unless you're well acquainted and committed to maintaining a good relationship for the rest of your life).
sub TIEARRAY { my $class = shift; my $bound = shift; confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)" if @_ or $bound =~ /\D/; return bless { BOUND => $bound, ARRAY => [], }, $class; }
In this case we have used the anonymous hash composer rather than a lexically scoped variable that goes out of scope. We also used the array composer within the hash composer.
FETCH
THIS, INDEX
This method will be triggered every time an individual element in the tied array is accessed (read). It takes one argument beyond its self reference: the index we're trying to fetch. (The index is an integer, but just because the caller thinks of it as a mundane integer doesn't mean you have to do anything "linear" with it. You could use it to seed a random number generator, for instance, or process it with a hash function to do a random lookup in a hash table.)
Here we use list assignment rather than shift to process the method arguments. TMTOWTDI.
sub FETCH { my ($self, $idx) = @_; if ($idx > $self->{BOUND}) { confess "Array OOB: $idx > $self->{BOUND}"; } return $self->{ARRAY}[$idx]; }
As you may have noticed, the names of the
FETCH
,
STORE
, and
DESTROY
methods are the same for all tied classes, even though the constructors differ in name (
TIESCALAR
versus
TIEARRAY
). While in theory you could have the same class servicing several tied types, in practice this becomes cumbersome, and it's easiest to simply write them with one type per class.
STORE
THIS, INDEX, VALUE
This method will be triggered every time an element in the tied array is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something and the value we're trying to put there. For example:
sub STORE { my ($self, $idx, $value) = @_; if ($idx > $self->{BOUND} ) { confess "Array OOB: $idx > $self->{BOUND}"; } return $self->{ARRAY}[$idx] = $value; }
DESTROY
THIS
This method will be triggered when the tied object needs to be deallocated. As with the scalar tie class, this is almost never needed in a language that does its own storage allocation, so this time we'll just leave it out.
The code we presented at the beginning of this section attempts several out-of-bounds accesses. It will therefore generate the following output:
setting index 0: value of element 0 now 0 setting index 1: value of element 1 now 10 setting index 2: value of element 2 now 20 setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39 Bounded_Array::FETCH called at testba line 12
For historical reasons, hashes have the most complete and useful
tie
implementation. A class implementing a tied associative array must define various methods.
TIEHASH
is the constructor.
FETCH
and
STORE
access the key/value pairs.
EXISTS
reports whether a key is present in the hash, and
DELETE
deletes one.
CLEAR
empties the hash by deleting all the key/value pairs.
FIRSTKEY
and
NEXTKEY
implement the
keys
and
each
built-in functions to iterate over all the keys. And
DESTROY
(if defined) is called when the tied object is deallocated.
If this seems like a lot, then feel free to inherit most of these methods from the standard Tie::Hash module, redefining only the interesting ones. See the Tie::Hash module documentation in Chapter 7 for details.
Remember that Perl distinguishes a key not existing in the hash from a key that exists with an undefined value. The two possibilities can be tested with the exists and defined functions, respectively.
Because functions like keys and values may return huge array values when used on large hashes (like tied DBM files), you may prefer to use the each function to iterate over such. For example:
# print out B-news history file offsets use NDBM_File; tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0); while (($key,$val) = each %HIST) { print $key, ' = ', unpack('L',$val), "\n"; } untie(%HIST);
(But does anyone run B-news any more?)
Here's an example of a somewhat peculiar tied hash class: it gives you a hash representing a particular user's dotfiles (that is, files whose names begin with a period). You index into the hash with the name of the file (minus the period) and you get back that dotfile's contents. For example:
use DotFiles; tie %dot, "DotFiles"; if ( $dot{profile} =~ /MANPATH/ or $dot{login} =~ /MANPATH/ or $dot{cshrc} =~ /MANPATH/ ) { print "you've set your manpath\n"; }
Here's another way to use our tied class:
# third argument is name of user whose dot files we will tie to tie %him, 'DotFiles', 'daemon'; foreach $f ( keys %him ) { printf "daemon dot file %s is size %d\n", $f, length $him{$f}; }
In our DotFiles example we implement the object as a regular hash containing several important fields, of which only the
{CONTENTS}
field will be what the user thinks of as the real hash. Here are the fields:
USER
Whose dot files this object represents
HOME
Where those dotfiles live
CLOBBER
Whether we are allowed to change or remove those dot files
CONTENTS
The hash of dotfile names and content mappings
Here's the start of DotFiles.pm :
package DotFiles; use Carp; sub whowasi { (caller(1))[3] . '()' } my $DEBUG = 0; sub debug { $DEBUG = @_ ? shift : 1 }
For our example, we want to be able to emit debugging information to help in tracing during development. We also keep one convenience function around internally to help print out warnings;
whowasi()
returns the name of the function that called the current function (
whowasi()
's "grandparent" function).
Here are the methods for the DotFiles tied hash.
TIEHASH
CLASSNAME
,
LIST
This is the constructor for the class. That means it is expected to return a blessed reference through which the new object may be accessed. Again, the user of the tied class probably has little need of the object. It's Perl itself that needs the returned object so that it can magically call the right methods when the tied variable is accessed.
Here's the constructor:
sub TIEHASH { my $self = shift; my $user = shift || $>; my $dotdir = shift || ""; croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_; $user = getpwuid($user) if $user =~ /^\d+$/; my $dir = (getpwnam($user))[7] or croak "@{[&whowasi]}: no user $user"; $dir .= "/$dotdir" if $dotdir; my $node = { USER => $user, HOME => $dir, CONTENTS => {}, CLOBBER => 0, }; opendir DIR, $dir or croak "@{[&whowasi]}: can't opendir $dir: $!"; foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) { $dot =~ s/^\.//; $node->{CONTENTS}{$dot} = undef; } closedir DIR; return bless $node, $self; }
It's probably worth mentioning that if you're going to filetest the return values returned by that readdir , you'd better prepend the directory in question (as we do). Otherwise, since no chdir was done, you'd test the wrong file.
FETCH
THIS, KEY
This method will be triggered every time an element in the tied hash is accessed (read). It takes one argument beyond its self reference: the key whose value we're trying to fetch. The key is a string, and you can do anything you like with it (consistent with its being a string).
Here's the fetch for our DotFiles example.
sub FETCH { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $dir = $self->{HOME}; my $file = "$dir/.$dot"; unless (exists $self->{CONTENTS}->{$dot} || -f $file) { carp "@{[&whowasi]}: no $dot file" if $DEBUG; return undef; } # Implement a cache. if (defined $self->{CONTENTS}->{$dot}) { return $self->{CONTENTS}->{$dot}; } else { return $self->{CONTENTS}->{$dot} = `cat $dir/.$dot`; } }
This function was easy to write by having it call the UNIX cat (1) command, but it would be more portable (and somewhat more efficient) to open the file ourselves. On the other hand, since dot files are a UNIXy concept, we're not that concerned.
STORE
THIS, KEY, VALUE
This method will be triggered every time an element in the tied hash is set (written). It takes two arguments beyond its self reference: the key under which we're storing the value and the value we're putting there.
Here in our DotFiles example we won't let users overwrite a file without first calling the
clobber()
method on the original object reference returned by
tie
.
sub STORE { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $value = shift; my $file = $self->{HOME} . "/.$dot"; croak "@{[&whowasi]}: $file not clobberable" unless $self->{CLOBBER}; open(F, "> $file") or croak "can't open $file: $!"; print F $value; close(F); }
If they want to clobber something, they can say:
$ob = tie %daemon_dots, 'daemon'; $ob->clobber(1); $daemon_dots{signature} = "A true daemon\n";
But there's also the
tied
function, so they could alternatively set
clobber
using:
tie %daemon_dots, 'daemon'; tied(%daemon_dots)->clobber(1);
The
clobber
method is simply:
sub clobber { my $self = shift; $self->{CLOBBER} = @_ ? shift : 1; }
DELETE
THIS, KEY
This method is triggered when we remove an element from the hash, typically by using the delete function. Again, we'll be careful to check whether the user really wants to clobber files.
sub DELETE { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; my $file = $self->{HOME} . "/.$dot"; croak "@{[&whowasi]}: won't remove file $file" unless $self->{CLOBBER}; delete $self->{CONTENTS}->{$dot}; unlink $file or carp "@{[&whowasi]}: can't unlink $file: $!"; }
CLEAR
THIS
This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.
In our example, that would remove all the user's dotfiles! It's such a dangerous thing that we'll require
CLOBBER
to be set higher than
1
before this can happen.
sub CLEAR { carp &whowasi if $DEBUG; my $self = shift; croak "@{[&whowasi]}: won't remove all dotfiles for $self->{USER}" unless $self->{CLOBBER} > 1; my $dot; foreach $dot ( keys %{$self->{CONTENTS}}) { $self->DELETE($dot); } }
EXISTS
THIS, KEY
This method is triggered when the user invokes the
exists
function on a particular hash. In our example, we'll look at the
{CONTENTS}
hash element to find the answer:
sub EXISTS { carp &whowasi if $DEBUG; my $self = shift; my $dot = shift; return exists $self->{CONTENTS}->{$dot}; }
FIRSTKEY
THIS
This method is triggered when the user begins to iterate through the hash, such as with a keys or each call. By calling keys in a scalar context, we reset its internal state to ensure that the next each used in the return statement will get the first key.
sub FIRSTKEY { carp &whowasi if $DEBUG; my $self = shift; my $a = keys %{$self->{CONTENTS}}; return scalar each %{$self->{CONTENTS}}; }
NEXTKEY
THIS, LASTKEY
This method is triggered during a
keys
or
each
iteration. It has a second argument which is the last key that has been accessed. This is useful if the
NEXTKEY
method needs to know its previous state to calculate the next state.
For our example, we are using a real hash to represent the tied hash's data, except that this hash is stored in the hash's
CONTENTS
field instead of in the hash itself. So we can just rely on Perl's
each
iterator:
sub NEXTKEY { carp &whowasi if $DEBUG; my $self = shift; return scalar each %{ $self->{CONTENTS} } }
DESTROY
THIS
This method is triggered when a tied hash's object is about to be deallocated. You don't really need it except for debugging and extra cleanup. Here's a very simple function:
sub DESTROY { carp &whowasi if $DEBUG; }