Contents:
Introduction
Defining a Module's Interface
Trapping Errors in require or use
Delaying use Until Run Time
Making Variables Private to a Module
Determining the Caller's Package
Automating Module Clean-Up
Keeping Your Own Module Directory
Preparing a Module for Distribution
Speeding Module Loading with SelfLoader
Speeding Up Module Loading with Autoloader
Overriding Built-In Functions
Reporting Errors and Warnings Like Built-Ins
Referring to Packages Indirectly
Using h2ph to Translate C #include Files
Using h2xs to Make a Module with C Code
Documenting Your Module with Pod
Building and Installing a CPAN Module
Example: Module Template
Program: Finding Versions and Descriptions of Installed Modules
Like all those possessing a library, Aurelian was aware that he was guilty of not knowing his in its entirety.
- Jorge Luis Borges The Theologians
Imagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an
init
function or a global
$count
variable. When merged into one program, these separate parts would interfere with each other.
The solution to this problem is
packages
. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable
$CGI::needs_binmode
is a global variable named
$needs_binmode
, which resides in package
CGI
.
Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon (prior to release 5.000, you could only use a single quote mark, as in
$CGI'needs_bin_mode
).
$Names::startup
is the variable named
$startup
in the package
Names
, whereas
$Dates::startup
is the
$startup
variable in package
Dates
. Saying
$startup
by itself without a package name means the global variable
$startup
in the current package. (This assumes that no lexical
$startup
variable is currently visible. Lexical variables are explained in
Chapter 10,
Subroutines
.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.
package
is a compile-time declaration that sets the default package prefix for unqualified global identifiers, just as
chdir
sets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, or
eval
). The effect is also terminated by any subsequent package statement in the same scope. (See the following code.) All programs are in package
main
until they use a
package
statement to change this.
package Alpha; $name = "first"; package Omega; $name = "last"; package main; print "Alpha is $Alpha::name, Omega is $Omega::name.\n"; Alpha is first, Omega is last.
Unlike user-defined identifiers, built-in variables with punctuation names (like
$_
and
$.
) and the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are all forced to be in package
main
when unqualified. That way things like STDIN,
@ARGV
,
%ENV
, and
$_
are always the same no matter what package you're in; for example,
@ARGV
always means
@main::ARGV
, even if you've used
package
to change the default package. A fully qualified
@ElseWhere::ARGV
would not (and carries no special built-in meaning). Make sure to localize
$_
if you use it in your module.
The unit of software reuse in Perl is the
module
, a file that has a collection of related functions designed to be used by other programs and library modules. Every module has a public interface, a set of variables and functions that outsiders are encouraged to use. From inside the module, the interface is defined by initializing certain package variables that the standard Exporter module looks at. From outside the module, the interface is accessed by importing symbols as a side effect of the
use
statement. The public interface of a Perl module is whatever is documented to be public. In the case of undocumented interfaces, it's whatever is vaguely intended to be public. When we talk about modules in this chapter, and traditional modules in general, we mean those that use the Exporter.
The
require
or
use
statements both pull a module into your program, although their semantics are slightly different.
require
loads modules at runtime, with a check to avoid the redundant loading of a given module.
use
is like
require
, with two added properties: compile-time loading and automatic importing.
Modules included with
use
are processed at compile time, but
require
processing happens at run time. This is important because if a module that a program needs is missing, the program won't even start because the
use
fails during compilation of your script. Another advantage of compile-time
use
over run-time
require
is that function prototypes in the module's subroutines become visible to the compiler. This matters because only the compiler cares about prototypes, not the interpreter. (Then again, we don't usually recommend prototypes except for replacing built-in commands, which do have them.)
use
is suitable for giving hints to the compiler because of its compile-time behavior. A
pragma
is a special module that acts as directive to the compiler to alter how Perl compiles your code. A pragma's name is always all lowercase, so when writing a regular module instead of a pragma, choose a name that starts with a capital letter. Pragmas supported by Perl 5.004 include autouse, constant, diagnostics, integer, lib, locale, overload, sigtrap, strict, subs, and vars. Each has its own manpage.
The other difference between
require
and
use
is that
use
performs an implicit
import
on the included module's package. Importing a function or variable from one package to another is a form of aliasing; that is, it makes two different names for the same underlying thing. It's like linking in files from another directory to your current one by the command
ln /somedir/somefile.
Once it's linked in, you no longer have to use the full pathname to access the file. Likewise, an imported symbol no longer needs to be fully qualified by package name (or predeclared with
use
vars
or
use
subs
). You can use imported variables as though they were part of your package. If you imported
$English::OUTPUT_AUTOFLUSH
in the current package, you could refer to it as
$OUTPUT_AUTOFLUSH
.
The required file extension for a Perl module is
".pm"
. The module named FileHandle would be stored in the file
FileHandle.pm
. The full path to the file depends on your include path, which is stored in the global @INC variable.
Recipe 12.7
shows how to manipulate this array to your own purposes.
If the module name itself contains one or more double colons, these are translated into your system's directory separator. That means that the File::Find module resides in the file File/Find.pm under most filesystems. For example:
require "FileHandle.pm"; # run-time load require FileHandle; # ".pm" assumed; same as previous use FileHandle; # compile-time load require "Cards/Poker.pm"; # run-time load require Cards::Poker; # ".pm" assumed; same as previous use Cards::Poker; # compile-time load
The following is a typical setup for a hypothetical module named Cards::Poker that demonstrates how to manage its exports. The code goes in the file named Poker.pm within the directory Cards : that is, Cards/Poker.pm . (See Recipe 12.7 for where the Cards directory should reside.) Here's that file, with line numbers included for reference:
1 package Cards::Poker; 2 use Exporter; 3 @ISA = ('Exporter'); 4 @EXPORT = qw(&shuffle @card_deck); 5 @card_deck = (); # initialize package global 6 sub shuffle { } # fill-in definition later 7 1; # don't forget this
Line 1 declares the package that the module will put its global variables and functions in. Typically, a module first switches to a particular package so that it has its own place for global variables and functions, one that won't conflict with that of another program. This
must
be written exactly as the corresponding
use
statement will be written when the module is loaded.
Don't say
package
Poker
just because the basename of your file is
Poker.pm
. Rather, say
package
Cards::Poker
because your users will say
use
Cards::Poker
. This common problem is hard to debug. If you don't make the
package
and
use
statements exactly the same, you won't see a problem until you try to call imported functions or access imported variables, which will be mysteriously missing.
Line 2 loads in the
Exporter module, which manages your module's public interface as described below. Line 3 initializes the special, per-package array
@ISA
to contain the word
"Exporter"
. When a user says
use
Cards::Poker
, Perl implicitly calls a special method,
Cards::Poker->import()
. You don't have an
import
method in your package, but that's OK, because the Exporter package does, and you're
inheriting
from it because of the assignment to
@ISA
(
is a
). Perl looks at the package's
@ISA
for resolution of undefined methods. Inheritance is a topic of
Chapter 13,
Classes, Objects, and Ties
. You may ignore it for now - so long as you put code as shown in lines 2 and 3 into each module you write.
Line 4 assigns the list
('&shuffle',
'@card_deck')
to the special, per-package array
@EXPORT
. When someone imports this module, variables and functions listed in that array are aliased into the caller's own package. That way they don't have to call the function
Poker::Deck::shuffle(23)
after the import. They can just write
shuffle(23)
instead. This won't happen if they load Cards::Poker with
require
Cards::Poker
; only a
use
imports.
Lines 5 and 6 set up the package global variables and functions to be exported. (We presume you'll actually flesh out their initializations and definitions more than in these examples.) You're free to add other variables and functions to your module as well, including ones you don't put in the public interface via
@EXPORT
. See
Recipe 12.1
for more about using the Exporter.
Finally, line 7 is a simple
1
, indicating the overall return value of the module. If the last evaluated expression in the module doesn't produce a true value, an exception will be raised. Trapping this is the topic of
Recipe 12.2
. Any old true value will do, like 6.02e23 or
"Because
tchrist
and
gnat
told
us
to
put
this
here"
; however,
1
is the canonical true value used by almost every module.
Packages group and organize global identifiers. They have nothing to do with privacy. Code compiled in package
Church
can freely examine and alter variables in package
State
. Package variables are always global and are used for sharing. But that's okay, because a module is more than just a package; it's also a file, and files count as their own scope. So if you want privacy, use lexical variables instead of globals. This is the topic of
Recipe 12.4
.
A library is a collection of loosely related functions designed to be used by other programs. It lacks the rigorous semantics of a Perl module. The file extension
.pl
indicates that it's a Perl library file. Examples include
syslog.pl
and
chat2.pl
.
Perl libraries - or in fact, any arbitrary file with Perl code in it - can be loaded in using
do
'file.pl'
or with
require
'file.pl'
. The latter is preferred in most situations, because unlike
do
,
require
does implicit error checking. It raises an exception if the file can't be found in your
@INC
path, doesn't compile, or if it doesn't return a true value when any initialization code is run. (The last part is what the
1;
was for earlier.) Another advantage of
require
is that it keeps track of which files have already been loaded in the global hash
%INC
. It doesn't reload the file if
%INC
indicates that the file has already been read in.
Libraries work well when used by a program, but problems can arise when libraries use one another. Consequently, simple Perl libraries have been rendered mostly obsolete, replaced by the more modern modules. But some programs still use libraries, usually loading them in with
require
instead of
do
.
Other file extensions are occasionally seen in Perl. A
".ph"
is used for C header files that have been translated into Perl libraries using the
h2ph
tool, as discussed in
Recipe 12.14
. A
".xs"
indicates an augmented C source file, possibly created by the
h2xs
tool, which will be compiled by the
xsubpp
tool and your C compiler into native machine code. This process of creating mixed-language modules is discussed in
Recipe 12.15
.
So far we've only talked about traditional modules, which export their interface by allowing the caller direct access to particular subroutines and variables. Most modules fall into this category. But some problems - and some programmers - lend themselves to more intricately designed modules, those involving objects. An object-oriented module seldom uses the import-export mechanism at all. Instead, it provides an object-oriented interface full of constructors, destructors, methods, inheritance, and operator overloading. This is the subject of Chapter 13 .
CPAN, the Comprehensive Perl Archive Network, is a gigantic repository of nearly everything about Perl you could imagine, including source, documentation, alternate ports, and above all, modules. Before you write a new module, check with CPAN to see whether one already exists that does what you need. Even if one doesn't, something close enough might give you ideas.
You can access CPAN at http://www.perl.com/CPAN/CPAN.html (or ftp://www.perl.com/pub/perl/CPAN/CPAN.html ). This file briefly describes each of CPAN's modules, but because it's manually edited, it may not always have the very latest modules' descriptions. You can find out about those in the CPAN/RECENT or CPAN/RECENT.html file.
The module directory itself is in CPAN/modules . It contains indices of all registered modules plus three convenient subdirectories: by-module , by-author , and by-category . All modules are available through each of these, but the by-category directory is probably the most useful. There you will find directories covering specific applications areas including operating system interfaces; networking, modems, and interprocess communication; database interfaces; user interfaces; interfaces to other programming languages; authentication, security, and encryption; World Wide Web, HTML, HTTP, CGI, and MIME; images, pixmap and bitmap manipulation, drawing, and graphing - just to name a few.
The sections on "Packages" and on "Modules" in Chapter 5 of Programming Perl and in perlmod (1)