Let us define a few preliminary terms before we start implementing objects in Perl.
An object (also called an instance ), like a given car, has the following:
Objects of a certain type are said to belong to a class . My car and your car belong to the class called Car or, if you are not too worried about specific details, to a class called Vehicle. All objects of a class have the same functionality.
In this section, we study how to create objects and how to enrich basic designs using inheritance and polymorphism.
An object is a collection of attributes. An array or a hash can be used to represents this set, as we discussed in Chapter 2, Implementing Complex Data Structures . For example, if you need to keep track of an employee's particulars, you might choose one of these approaches:
# Use a hash table to store Employee attributes %employee = ("name" => "John Doe", "age" => 32, "position" => "Software Engineer"); print "Name: ", $employee{name}; # Or use an array $name_field = 0; $age_field = 1; $position_field = 2; @employee = ("John Doe", 32, "Software Engineer"); print "Name: ", $employee[$name_field];
The section "Efficient Attribute Storage" in Chapter 8, Object Orientation: The Next Few Steps describes a more efficient approach for storing attributes. Meanwhile, we will use a hash table for all our examples.
Clearly, one %employee won't suffice. Each employee requires a unique identity and his or her own collection of attributes. You can either allocate this structure dynamically or return a reference to a local data structure, as shown below:
# Using an anonymous hash sub new_employee { my ($name, $age, $starting_position) = @_; my $r_employee = { # Create a unique object "name" => $name, # using an anonymous hash "age" => $age, "position" => $starting_position }; return $r_employee; # Return "object" } # OR, returning a reference to a local variable sub new_employee { my ($name, $age, $starting_position) = @_; my %employee = ( "name" => $name, "age" => $age, "position" => $starting_position ); return \%employee; # return a reference to a local object } # Use it to create two employees $emp1 = new_employee("John Doe", 32, "Software Engineer"); $emp2 = new_employee("Norma Jean", 25, "Vice President");
new_employee() returns a reference to a unique data structure in both cases.
As a user of this subroutine, you are not expected to know whether this scalar contains a reference to a Perl data structure or whether it contains a string (for example, it could just contain a database primary key, while the rest of the details are in a corporate database). The employee details are hence well encapsulated. Not that encapsulation should not be confused with enforced privacy.
In the preceding example, the hash table is the object, and the reference to the hash table is termed the object reference . Keep in mind that we have not introduced any new syntax since the last chapter.
All functions that access or update one or more attributes of the object constitute the behavior of the object.
Consider
sub promote_employee { my $r_employee = shift; $r_employee->{"position"} = lookup_next_position($r_employee->{"position"}); } # To use it promote_employee($emp1);
Such functions are also called instance methods in OO circles because they require a specific instance of the an object; an employee, in this case.
To avoid having to suffix every method with the suffix "_employee," we put all these functions in a package of their own, called Employee:
package Employee; sub new { # No need for the suffix. .... } sub promote { .... }
To use this module, you need to say:
$emp = Employee::new("John Doe", 32, "Software Engineer"); Employee::promote($emp);
As you can see, this code is beginning to encapsulate a class called Employee : the user of this code invokes only the interface functions new and promote and does not know or care about the type of data structure used to store employee details, or, as we mentioned earlier, whether a database is being used behind the scenes.
What we have seen thus far is the kind of stuff that a C programmer would do, except that he or she would likely use a struct to keep track of the attributes. This is precisely the way the stdio library works, for example. fopen() is a constructor that returns a pointer to a unique FILE structure, allocated dynamically. The pointer (the object reference) is supplied to other methods like fgets() and fprintf() .
Unfortunately, complications arise when the problem gets more involved. Let us say we have to keep information about hourly and regular employees. Hourly employees get paid by the hour and are eligible for overtime pay, while regular employees get a monthly salary. One way to approach it is to create a new function per type of employee:
package Employee; # Creating Regular Employees sub new_regular { my ($name, $age, $starting_position, $monthly_salary) = @_; my $employee = { "name" => $name, "age" => $age, "position" => $starting_position, "monthly_salary" => $monthly_salary, }; return $employee; # return the object reference } # Hourly Employees sub new_hourly { my ($name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_; my $employee = { "name" => $name, "age" => $age, "position" => $starting_position, "hourly_rate" => $hourly_rate, "overtime_rate" => $overtime_rate }; return $employee; # return the object reference }
Now, if we want to get an employee's year-to-date salary, we have to make a distinction between the two types of employees. We could provide the two subroutines compute_hourly_ytd_income() and compute_regular_ytd_income() , but of course the story doesn't end there. Other differences between hourly and regular employees (such as allowed vacation, medical benefits, and so on) or the introduction of other types of employees (such as temporary employees) results in a combinatorial explosion of functions. Worse, the interface requires the user of this package to make a distinction between types of employees to be able to call the right function.
To get us out of this bind, we put different types of employees in different packages. Then we use the bless keyword to tag objects internally with a pointer to the packages they belong to. The boldface lines in the following example show the changes from the code presented above (explanations follow):
#------------------------------------------------------------- package RegularEmployee ; sub new { my ($name, $age, $starting_position, $monthly_salary) = @_; my $r_employee = { "name" => $name, "age" => $age, "position" => $starting_position, "monthly_salary" => $monthly_salary, "months_worked" => 0, }; bless $r_employee, 'RegularEmployee'; # Tag object with pkg name return $r_employee; # Return object } sub promote { #... } sub compute_ytd_income { my $r_emp = shift; # Assume the months_worked attribute got modified at some point return $r_emp->{'monthly_salary'} * $r_emp->{'months_worked'}; } #------------------------------------------------------------- package HourlyEmployee ; sub new { my ($name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_; my $r_employee = { "name" => $name, "age" => $age, "position" => $starting_position, "hourly_rate" => $hourly_rate, "overtime_rate" => $overtime_rate }; bless $r_employee, 'HourlyEmployee'; return $r_employee; } sub promote { #... } sub compute_ytd_income { my ($r_emp) = $_[0]; return $r_emp->{'hourly_rate'} * $r_emp->{'hours_worked'} + $r_emp->{'overtime_rate'} * $r_emp->{'overtime_hours_worked'}; }
bless is given an ordinary reference to a data structure. It tags that data structure (note: not the reference [ 1 ]) as belonging to a specific package and thus bestows on it some more powers, as we shall soon see. bless is to our hash table what baptism is to a child. It doesn't change the data structure in any way (which still remains a hash table), just as baptism doesn't really alter a person except to give them an additional identity.
[1] The reference is like a
void *
in C. The object is typed, not the C pointer or Perl reference.
The nice thing about bless is that it gives us a direct way of using this object. Here's how:
# First create two objects as before. $emp1 = RegularEmployee::new('John Doe', 32, # Polymorphism 'Software Engineer', 5000); $emp2 = HourlyEmployee::new('Jane Smith', 35, # Polymorphism 'Auditor', 65, 90);
Now use the arrow notation to directly invoke instance methods, or, as they say in OO-land, invoke methods on the object :
# Direct invocation $emp1->promote(); $emp2->compute_ytd_income();
When Perl sees
$emp1->promote()
, it determines the class to which
$emp1
belongs (the one under which it has been blessed). In this case, it is the
Regular-Employee
. Perl then calls this function as follows:
RegularEmployee::promote($emp1)
. In other words, the object on the left side of the arrow is simply given as the first parameter of the appropriate subroutine.
Both the :: and -> notations are in fact permissible, unlike in C++. The first one is more flexible because Perl figures out the class at run time, while the latter is faster because the function to be called is known at compile time. There is nothing magical about an instance method in Perl. It is an ordinary subroutine whose first parameter simply happens to be an object reference. (You might have noticed that the promote method did not change from the previous section.)
So is this mere syntactic sugar? Finally, all we seem to have achieved is the ability to call an instance method of an object through an alternate notation.
No, we have gained an important advantage. The module user doesn't have to discriminate between types of objects using an if statement but instead lets Perl take care of routing a call to the appropriate function. That is, instead of saying something like
if (ref($emp) eq "HourlyEmployee") { $income = HourlyEmployee::compute_ytd_income($emp); } else { $income = RegularEmployee::compute_ytd_income($emp); }
we can simply say,
$income = $emp->compute_ytd_income();
This ability of Perl to call the appropriate module's function is called run-time binding . Incidentally, recall from Chapter 1, Data References and Anonymous Storage , that the ref function returns a string indicating the type of the entity pointed to by the reference; in the case of a blessed object reference, it returns the name of the corresponding class.
Note that while processing payroll records, $emp can be a regular employee in one iteration and an hourly employee in another. This feature is called polymorphism (poly + morph = the ability of an object to take on many forms).
Polymorphism and run-time binding are the chief contributions of object-oriented languages. They give a system an enormous amount of flexibility because you can now add a new type of employee (with the same interface as the other types) without having to change the payroll-processing code. This is possible because each object "knows" how to compute its own year-to-date income. It pays to remember this cardinal rule:
It is indicative of inflexible procedural design if you find yourself using conditional statements to distinguish between object types.
The design is flexible also because you can add new methods to any of the packages without hurting what is already present.
Class attributes are properties that pertain to all instances of a class, but don't vary on a per-employee basis. For example, one insurance company might provide health coverage for all employees, so it doesn't make sense to store the name of this company in each and every employee.
Class methods (also known as static methods ) are functions that are relevant to that class but don't need a specific object instance to work with. For example, a subroutine called get_employee_names() doesn't require an employee object as input to figure out what it has to do.
Perl has no specific syntax for class attributes and methods, unlike C++ or Java. Class attributes are simply package global variables, and class methods are ordinary subroutines that don't work on any specific instance. Perl supports polymorphism and run-time binding for these ordinary subroutines (not just instance methods), which can be leveraged to produce a truly flexible design. Consider
$record = <STDIN>; # Tab delimited record containing employee details ($type, $name, $age, $position) = split(/\t/, $details); # Create an employee object of the appropriate class $emp = $type-> new($name, $age, $position); # Now use the object as before $emp->compute_ytd_income();
In this example, $type can contain either of these two strings: "HourlyEmployee" or "RegularEmployee." Note that this variable is not an object; it is simply the name of a class. This approach improves on the example in the previous section by avoiding having to hardcode the name of the package. Why is that an improvement? Well, if you didn't have this facility, you would have had to say something like this to create an appropriately typed object:
if ($type eq "HourlyEmployee") { $emp = HourlyEmployee->new(....); } else { $emp = RegularEmployee->new(....); }
Any piece of code that explicitly depends upon checking the class or type of an object requires too much maintenance. If you introduce a new type of employee tomorrow, you'll have to go back and add the new type to all such pieces of code.
Recall that in the case of an instance method, the object to the left of the arrow is passed as the first parameter to the subroutine. It is no different here. The procedure HourlyEmployee::new must be rewritten to expect this:
package HourlyEmployee ; sub new { my ( $pkg, $name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_;
Given that both instance and class methods are ordinary subroutines, you can always write a subroutine that can function as either, by checking the type of the first parameter supplied to it. Consider the following constructor, which creates a new object or a clone of an existing one, depending on how it is invoked:
package Employee; sub new { $arg = shift; if (ref($arg)) { # Called as $emp->new(): Clone the Employee given to it #.... } else { # Called as Employee->new(): Create a new employee #... } }
You can now use this method as follows:
# Using new() as a class method $emp1 = Employee->new("John Doe", 20, "Vice President"); # Using new() as an instance method to clone the employee details $emp2 = $emp1->new();
I'll leave it up to you to answer why you might want to clone an employee!
What have we learned in this section? If we write all our class methods to expect the name of the module as the first parameter, we make it possible for the module's user to employ run-time binding and polymorphism. We will follow this practice from now on.
You might be curious why a class method needs to be supplied the name of its own module. We'll answer this shortly when we deal with inheritance.
Perl wouldn't be Perl if there weren't a couple of alternatives to suit everyone's fancy. It supports an alternative to the arrow notation, called the indirect notation , in which the function name precedes the object or class name. An example should make this clear:
$emp = new Employee ("John Doe", 20, "Vice President");
C++ folks will identify with this notation. This approach can be used for objects too:
promote $emp "Chairman", 100000; # Give him a promotion and a raise
Notice that there is no comma between
$emp
and the first argument (
"Chairman"
). This is how Perl knows that you are calling a method using the indirect notation and not calling a subroutine in the current package. Perhaps you will identify more with the following example:
use FileHandle; $fh = new FileHandle("> foo.txt"); print $fh "foo bar\n";
print is a method on the FileHandle module.
While the indirect notation has the same effect as the arrow notation, it cannot be used in a chain of calls. The following is possible only with the arrow notation:
use FileHandle; $fh = FileHandle->new("> foo.txt")->autoflush(1); # Chain of calls
Perl allows a module to specify a list of other module names, in a special array called @ISA . When it does not find a particular class or instance method in a module, it looks to see if that module's @ISA has been initialized. If so, it checks to see if any of those modules support the missing function, picks the first one it can find, and passes control to it. This feature is called inheritance . Consider
package Man; @ISA = qw(Mammal Social_Animal);
This allows us to specify that Man is-a Mammal and is-a Social_Animal . All traits (read: methods) common to mammals are supported in the Mammal class and don't have to be implemented in Man too. Let us look at a more practical example.
In our attempts to distinguish between hourly and regular employees, we have gone to the other extreme and made them completely independent. Clearly, there are a number of common attributes (name, age, and position) and behavior ( promote , say) that they all share as employees. We can thus use inheritance to "pull out" the common aspects into a superclass (or base class ) called Employee :
#--------------------------------------------------------- package Employee ; #Base class #--------------------------------------------------------- sub allocate{ my ($pkg, $name, $age, $starting_position) = @_; my $r_employee = bless { "name" => $name, "age" => $age, "position" => $starting_position } , $pkg ; return $r_employee; } sub promote { my $r_employee = shift; my $current_position = $r_employee->{"position"}; my $next_position = lookup_next_position($current_position); $r_employee->{"position"} = $next_position; } #--------------------------------------------------------- package HourlyEmployee; #--------------------------------------------------------- @ISA = ("Employee"); # Inherits from Employee sub new { my ($pkg, $name, $age, $starting_position, $hourly_rate, $overtime_rate) = @_; # Let the Employee package create and bless the object my $r_employee = $pkg->allocate ($name, $age, $starting_position); # Add HourlyEmployee-specific attributes and we are done. $r_employee->{"hourly_rate"} = $hourly_rate; $r_employee->{"overtime_rate"} = $overtime_rate; return $r_employee; # return the object reference } sub compute_ytd_income { .... } # ... And similarly for package RegularEmployee
Whatever is common to all employees is implemented in the base class. Since both HourlyEmployee and RegularEmployee need a class method called new() to allocate a hash table, to bless it, and to insert common attributes into this table, we factor this functionality out into a inheritable subroutine called allocate in module Employee .
Notice how allocate avoids hardcoding the name of a class, thus ensuring maximum reusability. HourlyEmployee::new() calls $pkg->allocate , which means that the first parameter to allocate , $pkg , has the value HourlyEmployee. allocate uses this to bless the object directly into the inherited class. HourlyEmployee::new doesn't need to create the object anymore; it just has to insert its own specific attributes.
Nothing has changed from the user's point of view. You still say,
$emp = HourlyEmployee->new(....);
But we have now managed to eliminate redundant code in the modules and left them open for future enhancements.
Let us say we wanted to ensure that hourly employees should never rise above the level of a manager. The example shows how to override the base class's promote() method to do this check. Here's how:
package HourlyEmployee; sub promote { my $obj = shift; die "Hourly Employees cannot be promoted beyond 'Manager'" if ($obj->{position} eq 'Manager'); # call base class's promote $obj->Employee::promote(); #Specify the package explicitly }
This syntax tells Perl to start the search for promote() in the @ISA hierarchy, starting from Employee . A small problem here is that by hardcoding the name of a class ( Employee ), we make it difficult for us to change our mind about the inheritance hierarchy. To avoid this, Perl provides a pseudoclass called SUPER , like Smalltalk, so that you can say,
$obj->SUPER::promote();
This searches the @ISA hierarchy for the appropriate promote subroutine. Now, if we interpose another package between Employee and HourlyEmployee in the inheritance hierarchy, we just need to update HourlyEmployee 's @ISA array.
NOTE: We have now gradually eliminated the need for the :: notation to call a module's subroutines. A subroutine either is imported directly into your namespace, in which case you don't need to fully qualify its name, or is invoked by using the -> notation. You still need to use " :: " to access a foreign package's variables .
Perl automatically garbage collects a data structure when its reference count drops to zero. If a data structure has been blessed into a module, Perl allows that module to perform some clean-up before it destroys the object, by calling a special procedure in that module called DESTROY and passing it the reference to the object to be destroyed:
package Employee; sub DESTROY { my ($emp) = @_; print "Alas, ", $emp->{"name"}, " is now no longer with us \n"; }
This is similar to C++'s destructor or the
finalize()
method in Java in that Perl does the memory management automatically, but you get a chance to do something before the object is reclaimed. (Unlike Java's
finalize
, Perl's garbage collection is deterministic;
DESTROY
is called as soon as the object is not being referred to any more.)
Note that you are not compelled to declare this subroutine; you do so only if you have some clean-up work to be done. In a module such as Socket, you would close the corresponding connection, but in something like Employee, where no external system resources are being held up, you don't have to provide a
DESTROY
method. But recall that
AUTOLOAD
is called if a function is not found. In the case in which you supply
AUTOLOAD
but not the
DESTROY
method, you might want to ensure that
AUTOLOAD
checks for this possibility:
sub AUTOLOAD { my $obj = $_[0]; # $AUTOLOAD contains the name of the missing method # Never propagate DESTROY methods return if $AUTOLOAD =~ /::DESTROY$/; # .... }
According to Rumbaugh et al. [ 15 ]:
Encapsulation can be violated when code associated with one class directly accesses the attributes of another class. Direct access makes assumptions about storage format and location of the data. These details must be hidden within the class....The proper way to access an attribute of another object is to "ask for it" by invoking an operation on the object, rather than simply "taking it."
This is as true for classes related by inheritance as for unrelated classes.
To discourage direct access to an object's attributes, we provide "accessor methods." These two methods read and update the "position" attribute of an employee:
$pos = $emp->get_position(); # read attribute $emp->set_position("Software Engineer"); # write attribute
The more popular convention is to have one method to handle both read and write access:
$pos = $emp->position(); # read attribute $emp->position("Software Engineer"); # write attribute
This is how the module might implement it:
package Employee; sub position { my $obj = shift; @_ ? $obj->{position} = shift # modify attribute : $obj->{position}; # retrieve attribute }
Note that the method returns the latest value of the position attribute in both cases, because in both cases (get and set), the expression $obj->{position} is the last to be evaluated.
It might seem a complete waste of time to call a method every time you need to touch an attribute. But, as it happens, accessor methods are absolutely necessary in designing for change. Consider the following advantages:
Accessor methods hide how object attributes are stored. If you change the way this layout is done, only these methods need to be modified; the rest of the code, including derived classes, remain untouched. In Perl, as in other OO scripting languages, in which reengineering may be necessary for performance or space efficiency, accessor methods are a good thing. Smalltalk, CORBA (Common Object Request Broker Architecture), and ActiveX are other well-known cases in which the only way to an attribute is through an accessor.
Accessor methods are sometimes used for triggering actions in addition to retrieving or updating the attribute. GUI toolkits use this idiom routinely. For example:
$button->foreground_color('yellow');
This not only changes the value of the foreground color attribute, but updates the screen too.
Accessor methods can be made to disallow updates. For example, primary key attributes such as an employee's name should not be updatable once created; an accessor can easily enforce this.
An employee's income can be seen as an attribute, though internally it needs to be computed. Instead of writing a method like compute_ytd_income() , you simply call it income() . This makes it look like an attribute accessor, and it can disallow updates to this attribute.
Moral of the story: Get in the habit of writing accessor methods. In the next chapter, we will study a module called ObjectTemplate, a standard library called Class::Template, and a module on CPAN called MethodMaker, all of which automatically create accessor methods for you, so there is really no reason not to use such methods.
Caveat: Even if your attributes are wrapped in accessor methods, you should be wary of unrelated classes using these methods. When reviewing a piece of code, always look for the real intention behind these accesses; sometimes it may be better to provide other methods that make this access unnecessary. For example, a user should always use $emp->promote() instead of directly updating the position attribute.