This example shows how you might represent professor, student, and course data as hierarchical records and how to link them up. Assume that the data files look like this:
#file: professor.dat id : 42343 #Employee Id Name : E.F.Schumacher Office Hours: Mon 3-4, Wed 8-9 Courses : HS201, SS343 #Course taught ... #file: student.dat id : 52003 # Registration id Name : Garibaldi Courses : H301, H302, M201 # Courses taken ... #file: courses.dat id : HS201 Description : Small is beautiful Class Hours : Mon 2-4, Wed 9-10, Thu 4-5 ...
Each "id:" line starts a new record.
Among other tasks, let us say we are required to find out whether there is a scheduling conflict on professors' and students' hours. Because our focus is on data representation and getting a feel for Perl's reference syntax, we will look at implementing only some parts of the problem.
A hash table is a good representation for a heterogeneous record, as we mentioned earlier, so a student structure may be implemented like this:
$student{42343} = { 'Name' => 'Garibaldi', 'Courses' => [ ]};
A number of subtle design choices have been made here.
We could have replaced "foreign keys" (to use the database term) such as "HS201" with references to the corresponding course data structures. We didn't, because it is then tempting to directly dereference these references, in which case the student code is aware of how the course data is structured.
We maintain separate global hash tables for students, courses, and professors - yet another effort to keep mostly unrelated data completely separate and to make it possible to change a part of the system without affecting everyone.
There is one piece of data we haven't discussed before: time ranges. Both professors and courses have certain "busy" or "active" hours. What is a good representation for this? You might choose to represent the line "Mon 2-3, Tue 4-6" as follows:
$time_range = { 'Mon' => [2,3], 'Tue' => [4,6] };
There is a much simpler representation, in case you haven't already guessed it. The key insight is that since we are concerned only with clashes in time, the system should be able to quickly tell us whether a professor or a course is "active" in a given hour of the week or not. Considering that there are only 24 * 7 = 168 hours in a week, the entire week's schedule can be represented by a bitmap vector of 21 bytes (168/8). If a bit is set, we know that the professor is teaching something in that hour. In fact we can reduce the storage requirements further if we only account for the effective hours in a week (say, 7:00 A.M. to 7:00 P.M., Monday to Friday). That brings it down to 8 bytes (12 hours * 5 days / 8). The nice thing here is that an entire sequence of time ranges boils down to one scalar containing a bitmap vector. The other cool thing is that you can obtain time conflicts by logically AND-ing two bitmaps.
Having settled the representation, let us write some code to read the professor.dat file, and construct the data structures.
my (%profs); # prof_read_file() populates this data structure from file sub prof_read_file { my ($filename) = @_; my ($line, $curr_prof); open (F, $filename) || die "Could not open $filename"; while ($line = <F>) { chomp($line); next if $line =~ /^\s*$/; # skip blank lines if ($line =~ /^id.*:\s*(.*)/) { # Use an anonymous hash to store a professor's data $profs{$1} = $curr_prof = {}; } elsif ($line =~ /^Office Hours.*:\s*(.*)/) { # $1 contains a string like 'Mon 2-3, Tue 4-6' $curr_prof->{Office Hours} = interval_parse($1); } elsif ($line =~ /^Courses.*:\s*(.*)/) { # $1 contains something like 'HS201, MA101' my (@courses_taught) = split(/[\s,]+/, $1); $curr_prof->{Courses} = \@courses_taught; } } }
Notice that the courses_taught array is local to the block. When the block ends, $curr_prof->{Courses} continues to hang on to this array. You can omit one step like this:
$curr_prof->{Courses} = [split(/[\s,]+/, $1)];
I prefer the earlier approach because it is more readable.
The interval_parse method parses a string such as "Mon 3-5, Wed 2-6" into a bit string, as was mentioned earlier. The code looks like this:
# Each hour in a week (with days from 7am to 7pm) gets its own # unique bit in an 8-byte string. # Mon 7-8 is the 0th bit, Mon 6-7pm is 11, ... Fri 6-7 (pm) is 60th. my %base_hours = ( mon => 0, tue => 12, wed => 24 , thu => 36, fri => 48 ); sub interval_parse { my ($interval_sequence) = @_; #contains "Mon 3-5, Tue 2-6" my ($time_range) = ""; foreach $day_hours (split /,/, $interval_sequence) { # $day_hours contains "Mon 3-5" etc. my ($day, $from, $to) = ($day_hours =~ /([A-Za-z]+).*(\d+)-(\d+)/); # if $from or $to is less than 7, it must be afternoon. Normalize # it by adding 12. Then reduce it to a zero base by subtracting 7 # (that is, 7 hrs to 19 hrs becomes 0 - 12. Finally, # normalize each hour in a day with respect to weekly hours, # by adding in the day's "base hour" $to = 19 if $to == 7; $from += 12 if $from < 7 ; $to += 12 if $to <= 7; my $base = $base_hours{lc $day}; $from += $base - 7; $to += $base - 7; # At this point Tue 7a.m ==> 12 and Tue 4 p.m => 21 for ($i = $from; $i < $to; $i++) { # Set the corresponding bit vec($time_range, $i, 1) = 1; } } $time_range; }
To check for scheduling constraints on a professor's time, we have to calculate overlapping hours between the professor's office hours and each course he or she teaches and between the courses themselves, as shown in Example 2.4 .
sub prof_check_constraints { my ($prof) = @_; my $r_prof = $profs{$prof}; # %profs created by prof_read_file my $office_hours = $r_prof->{Office Hours}; my $rl_courses = $r_prof->{Courses} ; for $i (0 .. $#{$rl_courses}) { $course_hours = course_get_hours($rl_courses->[$i]); if (interval_conflicts($office_hours, $course_hours)) { print "Prof. ", $r_prof->{name}, " Office hours conflict with course $course_taught\n"; } for $j ($i .. $#{$rl_courses}) { my ($other_course_hours) = course_get_hours($rl_courses->[$j]); if (interval_conflicts ($course_hours, $other_course_hours)) { print "Prof. ", $r_prof->{name}, ": Course conflict: ", $rl_courses->[$i], " " $rl_courses->[$j], "\n"; } } }
The subroutine
interval_conflicts
simply compares the two bitmaps, as shown below:
sub interval_conflicts { my ($t1, $t2) = @_; my ($combined) = $t1 & $t2; # $combined will have at least one bit set if there's a conflict my $offset = length($combined) * 8; # start counting down from last bit, and see if any is set while (--$offset >= 0) { return 1 if vec($combined,$offset,1); } return 0; }
Note that all knowledge of the internal representation of a time interval is encapsulated in functions with the prefix
interval_
. These functions thus encapsulate an
abstract data type
called "interval." When we study modules and objects in later chapters, we will learn ways of organizing such pieces of code into reusable entities.
Copyright © 2001 O'Reilly & Associates. All rights reserved.