Introduction to CGI (CGI Programming with Perl)

CGI can do so much because it is so simple. CGI is a very lightweight interface; it is essentially the minimum that the web server needs to provide in order to allow external processes to create web pages. Typically, when a web server gets a request for a static web page, the web server finds the corresponding HTML file on its filesystem. When a web server gets a request for a CGI script, the web server executes the CGI script as another process (i.e., a separate application); the server passes this process some parameters and collects its output, which it then returns to the client just as if it had been fetched from a static file (see Figure 1-1).
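
The exchange described above can be sketched in a few lines of Perl. This is only a toy stand-in for the server side, not how any real web server is implemented: the helper name run_cgi, the sample address 192.0.2.1, and the one-line "script" are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command roughly the way a web server runs a
# CGI script. Request details are placed in the child's environment, and
# everything the child prints to STDOUT is collected for the response.
sub run_cgi {
    my ($env, @command) = @_;

    # Temporarily set the request parameters; the child inherits them,
    # and the old values are restored when this sub returns.
    local @ENV{ keys %$env } = values %$env;

    # List-form pipe open starts the command as a separate process
    # without involving a shell.
    open(my $child, '-|', @command) or die "Cannot run @command: $!";
    my $output = do { local $/; <$child> };   # slurp all of the output
    close($child);

    return $output;
}

# A stand-in "CGI script": one line of Perl run by the current interpreter.
my @script = ($^X, '-e',
    'print "Content-type: text/plain\n\nHello, $ENV{REMOTE_ADDR}\n"');

print run_cgi({ REMOTE_ADDR => '192.0.2.1' }, @script);
```

The child process never knows whether a browser or a test harness is on the other end; it only sees its environment and its standard streams, which is what keeps the interface so small.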

Say you wanted to visit the URL http://www.mikesmechanics.com/cgi/welcome.cgi. Example 1-1 shows the most basic HTTP request your web browser might send.

1.2.1. Sample CGI

Let's look at a sample CGI application, written in Perl, that creates the dynamic output we just saw in Example 1-2. This program, shown in Example 1-3, determines where the user is connecting from and then creates a simple HTML document containing this information, along with the current time. In the next several chapters, we'll see how to use various CGI modules to make creating such an application even easier; for now, however, we will keep it straightforward.

Example 1-3. welcome.cgi

#!/usr/bin/perl -wT

use strict;

my $time        = localtime;
my $remote_id   = $ENV{REMOTE_HOST} || $ENV{REMOTE_ADDR};
my $admin_email = $ENV{SERVER_ADMIN};

print "Content-type: text/html\n\n";

print <<END_OF_PAGE;
<HTML>
<HEAD>
  <TITLE>Welcome to Mike's Mechanics Database</TITLE>
</HEAD>

<BODY BGCOLOR="#ffffff">
  <IMG SRC="/images/mike.jpg" ALT="Mike's Mechanics">
  <P>Welcome from $remote_id! What will you find here? You'll
    find a list of mechanics from around the country and the type of
    service to expect -- based on user input and suggestions.</P>
  <P>What are you waiting for? Click <A HREF="/cgi/list.cgi">here</A>
    to continue.</P>
  <HR>
  <P>The current time on this server is: $time.</P>
  <P>If you find any problems with this site or have any suggestions,
    please email <A HREF="mailto:$admin_email">$admin_email</A>.</P>
</BODY>
</HTML>
END_OF_PAGE

This program is quite simple. It contains only six commands, although the last one is many lines long. Let's take a look at how it works. Because this script is our first and is short, we'll look at it line by line; but as mentioned in the Preface, this book does assume that you are already familiar with Perl. So if you do not know Perl well or if your Perl is a little rusty, you may want to have a Perl reference available to consult as you read this book. We recommend Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant (O'Reilly & Associates, Inc.); not only is it the standard Perl tome, but it also has a convenient alphabetical description of Perl's built-in functions.

The first line of the program looks like the top of most Perl scripts. It tells the server to use the program at /usr/bin/perl to interpret and execute this script. You may not recognize the flags, however: the -wT flags tell Perl to turn on warnings and taint checking. Warnings help locate subtle problems that may not generate syntax errors; enabling this is optional, but it is a very helpful feature. Taint checking should not be considered optional: unless you like living dangerously, you should enable this feature with all of your CGI scripts. We will discuss taint checking more in Chapter 8, "Security".

The command use strict tells Perl to enable strict rules for variables, subroutines, and references. If you haven't used this command before, you should get into the habit of using it with your CGI scripts. Like warnings, it helps locate subtle mistakes, such as typos, that might not otherwise generate a syntax error. Furthermore, the strict pragma encourages good programming practices by forcing you to declare variables and reduce the number of global variables. This produces code that is more maintainable. Finally, as we will see in Chapter 17, "Efficiency and Optimization", the strict pragma is essentially required by FastCGI and mod_perl. If you think you might migrate to either of these technologies in the future, you should begin using strict now.
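
As a small illustration of the kind of mistake strict catches, the sketch below compiles a snippet containing a misspelled variable name. The snippet is held in a string and run through eval only so that the failure can be captured and printed rather than halting the demonstration; the variable names are made up for the example.

```perl
use strict;
use warnings;

# The code under test is kept in a string so that its compile-time
# failure can be captured instead of stopping this demonstration.
my $snippet = <<'END_OF_SNIPPET';
use strict;
my $remote_id = 'localhost';
print "Welcome from $remote_host!\n";   # typo: should be $remote_id
END_OF_SNIPPET

# String eval compiles the snippet; strict refuses the undeclared variable,
# so eval returns undef and the reason is left in $@.
my $ok    = eval "$snippet; 1";
my $error = $@;

print "strict rejected the snippet:\n$error" unless $ok;
```

Without the strict pragma, the same snippet would compile, and the misspelled variable would simply interpolate as an empty string (with a warning if warnings are enabled).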

Now we start the real work. First, we set three variables. The first variable, $time, is set to a string representing the current date and time. The second variable, $remote_id, is set to the identity of the remote machine requesting this page, and we get this information from the environment variables REMOTE_HOST or REMOTE_ADDR. As we mentioned earlier, CGI scripts get all of their information from the web server through environment variables and STDIN. REMOTE_HOST contains the full domain name of the remote machine, but only if reverse domain name lookups have been enabled for the web server -- otherwise, it is blank. In that case, we use REMOTE_ADDR instead, which contains the IP address of the remote machine. The final variable, $admin_email, is set to SERVER_ADMIN, which contains the email address of the server's administrator according to the server's configuration files. These are just a few of the environment variables available to CGI scripts. We'll review these three in more detail along with the rest in Chapter 3, "The Common Gateway Interface".
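
The fallback from REMOTE_HOST to REMOTE_ADDR can be tried outside a web server by passing in a hash that stands in for the environment. The helper name remote_id and the sample host and address below are assumptions for illustration, not part of welcome.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the best available identifier for the client
# from a hash of CGI environment variables.
sub remote_id {
    my ($env) = @_;

    # REMOTE_HOST is empty or unset when reverse lookups are off;
    # || then falls through to the IP address in REMOTE_ADDR.
    return $env->{REMOTE_HOST} || $env->{REMOTE_ADDR};
}

# With reverse lookups enabled, the host name wins.
print remote_id({ REMOTE_HOST => 'client.example.com',
                  REMOTE_ADDR => '192.0.2.1' }), "\n";

# With REMOTE_HOST blank, the IP address is used instead.
print remote_id({ REMOTE_HOST => '',
                  REMOTE_ADDR => '192.0.2.1' }), "\n";
```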

As we saw earlier, if a CGI script wants to return a new document, it must first output an HTTP header declaring the type of document it is returning. It does this and prints an additional blank line to indicate that it has finished sending headers. It then prints the body of the document.
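
To make the role of the blank line concrete, here is a minimal sketch that assembles a response and then takes it apart again. The helper name cgi_response is hypothetical, and splitting on the first blank line is only a rough model of how the reader of the output tells headers from body.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a minimal CGI response.
sub cgi_response {
    my ($content_type, $body) = @_;

    # One header line, then an empty line marking the end of the headers.
    return "Content-type: $content_type\n\n" . $body;
}

my $response = cgi_response('text/html', "<P>Hello</P>\n");

# Splitting on the first blank line recovers the two parts.
my ($headers, $body) = split /\n\n/, $response, 2;

print "Headers: $headers\n";
print "Body:    $body";
```

If the blank line were missing, there would be no boundary to split on, and the first line of the document would be read as part of the headers.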

Instead of using a print statement to send each line to standard output separately, we use a "here" document, which allows us to print a block of text at once. This is a standard Perl feature that's admittedly a little esoteric; you may not be familiar with this if you have not done other forms of shell programming. This command tells Perl to print all of the following lines until it encounters the END_OF_PAGE token on its own line. It treats the text as if it were enclosed in double quotes, so the variables are evaluated, but double quotes do not need to be escaped. Not only do "here" documents save us from a lot of extra typing, but they also make the program easier to read. However, there are even better ways of outputting HTML, as we'll see in Chapter 5, "CGI.pm", and Chapter 6, "HTML Templates".
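
A short sketch of the quoting behavior just described, using a made-up variable $name. The second "here" document, with its terminator in single quotes, is included only for contrast; it is not used in welcome.cgi.

```perl
use strict;
use warnings;

my $name = 'Mike';

# Bare terminator: the text behaves like a double-quoted string, so
# variables are interpolated, yet literal double quotes need no backslashes.
my $interpolated = <<END_OF_TEXT;
<A HREF="/cgi/list.cgi">Hello, $name</A>
END_OF_TEXT

# Single-quoted terminator: the text is taken literally.
my $literal = <<'END_OF_TEXT';
<A HREF="/cgi/list.cgi">Hello, $name</A>
END_OF_TEXT

print $interpolated;   # <A HREF="/cgi/list.cgi">Hello, Mike</A>
print $literal;        # <A HREF="/cgi/list.cgi">Hello, $name</A>
```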

That's all there is to our script, so at this point it exits; the web server adds additional HTTP headers and returns the response to the client as we saw in Example 1-2. This was just a simple example of a CGI script, so don't worry if you have questions or are unsure about a particular detail. As our numerous references to later chapters indicate, we'll spend the rest of the book filling in the details.

1.2. Introduction to CGI

Figure 1-1. How a CGI application is executed

Example 1-1. Sample HTTP Request

Example 1-2. Sample HTTP Response
