LWP, the library for web access in Perl, is a bundle of modules that provide a consistent, object-oriented approach to creating web applications. The library, downloaded as the single file named libwww-perl, contains the following classes:
File: Parses directory listings.
Font: Handles Adobe Font Metrics.
HTML: Parses HTML files and converts them to printable or other forms.
HTTP: Provides client requests, server responses, and protocol implementation.
LWP: The core of all web client programs. It creates network connections and manages the communication and transactions between client and server.
URI: Creates, parses, and translates URLs.
WWW: Implements standards used for robots (automatic client programs).
Each module provides different building blocks that make up a whole web transaction - from connection, to request, to response and returned data. Each part is encapsulated by an object to give a standard interface to every web program you write. The following section gives an overview of how LWP works to create a web client.
Any web transaction requires an application that can establish a TCP/IP network connection and send and receive messages using the appropriate protocol (usually HTTP). TCP/IP connections are established using sockets, and messages are exchanged via socket filehandles. See Chapter 13, Sockets, for information on how to manually create socket applications. LWP provides objects that handle this for you: LWP::UserAgent for clients, and HTTP::Daemon for servers. The UserAgent object acts as the browser: it connects to a server, sends requests, receives responses, and manages the received data. This is how you create a UserAgent object:
    use LWP::UserAgent;
    $ua = LWP::UserAgent->new;

The UserAgent now needs to send a message to a server requesting a URL (Uniform Resource Locator) using the request method. request forms an HTTP request from the object given as its argument. This request object is created by HTTP::Request.
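Because the UserAgent plays the role of the browser, it can help to set a few of its properties before sending anything. The following is a minimal sketch, assuming the agent and timeout accessors of LWP::UserAgent; the agent name and the 30-second value are illustrative choices, not requirements.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Create the user agent and adjust a few of its settings.
my $ua = LWP::UserAgent->new;
$ua->agent('MegaBrowser/1.0');   # name sent in the User-Agent header (illustrative)
$ua->timeout(30);                # give up on a request after 30 seconds (illustrative)

# Called without arguments, the same methods return the current values.
print "Agent:   ", $ua->agent,   "\n";
print "Timeout: ", $ua->timeout, "\n";
```

No network connection is made here; the settings take effect only when a request is sent.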
An HTTP request message contains three elements. The first line of a message always contains an HTTP command called a method, a Uniform Resource Identifier (URI), which identifies the file or resource the client is querying, and the HTTP version number. The following lines of a client request contain header information, which provides information about the client and any data it is sending the server. The third part of a client request is the entity body, which is data being sent to the server (for the POST method). The following is a sample HTTP request:
    GET /index.html HTTP/1.0
    User-Agent: Mozilla/1.1N (Macintosh; I; 68K)
    Accept: */*
    Accept: image/gif
    Accept: image/jpeg
LWP::UserAgent->request forms this message from an HTTP::Request object. A request object requires a method as its first argument. The GET method asks for a file, while the POST method supplies information such as form data to a server application. There are other methods, but these two are most commonly used.
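To make the difference between the two methods concrete, the sketch below builds one GET request and one POST request and prints each with as_string, so the method line, headers, and entity body can be seen. The /cgi-bin/search path and the form fields are hypothetical, and nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Request;

# A GET request: a method, a URL, and one header. No entity body.
my $get = HTTP::Request->new(GET => 'http://www.ora.com/index.html');
$get->header(Accept => 'text/plain');

# A POST request carries an entity body, here URL-encoded form data.
# The script path and field names are made up for illustration.
my $post = HTTP::Request->new(POST => 'http://www.ora.com/cgi-bin/search');
$post->content_type('application/x-www-form-urlencoded');
$post->content('query=perl&max=10');

# as_string shows the request roughly as it would be laid out on the wire.
print $get->as_string, "\n";
print $post->as_string, "\n";
```

The request objects only describe the message; passing one to the UserAgent's request method is what actually sends it.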
The second argument is the URL for the request. The URL must contain the server name, for this is how the UserAgent knows where to connect. The URL argument can be represented as a string or as a URI::URL object, which allows more complex URLs to be formed and managed. Optional parameters for an HTTP::Request include your own headers, in the form of an HTTP::Headers object, and any POST data for the message. The following example creates a request object:
    use HTTP::Request;
    $req = HTTP::Request->new('GET', $url, $hdrs);

The URL object is created like this:

    use URI::URL;
    $url = URI::URL->new('http://www.ora.com/index.html');

And a header object can be created like this:

    use HTTP::Headers;
    $hdrs = HTTP::Headers->new(Accept => 'text/plain', 'User-Agent' => 'MegaBrowser/1.0');

Then you can put them all together to make a request:

    use LWP::UserAgent;  # also loads HTTP::Request and HTTP::Headers
    use URI::URL;
    $hdrs = HTTP::Headers->new(Accept => 'text/plain', 'User-Agent' => 'MegaBrowser/1.0');
    $url  = URI::URL->new('http://www.ora.com/index.html');
    $req  = HTTP::Request->new('GET', $url, $hdrs);
    $ua   = LWP::UserAgent->new;
    $resp = $ua->request($req);
    if ($resp->is_success) {
        print $resp->content;
    } else {
        print $resp->message;
    }

Once the request has been made by the user agent, the response from the server is returned as another object, described by HTTP::Response. This object contains the status code of the request, returned headers, and the content you requested, if successful. In the example, is_success checks whether the request was fulfilled without problems, in which case the content is printed. If unsuccessful, a message describing the server's response code is printed.
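The parts of a response object can be explored without a network connection by building one by hand. This is a sketch for illustration only: in a client the object would come back from the UserAgent's request method, and the 404 status and body used here are made-up values.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build a response by hand; normally the user agent returns this object.
my $resp = HTTP::Response->new(404, 'Not Found');
$resp->header('Content-Type' => 'text/html');
$resp->content('<h1>Not Found</h1>');

if ($resp->is_success) {
    print $resp->content;
} else {
    # code is the numeric status, message is the reason phrase,
    # and status_line combines the two.
    print "Request failed: ", $resp->status_line, "\n";
}
```

Printing status_line rather than message alone keeps the numeric code in the output, which is often easier to act on.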
There are other modules and classes that create useful objects for web clients in LWP, but the above examples show the most basic ones. For server applications, many of the objects used above become pieces of a server transaction, which you either create yourself (such as response objects) or receive from a client (like request objects).
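The reversed roles on the server side can be sketched with a small handler that receives a request object and builds the response object itself. The handle_request function and the paths it recognizes are hypothetical; in a real server, a module such as HTTP::Daemon would read the request from the client connection and send the response back.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical handler: takes the client's request object and
# returns a response object created by the server.
sub handle_request {
    my ($req) = @_;
    if ($req->method eq 'GET' && $req->uri->path eq '/index.html') {
        my $resp = HTTP::Response->new(200, 'OK');
        $resp->header('Content-Type' => 'text/plain');
        $resp->content("Hello from the server\n");
        return $resp;
    }
    # Anything else is reported as missing.
    return HTTP::Response->new(404, 'Not Found');
}

# Simulate a client request instead of reading one from a socket.
my $req  = HTTP::Request->new(GET => 'http://localhost/index.html');
my $resp = handle_request($req);
print $resp->status_line, "\n";
print $resp->content if $resp->is_success;
```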
Additional functionality for both client and server applications is provided by the HTML module. This module provides many classes for both the creation and interpretation of HTML documents.
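As one example of interpreting HTML, the sketch below uses HTML::LinkExtor, one class in the HTML family, to pull the links out of a document held in a string. The sample markup and the image path are invented for illustration.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# A small document to parse (made-up content).
my $html = <<'END_HTML';
<html><body>
<a href="http://www.ora.com/">O'Reilly</a>
<img src="/images/logo.gif">
</body></html>
END_HTML

# With no callback, the parser collects the links for later retrieval.
my $parser = HTML::LinkExtor->new;
$parser->parse($html);
$parser->eof;

# links empties the parser's internal list, so keep the result.
# Each entry is an array reference: [ tag, attribute => value, ... ].
my @links = $parser->links;
for my $link (@links) {
    my ($tag, %attr) = @$link;
    print "$tag: $_ = $attr{$_}\n" for sort keys %attr;
}
```

In a client, the string passed to parse would typically be the content of a successful response.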
The rest of this chapter provides information for the LWP, HTTP, HTML, and URI modules.