Unix Power Tools

42.5. urllib

Python's application-level access to most web client activities is through two modules, urllib and urllib2 (Section 42.6). urllib is the simpler of the two: it provides basic functions for opening and retrieving web resources by URL.

The primary functions in urllib are urlopen( ), which opens a URL and returns a file-like object, and urlretrieve( ), which copies the entire web resource at the given URL to a local file. The file-like object returned by urlopen( ) supports the following methods: read( ), readline( ), readlines( ), fileno( ), close( ), info( ), and geturl( ). The first five methods work just like their file counterparts. info( ) returns a mimetools.Message object, which for HTTP requests contains the HTTP headers associated with the URL. geturl( ) returns the real URL of the resource, which may differ from the one you requested if the web server redirected the client before delivering the actual content.
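
The methods described above can be tried without network access by pointing urlopen( ) at a local file through a file: URL. The following is only a sketch under some assumptions: the temporary file and its contents are invented for illustration, and the import fallback is there because Python 3 moved these functions into urllib.request.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2
except ImportError:
    from urllib.request import urlopen, pathname2url  # Python 3

# Illustrative data: a small local file stands in for a web resource.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"first line\nsecond line\n")

f = urlopen("file:" + pathname2url(path))
first = f.readline()      # works like a file's readline( )
rest = f.read()           # reads whatever is left
headers = f.info()        # header object (Content-Type and so on)
final_url = f.geturl()    # URL the data actually came from
f.close()
os.remove(path)

print(first)
print(rest)
print(final_url)
```

For an HTTP URL, info( ) would carry the server's response headers instead of the few headers synthesized for a local file.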

urlretrieve( ) returns a tuple (filename, info), where filename is the local file to which the web resource was copied and info is the same as the return value from urlopen's info( ) method.
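
As a rough illustration of the (filename, info) return value, the sketch below copies a local file through a file: URL so that it needs no network connection. The directory and file names are hypothetical, and the import fallback covers Python 3, where urlretrieve( ) lives in urllib.request.

```python
import os
import shutil
import tempfile

try:
    from urllib import urlretrieve, pathname2url          # Python 2
except ImportError:
    from urllib.request import urlretrieve, pathname2url  # Python 3

# Hypothetical source and destination files inside a scratch directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.txt")
target = os.path.join(workdir, "copy.txt")
with open(source, "wb") as out:
    out.write(b"hello from urlretrieve\n")

# The second argument names the local file to copy the resource into.
filename, info = urlretrieve("file:" + pathname2url(source), target)

with open(filename, "rb") as copied:
    content = copied.read()

print(filename)
print(content)
shutil.rmtree(workdir)
```

If the second argument is omitted for a remote URL, urlretrieve( ) picks a temporary filename itself and returns it in the tuple.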

If the result from either urlopen( ) or urlretrieve( ) is HTML, you can use htmllib to parse it.
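
htmllib exists only in Python 2, so the sketch below substitutes the standard library's HTMLParser class (module HTMLParser in Python 2, html.parser in Python 3) to show the general idea of parsing retrieved HTML. The LinkCollector class and the sample page are made up for this example; in practice the string would come from read( ) on the object returned by urlopen( ).

```python
try:
    from HTMLParser import HTMLParser     # Python 2
except ImportError:
    from html.parser import HTMLParser    # Python 3


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Sample markup standing in for a downloaded page.
page = ('<html><body><a href="/one">One</a> '
        '<a href="http://example.com/two">Two</a></body></html>')

collector = LinkCollector()
collector.feed(page)
collector.close()
links = collector.links
print(links)
```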

urllib also provides a function urlencode( ), which converts a dictionary or a sequence of two-element tuples into a properly URL-encoded query string. Here is an example session that uses the GET method to retrieve a URL containing parameters:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read( )
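
To see what urlencode( ) actually builds, it can be called on its own without contacting a server. This is a minimal sketch: it passes a list of tuples rather than a dictionary so that the parameter order is predictable, the second query is invented to show escaping, and the import fallback accounts for Python 3, where the function lives in urllib.parse.

```python
try:
    from urllib import urlencode           # Python 2
except ImportError:
    from urllib.parse import urlencode     # Python 3

# A sequence of (name, value) tuples keeps the parameters in order.
pairs = [("spam", 1), ("eggs", 2), ("bacon", 0)]
query = urlencode(pairs)
print(query)      # spam=1&eggs=2&bacon=0

# Spaces and reserved characters are escaped for use in a URL.
escaped = urlencode([("q", "spam & eggs"), ("page", 2)])
print(escaped)    # q=spam+%26+eggs&page=2
```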

The following example performs the same query but uses the POST method instead:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read( )
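
The only difference between the two sessions is where the encoded parameters go: appended to the URL for GET, passed as a separate data argument for POST. The sketch below shows that rule without sending anything over the network. It relies on the Request class, which belongs to urllib2 in Python 2 (urllib.request in Python 3) and is not part of the urllib.urlopen( ) call used above; it is used here only because it reports which method would be chosen.

```python
try:
    from urllib import urlencode            # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode      # Python 3
    from urllib.request import Request

url = "http://www.musi-cal.com/cgi-bin/query"
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

# Parameters in the URL, no data argument: a GET request.
get_request = Request(url + "?" + params)

# Parameters passed as data (bytes in Python 3): a POST request.
post_request = Request(url, params.encode("ascii"))

get_method = get_request.get_method()
post_method = post_request.get_method()
print(get_method)     # GET
print(post_method)    # POST
```

Nothing is transmitted here; the Request objects are only constructed and inspected.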

-- DJPH



Copyright © 2003 O'Reilly & Associates. All rights reserved.