urllib2

An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function, which accepts a string containing a URL or a Request object (described below). It opens the URL and returns the result as a file-like object; the returned object has some extra methods described below.
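As a minimal sketch of that call, the snippet below opens a local file: URL so that it runs without network access. The helper name read_url and the temporary file are assumptions made for illustration, the "file://" + path construction assumes a POSIX-style absolute path, and the fallback import is only there so the sketch also runs on Python 3, where this module became urllib.request.

```python
import os
import tempfile

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)


def read_url(url):
    """Open a URL and return its body, final URL and headers (hypothetical helper)."""
    response = urllib2.urlopen(url)
    try:
        # geturl() and info() are the extra methods on the file-like result.
        return response.read(), response.geturl(), response.info()
    finally:
        response.close()


# Create a small local file so the example needs no network connection.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello from urlopen")

try:
    body, final_url, headers = read_url("file://" + path)
finally:
    os.remove(path)
```

The same helper should work unchanged with an http: URL when a network connection is available.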

The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.
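To make the relationship between the OpenerDirector and its Handlers concrete, the following sketch builds a default opener and lists the handler classes it ended up with. The variable names are illustrative, and the exact set of handlers depends on the Python version and on whether SSL support and proxy settings are present.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)

# build_opener() with no arguments installs only the default handlers.
opener = urllib2.build_opener()

# Each entry in opener.handlers is a Handler instance managed by the opener.
handler_names = sorted(h.__class__.__name__ for h in opener.handlers)

for name in handler_names:
    print(name)
```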

urlopen(url, data=None) – Basic usage is the same as in the original urllib. Pass the URL, and optionally data to post to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of a URL. Raises a URLError (subclass of IOError); for HTTP errors, raises an HTTPError, which can also be treated as a valid response.
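One way to handle both exception types is sketched below. The function name fetch and its return convention are assumptions for illustration, not part of the module. HTTPError is caught first because it is a subclass of URLError. The demonstration uses a made-up URL scheme so that it fails locally without touching the network.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)


def fetch(url, data=None):
    """Return (status, payload); status is None when no response was received.

    Hypothetical helper, shown only to illustrate the exception hierarchy.
    """
    try:
        response = urllib2.urlopen(url, data)
    except urllib2.HTTPError as err:
        # The server answered with an error status; the error is also a response.
        return err.code, err.read()
    except urllib2.URLError as err:
        # No usable response at all (bad scheme, DNS failure, refused connection, ...).
        return None, str(err.reason)
    try:
        return response.getcode(), response.read()
    finally:
        response.close()


# "nosuchscheme" has no handler, so the opener raises URLError.
status, detail = fetch("nosuchscheme://example/")
```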

build_opener – Function that creates a new OpenerDirector instance and installs the default handlers. Accepts one or more Handlers as arguments, either instances or Handler classes that it will instantiate. If one of the arguments is a subclass of a default handler, the argument will be installed instead of the default.
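The replacement rule can be checked with a small sketch. LoggingRedirectHandler is a hypothetical subclass written for this example; it records each redirect it sees and otherwise defers to the default behaviour.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)


class LoggingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that remembers the redirects it follows."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.seen.append((code, newurl))
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


# Passing the class (not an instance): build_opener instantiates it.
opener = urllib2.build_opener(LoggingRedirectHandler)

# Only the subclass should be present; the default HTTPRedirectHandler is skipped.
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib2.HTTPRedirectHandler)]
```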

install_opener – Installs a new opener as the default opener.

objects of interest:

OpenerDirector – Sets up the User Agent as the Python-urllib client and manages the Handler classes, while dealing with requests and responses.
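The default User-Agent lives in the opener's addheaders list, which can be inspected or replaced. The sketch below assumes a replacement client name, "example-client/0.1", chosen purely for illustration.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)

opener = urllib2.build_opener()

# addheaders is a list of (name, value) pairs sent with every request.
default_headers = list(opener.addheaders)

# Replace the default User-Agent with an illustrative value.
opener.addheaders = [("User-agent", "example-client/0.1")]
```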

Request – An object that encapsulates the state of a request. The state can be as simple as the URL. It can also include extra HTTP headers, e.g. a User-Agent.
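A short sketch of building Request objects without sending them follows. The header values and the POST body are made up for the example. Note that header names are stored capitalized, so "User-Agent" is looked up as "User-agent".

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)

# A request carrying only a URL and one extra header.
request = urllib2.Request(
    "http://www.python.org/",
    headers={"User-Agent": "example-client/0.1"},
)
request.add_header("Accept", "text/html")

# Supplying data turns the request into a POST.
post_request = urllib2.Request("http://www.python.org/", data=b"q=urllib2")

get_method = request.get_method()
post_method = post_request.get_method()
user_agent = request.get_header("User-agent")
```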

BaseHandler –

exceptions:

URLError – A subclass of IOError; individual protocols have their own specific subclass.

HTTPError – Also a valid HTTP response, so you can treat an HTTP error as an exceptional event or valid response.
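The dual nature of HTTPError can be illustrated without a server by constructing one by hand. The URL, status text and body below are invented for the example; in real use the exception comes from urlopen.

```python
import email
import io

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)


def describe(err):
    """Read an HTTPError as if it were an ordinary response (hypothetical helper)."""
    return err.code, err.msg, err.read()


# Hand-built error standing in for what urlopen would raise on a 404.
headers = email.message_from_string("Content-Type: text/plain\n\n")
error = urllib2.HTTPError(
    "http://www.python.org/missing", 404, "Not Found",
    headers, io.BytesIO(b"no such page"))

try:
    raise error
except urllib2.HTTPError as err:
    # Exceptional event and valid response at the same time.
    code, reason, body = describe(err)
    is_url_error = isinstance(err, urllib2.URLError)
```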

internals: BaseHandler and parent _call_chain conventions
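As a rough illustration of the naming convention, the opener dispatches a URL to a handler method named after its scheme plus "_open". The EchoHandler class and the "echo" scheme below are hypothetical, invented only to show the mechanism; the handler simply returns the requested URL as the response body.

```python
import email
import io

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up 'echo' scheme."""

    def echo_open(self, req):
        # Called by the OpenerDirector for URLs whose scheme is 'echo'.
        url = req.get_full_url()
        headers = email.message_from_string("Content-Type: text/plain\n\n")
        return urllib2.addinfourl(io.BytesIO(url.encode("ascii")), headers, url)


opener = urllib2.build_opener(EchoHandler)
response = opener.open("echo://hello-world")
body = response.read()
final_url = response.geturl()
response.close()
```

Passing the opener to install_opener would make urlopen understand the same scheme.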

Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib2.ProxyHandler({"http": "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')

Functions

StringIO([s]) Return a StringIO-like stream for reading or writing.
build_opener(*handlers) Create an opener object from a list of handlers.
getproxies() Return a dictionary of scheme -> proxy server URL mappings.
install_opener(opener)
localhost() Return the IP address of the magic hostname ‘localhost’.
parse_http_list(s) Parse lists as described by RFC 2068 Section 2.
parse_keqv_list(l) Parse list of key=value strings where keys are not duplicated.
proxy_bypass(host[, proxies]) Test if proxies should not be used for a particular host.
quote(s[, safe]) quote('abc def') -> 'abc%20def'. Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted.
randombytes(n) Return n random bytes.
request_host(request) Return request-host, as defined by RFC 2965.
splitattr(url) splitattr('/path;attr1=value1;attr2=value2;...') --> '/path', ['attr1=value1', 'attr2=value2', ...].
splithost(url) splithost('//host[:port]/path') --> 'host[:port]', '/path'.
splitpasswd(user) splitpasswd('user:passwd') -> 'user', 'passwd'.
splitport(host) splitport('host:port') --> 'host', 'port'.
splittag(url) splittag('/path#tag') --> '/path', 'tag'.
splittype(url) splittype('type:opaquestring') --> 'type', 'opaquestring'.
splituser(host) splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.
splitvalue(attr) splitvalue('attr=value') --> 'attr', 'value'.
toBytes(url) toBytes(u"URL") --> 'URL'.
unquote(s) unquote('abc%20def') -> 'abc def'.
unwrap(url) unwrap('<URL:type://host/path>') --> 'type://host/path'.
url2pathname(pathname) OS-specific conversion from a relative URL of the ‘file’ scheme to a file system path; not recommended for general use.
urlopen(url[, data, timeout, cafile, ...])
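The two parsing helpers in this list are typically used together on authentication headers. The sketch below applies them to a made-up digest-style challenge string; the nonce value is invented for illustration.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)

# Illustrative header value; commas inside quoted strings are not split points.
challenge = 'realm="PDQ Application", nonce="abc123", qop="auth"'

# Step 1: split the comma-separated list into 'key="value"' items.
items = urllib2.parse_http_list(challenge)

# Step 2: turn the items into a dictionary, stripping the surrounding quotes.
fields = urllib2.parse_keqv_list(items)
```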

Classes

AbstractBasicAuthHandler([password_mgr])
AbstractDigestAuthHandler([passwd])
AbstractHTTPHandler([debuglevel])
BaseHandler
CacheFTPHandler()
FTPHandler
FileHandler
HTTPBasicAuthHandler([password_mgr])
HTTPCookieProcessor([cookiejar])
HTTPDefaultErrorHandler
HTTPDigestAuthHandler([passwd]) An authentication protocol defined by RFC 2069
HTTPErrorProcessor Process HTTP error responses.
HTTPHandler([debuglevel])
HTTPPasswordMgr()
HTTPPasswordMgrWithDefaultRealm()
HTTPRedirectHandler
HTTPSHandler([debuglevel, context])
OpenerDirector()
ProxyBasicAuthHandler([password_mgr])
ProxyDigestAuthHandler([passwd])
ProxyHandler([proxies])
Request(url[, data, headers, ...])
UnknownHandler
addinfourl(fp, headers, url[, code]) Class to add info() and geturl() methods to an open file.
ftpwrapper(user, passwd, host, port, dirs[, ...]) Class used by open_ftp() for cache of open FTP connections.
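Several of these classes are meant to be combined. As one possible arrangement, the sketch below stores credentials in an HTTPPasswordMgrWithDefaultRealm and hands it to an HTTPBasicAuthHandler; passing None as the realm registers the credentials for any realm at that URI. The host and credentials are reused from the example usage above, and no request is actually sent.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption for portability)

uri = "https://mahler:8092/site-updates.py"

# Realm None acts as the default for any realm the server announces.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, uri, "klem", "geheim$parole")

# The handler consults the manager when a server asks for basic authentication.
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)

# Lookup succeeds even though this realm was never registered explicitly.
credentials = password_mgr.find_user_password("PDQ Application", uri)
```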

Exceptions

HTTPError(url, code, msg, hdrs, fp) Raised when an HTTP error occurs, but also acts like a non-error return.
URLError(reason)