urllib2
An extensible library for opening URLs using a variety of protocols.
The simplest way to use this module is to call the urlopen function, which accepts a string containing a URL or a Request object (described below). It opens the URL and returns the result as a file-like object; the returned object has some extra methods described below.
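As a minimal sketch of the call described above, the following opens a local file: URL so it can run without network access. The helper name read_local and the temporary sample file are hypothetical, and the fallback import of urllib.request is an assumption for running under Python 3, where the contents of urllib2 were moved.

```python
import os
import tempfile

try:
    import urllib2 as urlreq                   # Python 2
    from urllib import pathname2url
except ImportError:
    import urllib.request as urlreq            # Python 3 (assumed fallback)
    from urllib.request import pathname2url


def read_local(path):
    """Open a local file through urlopen and return (body, final_url)."""
    url = "file:" + pathname2url(os.path.abspath(path))
    response = urlreq.urlopen(url)             # file-like object
    try:
        # read() comes from the file interface; geturl() is one of the extras
        return response.read(), response.geturl()
    finally:
        response.close()


# Hypothetical sample file, created only so the sketch can run offline.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello from urlopen\n")

body, final_url = read_local(sample_path)
os.remove(sample_path)
```

The same call shape applies to http: and ftp: URLs; only the scheme in the string changes.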
The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.
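The division of labour between the OpenerDirector and its Handlers can be sketched with a stand-in protocol handler. CannedHTTPHandler is hypothetical and answers every http: request locally instead of contacting a server; the example.invalid URLs are placeholders, and the urllib.request fallback is an assumption for Python 3.

```python
import io
from email.message import Message

try:
    import urllib2 as urlreq                   # Python 2
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urlreq            # Python 3 (assumed fallback)
    from urllib.error import HTTPError


class CannedHTTPHandler(urlreq.BaseHandler):
    """Hypothetical handler: serves a fixed body, 404 for URLs ending in /missing."""

    def http_open(self, req):
        status = 404 if req.get_full_url().endswith("/missing") else 200
        response = urlreq.addinfourl(io.BytesIO(b"canned body"), Message(),
                                     req.get_full_url(), status)
        # HTTPErrorProcessor reads response.msg, so set it by hand here
        response.msg = "Not Found" if status == 404 else "OK"
        return response


# Assemble an opener by hand instead of using the defaults.
opener = urlreq.OpenerDirector()
for handler in (CannedHTTPHandler(),
                urlreq.HTTPErrorProcessor(),        # routes non-2xx responses to error handlers
                urlreq.HTTPDefaultErrorHandler()):  # turns unhandled errors into HTTPError
    opener.add_handler(handler)

ok_body = opener.open("http://example.invalid/present").read()

try:
    opener.open("http://example.invalid/missing")
    error_code = None
except HTTPError as err:
    error_code = err.code
```

The opener picks http_open because the method name starts with the request's scheme; the two error-related handlers then decide what happens to the 404.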
urlopen(url, data=None) – Basic usage is the same as in the original urllib: pass the URL and, optionally, data to post to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of a URL. Raises a URLError (a subclass of IOError); for HTTP errors, raises an HTTPError, which can also be treated as a valid response.
build_opener – Function that creates a new OpenerDirector instance and installs the default handlers. Accepts one or more Handlers as arguments, either instances or Handler classes that it will instantiate. If one of the arguments is a subclass of a default handler, the argument is installed instead of the default.
install_opener – Installs a new opener as the default opener.
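The replacement rule for build_opener can be checked without any network traffic by inspecting the handlers of the resulting opener. NoRedirectHandler is a hypothetical subclass used only for illustration, and the urllib.request fallback is an assumption for Python 3.

```python
try:
    import urllib2 as urlreq                   # Python 2
except ImportError:
    import urllib.request as urlreq            # Python 3 (assumed fallback)


class NoRedirectHandler(urlreq.HTTPRedirectHandler):
    """Hypothetical subclass that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is built for the redirect
        return None


# A class is passed here; build_opener instantiates it.
opener = urlreq.build_opener(NoRedirectHandler)
handler_classes = [h.__class__ for h in opener.handlers]

replaced = urlreq.HTTPRedirectHandler not in handler_classes
kept_default_http = urlreq.HTTPHandler in handler_classes
```

Passing the opener to install_opener would make urlopen use it; calling opener.open directly keeps the change local.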
objects of interest:
OpenerDirector – Sets up the User Agent as the Python-urllib client and manages the Handler classes, while dealing with requests and responses.
Request – An object that encapsulates the state of a request. The state can be as simple as the URL. It can also include extra HTTP headers, e.g. a User-Agent.
BaseHandler –
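A Request can be built and inspected without sending anything, which makes its state easy to see. The payload and User-Agent string below are made up for illustration, and the urllib.request fallback is an assumption for Python 3.

```python
try:
    from urllib2 import Request                # Python 2
except ImportError:
    from urllib.request import Request         # Python 3 (assumed fallback)

# The simplest state: just a URL.
plain = Request("http://www.python.org/")

# Hypothetical form payload and User-Agent string, for illustration only.
posting = Request("http://www.python.org/",
                  data=b"q=urllib2",
                  headers={"User-Agent": "example-client/0.1"})

plain_method = plain.get_method()              # no data attached
posting_method = posting.get_method()          # data attached
# Header names are stored capitalized, hence "User-agent" in the lookup
agent = posting.get_header("User-agent")
```

Either object can be passed to urlopen in place of a URL string.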
exceptions:
URLError – A subclass of IOError; individual protocols have their own specific subclasses.
HTTPError – Also a valid HTTP response, so you can treat an HTTP error as an exceptional event or valid response.
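The two exception types can be handled as in the sketch below. describe_failure is a hypothetical helper, the file path is assumed not to exist, the hand-built HTTPError uses a made-up URL and body, and the urllib.request and urllib.error fallback is an assumption for Python 3.

```python
import io

try:
    import urllib2 as urlreq                   # Python 2
    from urllib2 import URLError, HTTPError
except ImportError:
    import urllib.request as urlreq            # Python 3 (assumed fallback)
    from urllib.error import URLError, HTTPError


def describe_failure(url):
    """Return a short label for how opening url failed, or 'ok'."""
    try:
        urlreq.urlopen(url).close()
    except HTTPError as err:       # listed first: HTTPError subclasses URLError
        return "http error %d" % err.code
    except URLError:
        return "url error"
    return "ok"


# Hypothetical path that is assumed not to exist on this machine.
missing = describe_failure("file:///no/such/dir/urllib2-example.txt")

# An HTTPError built by hand still reads like a response object.
fake = HTTPError("http://example.invalid/gone", 410, "Gone", {},
                 io.BytesIO(b"gone"))
fake_body = fake.read()
```

Catching HTTPError before URLError matters because the broader clause would otherwise swallow the HTTP case.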
internals: BaseHandler and parent _call_chain conventions
Example usage:
import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem', passwd='geheim$parole')
proxy_support = urllib2.ProxyHandler({"http": "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)
# install it
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.python.org/')
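The credential lookup behind the authentication handler in the example above can be tried offline. This sketch uses HTTPPasswordMgrWithDefaultRealm with a catch-all realm, which is a variation on the example rather than what it does, and it reuses the example's host names and credentials. The urllib.request fallback is an assumption for Python 3.

```python
try:
    import urllib2 as urlreq                   # Python 2
except ImportError:
    import urllib.request as urlreq            # Python 3 (assumed fallback)

# Realm None acts as a catch-all in HTTPPasswordMgrWithDefaultRealm.
passwords = urlreq.HTTPPasswordMgrWithDefaultRealm()
passwords.add_password(None, "https://mahler:8092/", "klem", "geheim$parole")

# The handler consults this manager when a server challenges a request.
authinfo = urlreq.HTTPBasicAuthHandler(passwords)

# A URL below the registered prefix matches, whatever realm is announced.
match = passwords.find_user_password("PDQ Application",
                                     "https://mahler:8092/site-updates.py")
# A different host has no stored credentials.
no_match = passwords.find_user_password("PDQ Application",
                                        "https://ahad-haam:8092/")
```

Registering a URL prefix instead of a single page lets one entry cover every resource under that prefix.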
Functions
StringIO |
StringIO([s]) – Return a StringIO-like stream for reading or writing |
build_opener (*handlers) |
Create an opener object from a list of handlers. |
getproxies () |
Return a dictionary of scheme -> proxy server URL mappings. |
install_opener (opener) |
|
localhost () |
Return the IP address of the magic hostname ‘localhost’. |
parse_http_list (s) |
Parse lists as described by RFC 2068 Section 2. |
parse_keqv_list (l) |
Parse list of key=value strings where keys are not duplicated. |
proxy_bypass (host[, proxies]) |
Test if proxies should not be used for a particular host. |
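quote (s[, safe]) |
quote(‘abc def’) -> ‘abc%20def’. Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. |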
randombytes (n) |
Return n random bytes. |
request_host (request) |
Return request-host, as defined by RFC 2965. |
splitattr (url) |
splitattr(‘/path;attr1=value1;attr2=value2;...’) –> ‘/path’, [‘attr1=value1’, ‘attr2=value2’, ...]. |
splithost (url) |
splithost(‘//host[:port]/path’) –> ‘host[:port]’, ‘/path’. |
splitpasswd (user) |
splitpasswd(‘user:passwd’) -> ‘user’, ‘passwd’. |
splitport (host) |
splitport(‘host:port’) –> ‘host’, ‘port’. |
splittag (url) |
splittag(‘/path#tag’) –> ‘/path’, ‘tag’. |
splittype (url) |
splittype(‘type:opaquestring’) –> ‘type’, ‘opaquestring’. |
splituser (host) |
splituser(‘user[:passwd]@host[:port]’) –> ‘user[:passwd]’, ‘host[:port]’. |
splitvalue (attr) |
splitvalue(‘attr=value’) –> ‘attr’, ‘value’. |
toBytes (url) |
toBytes(u”URL”) –> ‘URL’. |
unquote (s) |
unquote(‘abc%20def’) -> ‘abc def’. |
unwrap (url) |
unwrap(‘<URL:type://host/path>’) –> ‘type://host/path’. |
url2pathname (pathname) |
OS-specific conversion from a relative URL of the ‘file’ scheme to a file system path; not recommended for general use. |
urlopen (url[, data, timeout, cafile, ...]) |
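A few of the helper functions listed above can be exercised on plain strings. The challenge text below is made up and only shaped like the parameters of a Digest authentication header; the fallback imports are an assumption for Python 3, where these helpers are split between urllib.request and urllib.parse.

```python
try:
    # Python 2: all four names are available from urllib2
    from urllib2 import parse_http_list, parse_keqv_list, quote, unquote
except ImportError:
    # Python 3 (assumed fallback): the helpers live in two modules
    from urllib.request import parse_http_list, parse_keqv_list
    from urllib.parse import quote, unquote

# Made-up challenge parameters; note the comma inside the quoted nonce.
challenge = 'realm="PDQ Application", nonce="abc,def", qop=auth'

# Split on commas that are outside double quotes.
items = parse_http_list(challenge)

# Turn the key=value strings into a dict, stripping surrounding quotes.
params = parse_keqv_list(items)

# Percent-encode and decode a string containing a space.
escaped = quote("abc def")
restored = unquote(escaped)
```

parse_http_list keeps the double quotes on each item; it is parse_keqv_list that removes them from the values.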
Classes
AbstractBasicAuthHandler ([password_mgr]) |
|
AbstractDigestAuthHandler ([passwd]) |
|
AbstractHTTPHandler ([debuglevel]) |
|
BaseHandler |
|
CacheFTPHandler () |
|
FTPHandler |
|
FileHandler |
|
HTTPBasicAuthHandler ([password_mgr]) |
|
HTTPCookieProcessor ([cookiejar]) |
|
HTTPDefaultErrorHandler |
|
HTTPDigestAuthHandler ([passwd]) |
An authentication protocol defined by RFC 2069. |
HTTPErrorProcessor |
Process HTTP error responses. |
HTTPHandler ([debuglevel]) |
|
HTTPPasswordMgr () |
|
HTTPPasswordMgrWithDefaultRealm () |
|
HTTPRedirectHandler |
|
HTTPSHandler ([debuglevel, context]) |
|
OpenerDirector () |
|
ProxyBasicAuthHandler ([password_mgr]) |
|
ProxyDigestAuthHandler ([passwd]) |
|
ProxyHandler ([proxies]) |
|
Request (url[, data, headers, ...]) |
|
UnknownHandler |
|
addinfourl (fp, headers, url[, code]) |
class to add info() and geturl() methods to an open file. |
ftpwrapper (user, passwd, host, port, dirs[, ...]) |
Class used by open_ftp() for cache of open FTP connections. |