bs4

Beautiful Soup: Elixir and Tonic, “The Screen-Scraper’s Friend”. http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup uses a pluggable XML or HTML parser to parse a (possibly invalid) document into a tree representation. It provides methods and Pythonic idioms that make it easy to navigate, search, and modify the parse tree.
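The navigate, search, and modify steps can be sketched as follows. This is a minimal illustration, not a canonical usage pattern: the sample markup and variable names are made up for the example, and it assumes the standard-library parser, selected with the "html.parser" feature string.

```python
from bs4 import BeautifulSoup

# Illustrative, deliberately invalid markup: <p> and <b> are never closed.
html = "<html><body><p class='intro'>Hello, <b>world</body></html>"

# Parse with the parser that ships with Python (assumed here for portability).
soup = BeautifulSoup(html, "html.parser")

# Navigate: attribute access walks down the tree by tag name.
first_p = soup.body.p
original_class = first_p["class"]  # "class" is multi-valued, so this is a list

# Search: find() returns the first matching Tag, or None if nothing matches.
bold = soup.find("b")
original_text = bold.get_text()

# Modify: replace the tag's string and overwrite an attribute in place.
bold.string = "reader"
first_p["class"] = "greeting"

print(first_p.get_text())
```

Even though the input never closes its inline tags, the tree still contains a `<b>` nested inside the `<p>`, which is what makes the later lookups work.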

Beautiful Soup works with Python 2.6 and up. It works better if lxml and/or html5lib is installed.
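One way to take advantage of lxml or html5lib when they are present, while still working without them, is to try each parser name in turn. The helper below is a hypothetical sketch (make_soup and its preference order are not part of bs4); it relies on BeautifulSoup raising FeatureNotFound when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: build a soup with the first available parser."""
    for name in preferred:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not installed; fall through to the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")

soup = make_soup("<p>hi</p>")
print(soup.p.get_text())
```

Note that the parsers can build different trees from the same invalid markup, so pinning one parser name explicitly is the safer choice when results must be reproducible.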

For more than you ever wanted to know about Beautiful Soup, see the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Classes

BeautifulSoup([markup, features, builder, ...]) This class defines the basic interface called by the tree builders.
BeautifulStoneSoup(*args, **kwargs) Deprecated interface to an XML parser.
CData
Comment
Declaration
Doctype
NavigableString
PageElement Contains the navigational information for some part of the page.
ProcessingInstruction
ResultSet(source[, result]) A ResultSet is just a list that keeps track of the SoupStrainer that created it.
SoupStrainer([name, attrs, text]) Encapsulates a number of ways of matching a markup element (tag or text).
Tag([parser, builder, name, namespace, ...]) Represents a found HTML tag with its attributes and contents.
UnicodeDammit(markup[, override_encodings, ...]) A class for detecting the encoding of a *ML document and converting it to a Unicode string.
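A few of the classes listed above can be seen together in a short sketch. The markup and byte string are invented for the example, and the parser choice ("html.parser") is an assumption; the second positional argument to UnicodeDammit is the list of encodings to try first.

```python
from bs4 import BeautifulSoup, Comment, ResultSet, SoupStrainer, UnicodeDammit

# Illustrative markup containing a comment, two links, and a paragraph.
html = "<div><!-- note --><a href='/a'>A</a><p>text</p><a href='/b'>B</a></div>"

# SoupStrainer: restrict parsing to <a> tags only.
only_links = SoupStrainer("a")
link_soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

# find_all() returns a ResultSet, which behaves like a list.
links = link_soup.find_all("a")
hrefs = [a["href"] for a in links]

# Comment is a NavigableString subclass; pick comments out of the full tree.
full_soup = BeautifulSoup(html, "html.parser")
comments = [node for node in full_soup.descendants if isinstance(node, Comment)]

# UnicodeDammit: decode bytes, trying the suggested encoding first.
dammit = UnicodeDammit(b"caf\xe9", ["latin-1"])
decoded = dammit.unicode_markup

print(hrefs, [str(c) for c in comments], decoded)
```

After conversion, the encoding that UnicodeDammit settled on is available as dammit.original_encoding.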