HTMLParser.HTMLParser

class HTMLParser.HTMLParser

Find tags and other markup and call handler functions.

Usage:
    p = HTMLParser()
    p.feed(data)
    ...
    p.close()
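Here is a minimal sketch of that usage pattern. It uses a hypothetical subclass, `TextCollector`, which overrides only `handle_data()`. The import tries the Python 3 location (`html.parser`) first and falls back to the Python 2 module documented here. The input is fed in several pieces, and text may reach `handle_data()` in more than one call, so the sketch joins the pieces.

```python
# Works on Python 3 (module "html.parser") and Python 2 (module "HTMLParser").
try:
    from html.parser import HTMLParser  # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 location, as documented here


class TextCollector(HTMLParser):
    """Hypothetical subclass that gathers all text content."""

    def __init__(self):
        # Explicit base call instead of super(): Python 2's HTMLParser is an old-style class.
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # One run of text may be delivered across several calls.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


p = TextCollector()
for piece in ["<p>Hello, ", "wor", "ld</p>"]:
    p.feed(piece)  # data may arrive in arbitrary pieces
p.close()  # process anything still buffered
print(p.text())  # Hello, world
```

The call to `close()` matters. Without it, text left in the internal buffer at the end of the input may never reach the handlers.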

Start tags are handled by calling self.handle_starttag(), or self.handle_startendtag() for XHTML-style empty tags such as <br/>. End tags are handled by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as its argument (the data may be split into arbitrary chunks). Entity references are passed by calling self.handle_entityref() with the entity name as the argument (for example, "amp" for &amp;). Numeric character references are passed to self.handle_charref() with the string containing the reference's number as the argument (for example, "62" for &#62; or "x3E" for &#x3E;).
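To show which handler fires for which piece of markup, the following sketch logs every callback named in the paragraph above. `EventLogger` and `BASE_KWARGS` are hypothetical names. Python 3.5 and later (`html.parser`) convert entity and character references inside text by default, so on Python 3 the sketch passes `convert_charrefs=False` to get the callbacks described here. Python 2's constructor takes no arguments.

```python
try:
    from html.parser import HTMLParser
    # Python 3: stop references in text from being converted, so that
    # handle_entityref()/handle_charref() are called as on Python 2.
    BASE_KWARGS = {"convert_charrefs": False}
except ImportError:
    from HTMLParser import HTMLParser
    BASE_KWARGS = {}  # Python 2's HTMLParser.__init__ takes no arguments


class EventLogger(HTMLParser):
    """Hypothetical subclass recording each handler call in order."""

    def __init__(self):
        HTMLParser.__init__(self, **BASE_KWARGS)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Overriding this stops the default fallback to handle_starttag + handle_endtag.
        self.events.append(("startend", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


p = EventLogger()
p.feed('<p class="x">a &amp; b &#62; c<br/></p>')
p.close()
for event in p.events:
    print(event)
```

With this input, `&amp;` reaches `handle_entityref()` as `"amp"` and `&#62;` reaches `handle_charref()` as `"62"`. Because the sketch overrides `handle_startendtag()`, `<br/>` is reported there and not to `handle_starttag()`.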

Methods

__init__(): Initialize and reset this instance.
_parse_doctype_attlist(i, declstartpos)
_parse_doctype_element(i, declstartpos)
_parse_doctype_entity(i, declstartpos)
_parse_doctype_notation(i, declstartpos)
_parse_doctype_subset(i, declstartpos)
_scan_name(i, declstartpos)
check_for_whole_start_tag(i)
clear_cdata_mode()
close(): Handle any buffered data.
error(message)
feed(data): Feed data to the parser.
get_starttag_text(): Return full source of start tag: ‘<...>’.
getpos(): Return current line number and offset.
goahead(end)
handle_charref(name)
handle_comment(data)
handle_data(data)
handle_decl(decl)
handle_endtag(tag)
handle_entityref(name)
handle_pi(data)
handle_startendtag(tag, attrs)
handle_starttag(tag, attrs)
parse_bogus_comment(i[, report])
parse_comment(i[, report])
parse_declaration(i)
parse_endtag(i)
parse_html_declaration(i)
parse_marked_section(i[, report])
parse_pi(i)
parse_starttag(i)
reset(): Reset this instance.
set_cdata_mode(elem)
unescape(s)
unknown_decl(data)
updatepos(i, j)
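Several of the methods above are most useful when called from inside a handler. The following sketch uses a hypothetical subclass, `MarkupLocator`, which overrides `handle_decl()`, `handle_comment()` and `handle_starttag()` and records `getpos()` at each call. For start tags, it also records `get_starttag_text()`. Inside a handler, `getpos()` reports where the markup being handled begins, with lines counted from 1 and offsets from 0.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class MarkupLocator(HTMLParser):
    """Hypothetical subclass noting where declarations, comments and start tags occur."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.found = []

    def handle_decl(self, decl):
        # getpos() returns (line, offset) of the declaration currently being handled.
        self.found.append(("decl", decl, self.getpos()))

    def handle_comment(self, data):
        self.found.append(("comment", data, self.getpos()))

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written in the source.
        self.found.append(("start", self.get_starttag_text(), self.getpos()))


p = MarkupLocator()
p.feed('<!DOCTYPE html>\n<!-- note -->\n<a href="x.html">link</a>')
p.close()
for item in p.found:
    print(item)
```

Here the doctype is reported as `'DOCTYPE html'` at line 1, the comment as `' note '` at line 2, and the link tag as `'<a href="x.html">'` at line 3, each at offset 0.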

Attributes

CDATA_CONTENT_ELEMENTS
entitydefs
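The attributes are listed without descriptions. Judging from CPython's behaviour, `CDATA_CONTENT_ELEMENTS` names the elements whose content is passed to `handle_data()` as raw text and is not parsed as markup. It always includes `script` and `style`; the exact tuple may differ between Python versions. The following sketch illustrates this with a hypothetical subclass, `ScriptReader`.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class ScriptReader(HTMLParser):
    """Hypothetical subclass: records start tags and collects all text."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


print(HTMLParser.CDATA_CONTENT_ELEMENTS)  # includes 'script' and 'style'

body = "if (a < b && c) { x = '<p>'; }"
p = ScriptReader()
p.feed("<script>" + body + "</script>")
p.close()

print(p.tags)  # ['script']: the '<p>' inside the script is not reported as a tag
print("".join(p.text) == body)  # True: the script body reaches handle_data() verbatim
```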