bs4.BeautifulSoup

class bs4.BeautifulSoup(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)

This class defines the basic interface called by the tree builders.

These methods will be called by the parser:
    reset()
    feed(markup)

The tree builder may call these methods from its feed() implementation:
    handle_starttag(name, attrs)  # See note about return value
    handle_endtag(name)
    handle_data(data)  # Appends to the current data node
    endData(containerClass=NavigableString)  # Ends the current data node

No matter how complicated the underlying parser is, you should be able to build a tree using ‘start tag’ events, ‘end tag’ events, ‘data’ events, and ‘done with data’ events.

If you encounter an empty-element tag (aka a self-closing tag, like HTML’s <br> tag), call handle_starttag and then handle_endtag.
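
The event model described above can be illustrated without Beautiful Soup at all. The following is a minimal sketch that uses the standard library's html.parser.HTMLParser as the event source; the Node and EventTreeBuilder classes are hypothetical names invented for this illustration and are not part of bs4, and the stack handling is an assumption about how such a builder might work, not a copy of the library's implementation.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical minimal tree node (not a bs4 class)."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects and plain strings


class EventTreeBuilder(HTMLParser):
    """Builds a Node tree from start-tag, end-tag, and data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.stack = [self.root]  # the innermost open tag is last

    def handle_starttag(self, name, attrs):
        # Attach the new node to the current open tag, then open it.
        node = Node(name, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, name):
        # Close the nearest matching open tag; never pop the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == name:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text becomes a child of the current open tag.
        self.stack[-1].children.append(data)

    def handle_startendtag(self, name, attrs):
        # Empty-element tag: a start event followed by an end event.
        self.handle_starttag(name, attrs)
        self.handle_endtag(name)


builder = EventTreeBuilder()
builder.feed("<p>Hello<br/>world</p>")
builder.close()

p = builder.root.children[0]
summary = [c if isinstance(c, str) else c.name for c in p.children]
print(summary)
```

The empty-element case is handled exactly as the text suggests: the start event is followed immediately by the end event, so the br node is attached to its parent but never stays open on the stack.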

__init__(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)

The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser.
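
As a brief illustration of the constructor, the sketch below parses the same markup once from a string and once from a file-like object. It assumes the beautifulsoup4 package is installed and passes "html.parser" (Python's built-in parser) as the features argument; the sample markup and variable names are made up for the example.

```python
import io

from bs4 import BeautifulSoup

html = "<html><body><p class='intro'>Hello</p></body></html>"

# Markup supplied as a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# Markup supplied as a file-like object (anything with a read() method).
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

# The soup object itself acts as the root tag of the tree.
root_name = soup_from_string.name
same_text = soup_from_string.p.get_text() == soup_from_file.p.get_text()
print(root_name, same_text)
```

Naming the parser explicitly avoids depending on whichever parser libraries happen to be installed.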

Methods

__init__([markup, features, builder, ...]) The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser.
append(tag) Appends the given tag to the contents of this tag.
childGenerator()
clear([decompose]) Extract all children.
decode([pretty_print, eventual_encoding, ...]) Returns a string or Unicode representation of this document.
decode_contents([indent_level, ...]) Renders the contents of this tag as a Unicode string.
decompose() Recursively destroys the contents of this tree.
encode([encoding, indent_level, formatter, ...])
encode_contents([indent_level, encoding, ...]) Renders the contents of this tag as a bytestring.
endData([containerClass])
extract() Destructively rips this element out of the tree.
fetchNextSiblings([name, attrs, text, limit]) Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
fetchParents([name, attrs, limit]) Returns the parents of this Tag that match the given criteria.
fetchPrevious([name, attrs, text, limit]) Returns all items that match the given criteria and appear before this Tag in the document.
fetchPreviousSiblings([name, attrs, text, limit]) Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
find([name, attrs, recursive, text]) Return only the first child of this Tag matching the given criteria.
findAll([name, attrs, recursive, text, limit]) Returns a list of Tag objects that match the given criteria.
findAllNext([name, attrs, text, limit]) Returns all items that match the given criteria and appear after this Tag in the document.
findAllPrevious([name, attrs, text, limit]) Returns all items that match the given criteria and appear before this Tag in the document.
findChild([name, attrs, recursive, text]) Return only the first child of this Tag matching the given criteria.
findChildren([name, attrs, recursive, text, ...]) Returns a list of Tag objects that match the given criteria.
findNext([name, attrs, text]) Returns the first item that matches the given criteria and appears after this Tag in the document.
findNextSibling([name, attrs, text]) Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.
findNextSiblings([name, attrs, text, limit]) Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
findParent([name, attrs]) Returns the closest parent of this Tag that matches the given criteria.
findParents([name, attrs, limit]) Returns the parents of this Tag that match the given criteria.
findPrevious([name, attrs, text]) Returns the first item that matches the given criteria and appears before this Tag in the document.
findPreviousSibling([name, attrs, text]) Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.
findPreviousSiblings([name, attrs, text, limit]) Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
find_all([name, attrs, recursive, text, limit]) Returns a list of Tag objects that match the given criteria.
find_all_next([name, attrs, text, limit]) Returns all items that match the given criteria and appear after this Tag in the document.
find_all_previous([name, attrs, text, limit]) Returns all items that match the given criteria and appear before this Tag in the document.
find_next([name, attrs, text]) Returns the first item that matches the given criteria and appears after this Tag in the document.
find_next_sibling([name, attrs, text]) Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.
find_next_siblings([name, attrs, text, limit]) Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
find_parent([name, attrs]) Returns the closest parent of this Tag that matches the given criteria.
find_parents([name, attrs, limit]) Returns the parents of this Tag that match the given criteria.
find_previous([name, attrs, text]) Returns the first item that matches the given criteria and appears before this Tag in the document.
find_previous_sibling([name, attrs, text]) Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.
find_previous_siblings([name, attrs, text, ...]) Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
format_string(s[, formatter]) Format the given string using the given formatter.
get(key[, default]) Returns the value of the ‘key’ attribute for the tag, or the value given for ‘default’ if it doesn’t have that attribute.
getText([separator, strip, types]) Get all child strings, concatenated using the given separator.
get_text([separator, strip, types]) Get all child strings, concatenated using the given separator.
handle_data(data)
handle_endtag(name[, nsprefix])
handle_starttag(name, namespace, nsprefix, attrs) Push a start tag on to the stack.
has_attr(key)
has_key(key) Deprecated; use has_attr(key) instead. The name was misleading because has_key() checks attributes, whereas the in operator (__contains__) checks contents.
index(element) Find the index of a child by identity, not value.
insert(position, new_child)
insert_after(successor)
insert_before(successor)
new_string(s[, subclass]) Create a new NavigableString associated with this soup.
new_tag(name[, namespace, nsprefix]) Create a new tag associated with this soup.
nextGenerator()
nextSiblingGenerator()
object_was_parsed(o[, parent, ...]) Add an object to the parse tree.
parentGenerator()
popTag()
prettify([encoding, formatter])
previousGenerator()
previousSiblingGenerator()
pushTag(tag)
recursiveChildGenerator()
renderContents([encoding, prettyPrint, ...])
replaceWith(replace_with)
replaceWithChildren()
replace_with(replace_with)
replace_with_children()
reset()
select(selector[, _candidate_generator, limit]) Perform a CSS selection operation on the current element.
select_one(selector) Perform a CSS selection operation on the current element.
setup([parent, previous_element, ...]) Sets up the initial relations between this element and other elements.
unwrap()
wrap(wrap_inside)
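
A few of the methods listed above are shown together in the following sketch. It assumes beautifulsoup4 is installed with the built-in "html.parser" backend, and that CSS selector support is available (recent releases delegate select() to the separate soupsieve package). The markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

html = """
<ul id="links">
  <li><a href="/a" class="nav">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match; find_all() returns every match.
first_link = soup.find("a")
all_links = soup.find_all("a")

# get() reads an attribute and falls back to a default when it is absent.
hrefs = [a.get("href") for a in all_links]
missing = first_link.get("title", "n/a")

# select() takes a CSS selector.
nav_links = soup.select("a.nav")

# get_text() joins all strings; strip=True drops whitespace-only pieces.
text = soup.get_text(separator=" ", strip=True)

# new_tag() creates a tag tied to this soup; append() adds it to the tree.
new_item = soup.new_tag("li")
new_item.string = "Third"
soup.ul.append(new_item)
item_count = len(soup.find_all("li"))

print(hrefs, missing, text, item_count)
```

The camelCase names in the list (findAll, findNext, and so on) are older spellings of the snake_case methods used here.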

Attributes

ASCII_SPACES
DEFAULT_BUILDER_FEATURES
HTML_FORMATTERS
NO_PARSER_SPECIFIED_WARNING
ROOT_TAG_NAME
XML_FORMATTERS
attribselect_re
children
descendants
isSelfClosing Is this tag an empty-element tag? (aka a self-closing tag)
is_empty_element Is this tag an empty-element tag? (aka a self-closing tag)
next
nextSibling
next_elements
next_siblings
parents
parserClass
previous
previousSibling
previous_elements
previous_siblings
string Convenience property to get the single string within this tag.
strings Yield all strings of certain classes, possibly stripping them.
stripped_strings
tag_name_re
text Get all child strings, concatenated using the given separator.
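
Several of the navigation attributes above can be tried out with a small document. This is a sketch under the assumption that beautifulsoup4 is installed and the built-in "html.parser" backend is used; the markup is made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p> One </p><p>Two<br/></p></div>", "html.parser")
first_p, second_p = soup.find_all("p")

# .string is the single string inside a tag, or None if there is
# more than one child to choose from.
single = first_p.string
ambiguous = soup.div.string

# .stripped_strings yields every string with surrounding whitespace removed.
stripped = list(soup.stripped_strings)

# .children iterates over direct children only.
child_names = [child.name for child in soup.div.children]

# .parents walks upward until it reaches the soup object itself.
parent_names = [parent.name for parent in first_p.parents]

# .is_empty_element reports whether a tag is an empty-element tag.
br_is_empty = soup.br.is_empty_element

print(repr(single), ambiguous, stripped, child_names, parent_names, br_is_empty)
```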