bs4.BeautifulSoup

class bs4.BeautifulSoup(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)

This class defines the basic interface called by the tree builders.

These methods will be called by the parser:
  reset()
  feed(markup)

The tree builder may call these methods from its feed() implementation:
  handle_starttag(name, attrs) # See note about return value
  handle_endtag(name)
  handle_data(data) # Appends to the current data node
  endData(containerClass=NavigableString) # Ends the current data node

No matter how complicated the underlying parser is, you should be able to build a tree using ‘start tag’ events, ‘end tag’ events, ‘data’ events, and ‘done with data’ events.

If you encounter an empty-element tag (aka a self-closing tag, like HTML’s <br> tag), call handle_starttag and then handle_endtag.
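The event protocol described above can be illustrated with a small standalone sketch. It is not Beautiful Soup's implementation: `Node`, `SketchTreeBuilder`, `end_data`, `VOID_TAGS`, and `build_tree` are hypothetical names, and the events come from the standard library's `html.parser.HTMLParser` rather than from a bs4 tree builder. It only shows how 'start tag', 'end tag', 'data', and 'done with data' events are enough to build a tree, and how an empty-element tag becomes a start event followed at once by an end event.

```python
from html.parser import HTMLParser


class Node:
    """A minimal tree node: a tag name, attributes, and ordered children."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes and plain strings, in document order


class SketchTreeBuilder(HTMLParser):
    """Hypothetical builder that turns parser events into a Node tree."""

    # Assumption: a small illustrative subset of HTML's empty-element tags.
    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.stack = [self.root]  # open tags; the last entry is the current tag
        self.pending = []         # text chunks of the current data node

    def end_data(self):
        """'Done with data' event: flush buffered text into the current tag."""
        if self.pending:
            self.stack[-1].children.append("".join(self.pending))
            self.pending = []

    def handle_starttag(self, name, attrs):
        self.end_data()
        node = Node(name, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)
        if name in self.VOID_TAGS:
            # Empty-element tag: a start event followed at once by an end event.
            self.handle_endtag(name)

    def handle_endtag(self, name):
        self.end_data()
        # Close the most recently opened tag with this name; ignore strays.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == name:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Appends to the current data node.
        self.pending.append(data)


def build_tree(markup):
    """Feed markup through the sketch builder and return the root Node."""
    builder = SketchTreeBuilder()
    builder.feed(markup)
    builder.close()     # lets HTMLParser flush any text it is still buffering
    builder.end_data()
    return builder.root


tree = build_tree("<p class='x'>Hello<br>world</p>")
paragraph = tree.children[0]
print(paragraph.name, paragraph.attrs)
print([c if isinstance(c, str) else c.name for c in paragraph.children])
```

Keeping a stack of open tags and a separate text buffer is one simple way to honor the four event types; a real tree builder also has to deal with namespaces, nesting rules, and string container classes, which this sketch leaves out.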

__init__(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs): The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser.
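As a brief illustration of the two accepted markup forms, the sketch below builds one soup from a string and one from a file-like object. It assumes the `bs4` package is installed; the sample markup and variable names are made up for the example. Passing `"html.parser"` as `features` selects Python's built-in parser explicitly, so no parser has to be guessed.

```python
import io

from bs4 import BeautifulSoup

html = "<html><body><p>Hello</p></body></html>"

# Markup given as a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# Markup given as a file-like object (here an in-memory text stream).
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

# The soup object itself acts as the root tag of the parsed document.
print(soup_from_string.name)
print(soup_from_file.p.string)
```

Both objects should describe the same tree, since only the way the markup is delivered differs.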

Methods

`__init__`([markup, features, builder, ...])	The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser.
`append`(tag)	Appends the given tag to the contents of this tag.
`childGenerator`()
`clear`([decompose])	Extract all children.
`decode`([pretty_print, eventual_encoding, ...])	Returns a string or Unicode representation of this document.
`decode_contents`([indent_level, ...])	Renders the contents of this tag as a Unicode string.
`decompose`()	Recursively destroys the contents of this tree.
`encode`([encoding, indent_level, formatter, ...])
`encode_contents`([indent_level, encoding, ...])	Renders the contents of this tag as a bytestring.
`endData`([containerClass])
`extract`()	Destructively rips this element out of the tree.
`fetchNextSiblings`([name, attrs, text, limit])	Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
`fetchParents`([name, attrs, limit])	Returns the parents of this Tag that match the given criteria.
`fetchPrevious`([name, attrs, text, limit])	Returns all items that match the given criteria and appear before this Tag in the document.
`fetchPreviousSiblings`([name, attrs, text, limit])	Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
`find`([name, attrs, recursive, text])	Return only the first child of this Tag matching the given criteria.
`findAll`([name, attrs, recursive, text, limit])	Extracts a list of Tag objects that match the given criteria.
`findAllNext`([name, attrs, text, limit])	Returns all items that match the given criteria and appear after this Tag in the document.
`findAllPrevious`([name, attrs, text, limit])	Returns all items that match the given criteria and appear before this Tag in the document.
`findChild`([name, attrs, recursive, text])	Return only the first child of this Tag matching the given criteria.
`findChildren`([name, attrs, recursive, text, ...])	Extracts a list of Tag objects that match the given criteria.
`findNext`([name, attrs, text])	Returns the first item that matches the given criteria and appears after this Tag in the document.
`findNextSibling`([name, attrs, text])	Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.
`findNextSiblings`([name, attrs, text, limit])	Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
`findParent`([name, attrs])	Returns the closest parent of this Tag that matches the given criteria.
`findParents`([name, attrs, limit])	Returns the parents of this Tag that match the given criteria.
`findPrevious`([name, attrs, text])	Returns the first item that matches the given criteria and appears before this Tag in the document.
`findPreviousSibling`([name, attrs, text])	Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.
`findPreviousSiblings`([name, attrs, text, limit])	Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
`find_all`([name, attrs, recursive, text, limit])	Extracts a list of Tag objects that match the given criteria.
`find_all_next`([name, attrs, text, limit])	Returns all items that match the given criteria and appear after this Tag in the document.
`find_all_previous`([name, attrs, text, limit])	Returns all items that match the given criteria and appear before this Tag in the document.
`find_next`([name, attrs, text])	Returns the first item that matches the given criteria and appears after this Tag in the document.
`find_next_sibling`([name, attrs, text])	Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.
`find_next_siblings`([name, attrs, text, limit])	Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document.
`find_parent`([name, attrs])	Returns the closest parent of this Tag that matches the given criteria.
`find_parents`([name, attrs, limit])	Returns the parents of this Tag that match the given criteria.
`find_previous`([name, attrs, text])	Returns the first item that matches the given criteria and appears before this Tag in the document.
`find_previous_sibling`([name, attrs, text])	Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.
`find_previous_siblings`([name, attrs, text, ...])	Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document.
`format_string`(s[, formatter])	Format the given string using the given formatter.
`get`(key[, default])	Returns the value of the ‘key’ attribute for the tag, or the value given for ‘default’ if it doesn’t have that attribute.
`getText`([separator, strip, types])	Get all child strings, concatenated using the given separator.
`get_text`([separator, strip, types])	Get all child strings, concatenated using the given separator.
`handle_data`(data)
`handle_endtag`(name[, nsprefix])
`handle_starttag`(name, namespace, nsprefix, attrs)	Push a start tag on to the stack.
`has_attr`(key)
`has_key`(key)	Deprecated; use has_attr(key) instead. The name was misleading because has_key() checked attributes, while the in operator (__contains__) checks contents.
`index`(element)	Find the index of a child by identity, not value.
`insert`(position, new_child)
`insert_after`(successor)
`insert_before`(successor)
`new_string`(s[, subclass])	Create a new NavigableString associated with this soup.
`new_tag`(name[, namespace, nsprefix])	Create a new tag associated with this soup.
`nextGenerator`()
`nextSiblingGenerator`()
`object_was_parsed`(o[, parent, ...])	Add an object to the parse tree.
`parentGenerator`()
`popTag`()
`prettify`([encoding, formatter])
`previousGenerator`()
`previousSiblingGenerator`()
`pushTag`(tag)
`recursiveChildGenerator`()
`renderContents`([encoding, prettyPrint, ...])
`replaceWith`(replace_with)
`replaceWithChildren`()
`replace_with`(replace_with)
`replace_with_children`()
`reset`()
`select`(selector[, _candidate_generator, limit])	Perform a CSS selection operation on the current element.
`select_one`(selector)	Perform a CSS selection operation on the current element.
`setup`([parent, previous_element, ...])	Sets up the initial relations between this element and other elements.
`unwrap`()
`wrap`(wrap_inside)

Attributes

`ASCII_SPACES`
`DEFAULT_BUILDER_FEATURES`
`HTML_FORMATTERS`
`NO_PARSER_SPECIFIED_WARNING`
`ROOT_TAG_NAME`
`XML_FORMATTERS`
`attribselect_re`
`children`
`descendants`
`isSelfClosing`	Is this tag an empty-element tag? (aka a self-closing tag)
`is_empty_element`	Is this tag an empty-element tag? (aka a self-closing tag)
`next`
`nextSibling`
`next_elements`
`next_siblings`
`parents`
`parserClass`
`previous`
`previousSibling`
`previous_elements`
`previous_siblings`
`string`	Convenience property to get the single string within this tag.
`strings`	Yield all strings of certain classes, possibly stripping them.
`stripped_strings`
`tag_name_re`
`text`	Get all child strings, concatenated using the given separator.