bs4.BeautifulSoup
class bs4.BeautifulSoup(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)
This class defines the basic interface called by the tree builders.
These methods will be called by the parser:
- reset()
- feed(markup)
The tree builder may call these methods from its feed() implementation:
- handle_starttag(name, attrs) # See note about return value
- handle_endtag(name)
- handle_data(data) # Appends to the current data node
- endData(containerClass=NavigableString) # Ends the current data node
No matter how complicated the underlying parser is, you should be able to build a tree using ‘start tag’ events, ‘end tag’ events, ‘data’ events, and “done with data” events.
If you encounter an empty-element tag (aka a self-closing tag, like HTML’s <br> tag), call handle_starttag and then handle_endtag.
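The event-driven contract described above can be illustrated with a small standard-library sketch. This is a toy, not bs4's implementation: `ToySoup`, `ToyBuilder`, `Node`, `render`, and the `VOID_TAGS` set are hypothetical names introduced only for illustration, and `endData` here takes no `containerClass` argument. It shows a parser turning markup into start-tag, end-tag, and data events, with an empty-element tag handled as a start tag immediately followed by an end tag.

```python
from html.parser import HTMLParser

# Assumption: a tiny, incomplete set of empty-element tags for the demo.
VOID_TAGS = {"br", "hr", "img"}


class Node:
    """A minimal tree node: a tag name, attributes, and child nodes or strings."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = []


class ToySoup:
    """Hypothetical stand-in for the interface listed above (not bs4 code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.root = Node("[document]")
        self.stack = [self.root]   # currently open tags, root first
        self.current_data = []     # pieces of the data node being built

    def handle_starttag(self, name, attrs):
        self.endData()
        node = Node(name, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)
        return node

    def handle_endtag(self, name):
        self.endData()
        # Close the most recently opened tag with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == name:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.current_data.append(data)

    def endData(self):
        if self.current_data:
            self.stack[-1].children.append("".join(self.current_data))
            self.current_data = []


class ToyBuilder(HTMLParser):
    """Translates html.parser callbacks into the four events."""

    def __init__(self, soup):
        super().__init__()
        self.soup = soup

    def handle_starttag(self, tag, attrs):
        self.soup.handle_starttag(tag, attrs)
        if tag in VOID_TAGS:
            # Empty-element tag: start tag, then end tag right away.
            self.soup.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.soup.handle_endtag(tag)

    def handle_data(self, data):
        self.soup.handle_data(data)


def render(node):
    """Serialize the toy tree back to markup so the result can be inspected."""
    if isinstance(node, str):
        return node
    inner = "".join(render(child) for child in node.children)
    if node.name == "[document]":
        return inner
    if node.name in VOID_TAGS:
        return f"<{node.name}/>"
    return f"<{node.name}>{inner}</{node.name}>"


soup = ToySoup()
builder = ToyBuilder(soup)
builder.feed("<p>one<br>two</p>")
builder.close()
soup.endData()  # the "done with data" event
result = render(soup.root)
print(result)
```

The stack of open tags is what lets four simple events rebuild nesting: a start tag pushes, an end tag pops, and pending data is flushed into the innermost open tag whenever a tag boundary is reached.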
Methods
__init__([markup, features, builder, ...]) |
The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser. |
append(tag) |
Appends the given tag to the contents of this tag. |
childGenerator() |
|
clear([decompose]) |
Extract all children. |
decode([pretty_print, eventual_encoding, ...]) |
Returns a string or Unicode representation of this document. |
decode_contents([indent_level, ...]) |
Renders the contents of this tag as a Unicode string. |
decompose() |
Recursively destroys the contents of this tree. |
encode([encoding, indent_level, formatter, ...]) |
|
encode_contents([indent_level, encoding, ...]) |
Renders the contents of this tag as a bytestring. |
endData([containerClass]) |
|
extract() |
Destructively rips this element out of the tree. |
fetchNextSiblings([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
fetchParents([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
fetchPrevious([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
fetchPreviousSiblings([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
find([name, attrs, recursive, text]) |
Return only the first child of this Tag matching the given criteria. |
findAll([name, attrs, recursive, text, limit]) |
Returns a list of Tag objects that match the given criteria (the matches are not removed from the tree). |
findAllNext([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear after this Tag in the document. |
findAllPrevious([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
findChild([name, attrs, recursive, text]) |
Return only the first child of this Tag matching the given criteria. |
findChildren([name, attrs, recursive, text, ...]) |
Returns a list of Tag objects that match the given criteria (the matches are not removed from the tree). |
findNext([name, attrs, text]) |
Returns the first item that matches the given criteria and appears after this Tag in the document. |
findNextSibling([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document. |
findNextSiblings([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
findParent([name, attrs]) |
Returns the closest parent of this Tag that matches the given criteria. |
findParents([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
findPrevious([name, attrs, text]) |
Returns the first item that matches the given criteria and appears before this Tag in the document. |
findPreviousSibling([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document. |
findPreviousSiblings([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
find_all([name, attrs, recursive, text, limit]) |
Returns a list of Tag objects that match the given criteria (the matches are not removed from the tree). |
find_all_next([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear after this Tag in the document. |
find_all_previous([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
find_next([name, attrs, text]) |
Returns the first item that matches the given criteria and appears after this Tag in the document. |
find_next_sibling([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document. |
find_next_siblings([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
find_parent([name, attrs]) |
Returns the closest parent of this Tag that matches the given criteria. |
find_parents([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
find_previous([name, attrs, text]) |
Returns the first item that matches the given criteria and appears before this Tag in the document. |
find_previous_sibling([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document. |
find_previous_siblings([name, attrs, text, ...]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
format_string(s[, formatter]) |
Format the given string using the given formatter. |
get(key[, default]) |
Returns the value of the ‘key’ attribute for the tag, or the value given for ‘default’ if it doesn’t have that attribute. |
getText([separator, strip, types]) |
Get all child strings, concatenated using the given separator. |
get_text([separator, strip, types]) |
Get all child strings, concatenated using the given separator. |
handle_data(data) |
|
handle_endtag(name[, nsprefix]) |
|
handle_starttag(name, namespace, nsprefix, attrs) |
Push a start tag on to the stack. |
has_attr(key) |
|
has_key(key) |
Deprecated; use has_attr(key) instead. The name was misleading because has_key() tested attributes, while the in operator (__contains__) tests contents. |
index(element) |
Find the index of a child by identity, not value. |
insert(position, new_child) |
|
insert_after(successor) |
|
insert_before(successor) |
|
new_string(s[, subclass]) |
Create a new NavigableString associated with this soup. |
new_tag(name[, namespace, nsprefix]) |
Create a new tag associated with this soup. |
nextGenerator() |
|
nextSiblingGenerator() |
|
object_was_parsed(o[, parent, ...]) |
Add an object to the parse tree. |
parentGenerator() |
|
popTag() |
|
prettify([encoding, formatter]) |
|
previousGenerator() |
|
previousSiblingGenerator() |
|
pushTag(tag) |
|
recursiveChildGenerator() |
|
renderContents([encoding, prettyPrint, ...]) |
|
replaceWith(replace_with) |
|
replaceWithChildren() |
|
replace_with(replace_with) |
|
replace_with_children() |
|
reset() |
|
select(selector[, _candidate_generator, limit]) |
Perform a CSS selection operation on the current element. |
select_one(selector) |
Perform a CSS selection operation on the current element and return only the first match. |
setup([parent, previous_element, ...]) |
Sets up the initial relations between this element and other elements. |
unwrap() |
|
wrap(wrap_inside) |
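As a rough illustration of how several of the methods listed above fit together, the following sketch parses a small document, searches it, and then modifies the tree. It assumes the `bs4` package is installed and uses the standard-library `"html.parser"` backend; the sample markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

html = '<div id="main"><p class="intro">Hello <b>world</b></p><p>Second</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match.
first_p = soup.find("p")
paragraph_count = len(soup.find_all("p"))

# get() reads an attribute; "class" is multi-valued, so it comes back as a list.
intro_class = first_p.get("class")
missing = first_p.get("title", "n/a")

# Navigating relative to an element.
container_id = soup.b.find_parent("div").get("id")
second_p = first_p.find_next_sibling("p")
second_text = second_p.get_text()

# CSS selection: select() returns a list, select_one() a single element.
selected = [p.get_text() for p in soup.select("div#main > p")]
bold_text = soup.select_one("p.intro b").get_text()

# Modifying the tree.
note = soup.new_tag("span")   # new tag associated with this soup
note.string = "!"
first_p.append(note)          # add it as the last child of the first <p>
soup.b.unwrap()               # drop the <b> tag but keep its contents
second_p.extract()            # remove the second <p> from the tree
remaining_text = soup.get_text()

print(selected, bold_text, remaining_text)
```

The search results are converted to plain strings before the tree is modified, since Tag objects returned by the search methods are live references into the same tree.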
Attributes
ASCII_SPACES |
|
DEFAULT_BUILDER_FEATURES |
|
HTML_FORMATTERS |
|
NO_PARSER_SPECIFIED_WARNING |
|
ROOT_TAG_NAME |
|
XML_FORMATTERS |
|
attribselect_re |
|
children |
|
descendants |
|
isSelfClosing |
Is this tag an empty-element tag? (aka a self-closing tag) |
is_empty_element |
Is this tag an empty-element tag? (aka a self-closing tag) |
next |
|
nextSibling |
|
next_elements |
|
next_siblings |
|
parents |
|
parserClass |
|
previous |
|
previousSibling |
|
previous_elements |
|
previous_siblings |
|
string |
Convenience property to get the single string within this tag. |
strings |
Yield all strings of certain classes, possibly stripping them. |
stripped_strings |
|
tag_name_re |
|
text |
Get all child strings, concatenated using the given separator. |
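A few of the attributes above are easiest to understand side by side. The sketch below is an illustration only, assuming `bs4` is installed and the `"html.parser"` backend is used; the markup is invented for the example.

```python
from bs4 import BeautifulSoup

markup = "<ul><li> one </li><li><b>two</b></li><li>3<br></li></ul>"
soup = BeautifulSoup(markup, "html.parser")
items = soup.find_all("li")

# .string is only defined when a tag contains a single string
# (possibly nested inside a single child tag); otherwise it is None.
single = items[0].string
nested = items[1].string
ambiguous = items[2].string      # this <li> holds a string and a <br>

# .strings yields every string as-is; .stripped_strings trims whitespace.
raw = list(soup.strings)
stripped = list(soup.stripped_strings)

# .text concatenates all child strings with no separator.
full_text = soup.text

# Iterating over relatives of an element.
child_names = [child.name for child in soup.ul.children]
parent_names = [parent.name for parent in soup.b.parents]

# A <br> with no contents is an empty-element tag.
br_is_empty = soup.br.is_empty_element

print(stripped, parent_names)
```

The last entry of `parent_names` is the soup object itself, whose name is the value of `ROOT_TAG_NAME`.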