bs4.BeautifulSoup
class bs4.BeautifulSoup(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs)
This class defines the basic interface called by the tree builders.
These methods will be called by the parser:
- reset()
- feed(markup)
The tree builder may call these methods from its feed() implementation:
- handle_starttag(name, attrs)  # See note about return value
- handle_endtag(name)
- handle_data(data)  # Appends to the current data node
- endData(containerClass=NavigableString)  # Ends the current data node
No matter how complicated the underlying parser is, you should be able to build a tree using ‘start tag’ events, ‘end tag’ events, ‘data’ events, and “done with data” events.
If you encounter an empty-element tag (aka a self-closing tag, like HTML’s <br> tag), call handle_starttag and then handle_endtag.
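The event model described above can be illustrated with a small standard-library sketch. This is not Beautiful Soup's implementation: ToySoup and ToyBuilder are hypothetical names, the tree is built from plain dicts and strings, and only `<br>` is treated as an empty-element tag. It only shows how 'start tag', 'end tag', 'data', and "done with data" events are enough to build a tree.

```python
from html.parser import HTMLParser


class ToySoup:
    """Hypothetical stand-in for the soup object; it receives builder events."""

    def __init__(self):
        self.reset()

    def reset(self):
        # The root node plays the role of the 'root tag'.
        self.root = {"name": "[document]", "attrs": {}, "children": []}
        self.stack = [self.root]
        self.current_data = []

    def handle_starttag(self, name, attrs):
        self.endData()  # flush pending text before opening a new tag
        node = {"name": name, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)
        return node

    def handle_endtag(self, name):
        self.endData()
        # Close the most recent open tag with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["name"] == name:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Appends to the current data node.
        self.current_data.append(data)

    def endData(self):
        # Ends the current data node and attaches it to the open tag.
        if self.current_data:
            self.stack[-1]["children"].append("".join(self.current_data))
            self.current_data = []


class ToyBuilder(HTMLParser):
    """Hypothetical tree builder: forwards parser events to the soup."""

    def __init__(self, soup):
        super().__init__()
        self.soup = soup

    def handle_starttag(self, tag, attrs):
        self.soup.handle_starttag(tag, attrs)
        if tag == "br":
            # Empty-element tag: start tag followed at once by end tag.
            self.soup.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.soup.handle_endtag(tag)

    def handle_data(self, data):
        self.soup.handle_data(data)


soup = ToySoup()
builder = ToyBuilder(soup)
builder.feed("<p>one<br>two</p>")
builder.close()
soup.endData()

paragraph = soup.root["children"][0]
summary = [c if isinstance(c, str) else c["name"] for c in paragraph["children"]]
print(summary)
```

The `<br>` element ends up as a childless node between the two strings, because the builder calls handle_starttag and then handle_endtag for it straight away.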
Methods
__init__ ([markup, features, builder, ...]) |
The Soup object is initialized as the ‘root tag’, and the provided markup (which can be a string or a file-like object) is fed into the underlying parser. |
append (tag) |
Appends the given tag to the contents of this tag. |
childGenerator () |
|
clear ([decompose]) |
Extract all children. |
decode ([pretty_print, eventual_encoding, ...]) |
Returns a string or Unicode representation of this document. |
decode_contents ([indent_level, ...]) |
Renders the contents of this tag as a Unicode string. |
decompose () |
Recursively destroys the contents of this tree. |
encode ([encoding, indent_level, formatter, ...]) |
|
encode_contents ([indent_level, encoding, ...]) |
Renders the contents of this tag as a bytestring. |
endData ([containerClass]) |
|
extract () |
Destructively rips this element out of the tree. |
fetchNextSiblings ([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
fetchParents ([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
fetchPrevious ([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
fetchPreviousSiblings ([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
find ([name, attrs, recursive, text]) |
Return only the first child of this Tag matching the given criteria. |
findAll ([name, attrs, recursive, text, limit]) |
Returns a list of Tag objects that match the given criteria.
findAllNext ([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear after this Tag in the document. |
findAllPrevious ([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
findChild ([name, attrs, recursive, text]) |
Return only the first child of this Tag matching the given criteria. |
findChildren ([name, attrs, recursive, text, ...]) |
Returns a list of Tag objects that match the given criteria.
findNext ([name, attrs, text]) |
Returns the first item that matches the given criteria and appears after this Tag in the document. |
findNextSibling ([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document. |
findNextSiblings ([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
findParent ([name, attrs]) |
Returns the closest parent of this Tag that matches the given criteria. |
findParents ([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
findPrevious ([name, attrs, text]) |
Returns the first item that matches the given criteria and appears before this Tag in the document. |
findPreviousSibling ([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document. |
findPreviousSiblings ([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
find_all ([name, attrs, recursive, text, limit]) |
Returns a list of Tag objects that match the given criteria.
find_all_next ([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear after this Tag in the document. |
find_all_previous ([name, attrs, text, limit]) |
Returns all items that match the given criteria and appear before this Tag in the document. |
find_next ([name, attrs, text]) |
Returns the first item that matches the given criteria and appears after this Tag in the document. |
find_next_sibling ([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document. |
find_next_siblings ([name, attrs, text, limit]) |
Returns the siblings of this Tag that match the given criteria and appear after this Tag in the document. |
find_parent ([name, attrs]) |
Returns the closest parent of this Tag that matches the given criteria. |
find_parents ([name, attrs, limit]) |
Returns the parents of this Tag that match the given criteria. |
find_previous ([name, attrs, text]) |
Returns the first item that matches the given criteria and appears before this Tag in the document. |
find_previous_sibling ([name, attrs, text]) |
Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document. |
find_previous_siblings ([name, attrs, text, ...]) |
Returns the siblings of this Tag that match the given criteria and appear before this Tag in the document. |
format_string (s[, formatter]) |
Format the given string using the given formatter. |
get (key[, default]) |
Returns the value of the ‘key’ attribute for the tag, or the value given for ‘default’ if it doesn’t have that attribute. |
getText ([separator, strip, types]) |
Get all child strings, concatenated using the given separator. |
get_text ([separator, strip, types]) |
Get all child strings, concatenated using the given separator. |
handle_data (data) |
|
handle_endtag (name[, nsprefix]) |
|
handle_starttag (name, namespace, nsprefix, attrs) |
Push a start tag on to the stack. |
has_attr (key) |
|
has_key (key) |
Deprecated in favor of has_attr(). The name was misleading because has_key() checked attributes, while the in operator (__contains__) checks contents. |
index (element) |
Find the index of a child by identity, not value. |
insert (position, new_child) |
|
insert_after (successor) |
|
insert_before (successor) |
|
new_string (s[, subclass]) |
Create a new NavigableString associated with this soup. |
new_tag (name[, namespace, nsprefix]) |
Create a new tag associated with this soup. |
nextGenerator () |
|
nextSiblingGenerator () |
|
object_was_parsed (o[, parent, ...]) |
Add an object to the parse tree. |
parentGenerator () |
|
popTag () |
|
prettify ([encoding, formatter]) |
|
previousGenerator () |
|
previousSiblingGenerator () |
|
pushTag (tag) |
|
recursiveChildGenerator () |
|
renderContents ([encoding, prettyPrint, ...]) |
|
replaceWith (replace_with) |
|
replaceWithChildren () |
|
replace_with (replace_with) |
|
replace_with_children () |
|
reset () |
|
select (selector[, _candidate_generator, limit]) |
Perform a CSS selection operation on the current element. |
select_one (selector) |
Perform a CSS selection operation on the current element and return only the first match. |
setup ([parent, previous_element, ...]) |
Sets up the initial relations between this element and other elements. |
unwrap () |
|
wrap (wrap_inside) |
|
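A few of the methods summarized above, used together. This is a minimal sketch, assuming the bs4 package is installed and using Python's built-in "html.parser" as the features argument. The markup and variable names are made up for illustration. The snake_case names are used here; the camelCase entries in the table (findAll, getText, and so on) are older spellings of the same methods.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<p class="intro">Hello <b>world</b></p>
<p>Second <a href="/x">link</a></p>
</body></html>
"""

# Passing a parser name explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p")          # all matching tags
first_bold = soup.find("b")              # first match, or None
intro = soup.select_one("p.intro")       # first CSS match, or None
links = soup.select("p > a")             # all CSS matches

href = links[0].get("href")              # attribute value
missing = links[0].get("title", "n/a")   # default when the attribute is absent
has_class = intro.has_attr("class")

text_before = intro.get_text()           # child strings joined together

# Create a tag tied to this soup, attach it, then detach another tag.
note = soup.new_tag("span")
note.string = "added"
intro.append(note)
removed = first_bold.extract()           # returns the detached element

text_after = intro.get_text()
print(text_before, "->", text_after)
```

Note that extract() removes the element from the tree but returns it, so it can be re-inserted elsewhere; decompose() is the method to use when the element should be destroyed.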
Attributes
ASCII_SPACES |
|
DEFAULT_BUILDER_FEATURES |
|
HTML_FORMATTERS |
|
NO_PARSER_SPECIFIED_WARNING |
|
ROOT_TAG_NAME |
|
XML_FORMATTERS |
|
attribselect_re |
|
children |
|
descendants |
|
isSelfClosing |
Is this tag an empty-element tag? (aka a self-closing tag) |
is_empty_element |
Is this tag an empty-element tag? (aka a self-closing tag) |
next |
|
nextSibling |
|
next_elements |
|
next_siblings |
|
parents |
|
parserClass |
|
previous |
|
previousSibling |
|
previous_elements |
|
previous_siblings |
|
string |
Convenience property to get the single string within this tag. |
strings |
Yield all strings of certain classes, possibly stripping them. |
stripped_strings |
|
tag_name_re |
|
text |
Get all child strings, concatenated with the default (empty) separator; equivalent to calling get_text() with no arguments. |
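The navigation and string attributes listed above can be tried on a small document. This is a minimal sketch, assuming bs4 is installed and "html.parser" is used as the parser; the markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p> one </p><br/><p>two</p></div>", "html.parser")
div = soup.div
first_p = div.p

child_names = [child.name for child in div.children]   # direct children only
descendant_count = len(list(div.descendants))           # tags and strings, any depth

single = first_p.string          # the one string inside this tag
ambiguous = div.string           # None: div has more than one child

all_strings = list(div.strings)
stripped = list(div.stripped_strings)
joined = div.text

parent_names = [parent.name for parent in first_p.parents]
sibling_names = [sib.name for sib in first_p.next_siblings]
br_is_empty = div.br.is_empty_element

print(child_names, stripped, parent_names)
```

The last entry of parent_names is the soup object itself, whose name is the value of ROOT_TAG_NAME.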