Beautiful Soup Documentation
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
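As a rough illustration of the navigating, searching, and modifying mentioned above, here is a minimal sketch. The sample markup, the variable names, and the choice of Python's built-in "html.parser" are assumptions made for this example only; your own documents and preferred parser will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
html_doc = """<html><head><title>Sample page</title></head>
<body><p class="intro">Hello, <b>world</b>!</p>
<a href="http://example.com/one" id="link1">One</a>
<a href="http://example.com/two" id="link2">Two</a>
</body></html>"""

# Parse with Python's built-in parser so no extra parser package is needed.
soup = BeautifulSoup(html_doc, "html.parser")

# Navigating: step from the soup to the <title> tag and read its text.
title_text = soup.title.string

# Searching: collect the href attribute of every <a> tag.
links = [a.get("href") for a in soup.find_all("a")]

# Modifying: replace the text inside the first <b> tag.
soup.b.string = "reader"
intro_text = soup.find("p", class_="intro").get_text()

print(title_text)
print(links)
print(intro_text)
```

The later sections cover each of these operations in detail; this sketch only shows how they fit together on a small document.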
These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations.
The examples in this documentation should work the same way in Python 2.7 and Python 3.2.
You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful Soup 3 is no longer being developed, and that Beautiful Soup 4 is recommended for all new projects. If you want to learn about the differences between Beautiful Soup 3 and Beautiful Soup 4, see Porting code to BS4.
Getting help
If you have questions about Beautiful Soup, or run into problems, send mail to the discussion group. If your problem involves parsing an HTML document, be sure to mention what the diagnose() function says about that document.
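One way to collect what diagnose() says before writing to the group is sketched below. The markup string is a made-up example, and capturing the printed report with redirect_stdout is just one convenient approach assumed here, not something the library requires; calling diagnose() directly and copying the terminal output works equally well.

```python
import io
from contextlib import redirect_stdout

from bs4.diagnose import diagnose

# Hypothetical problem markup, invented for this illustration.
data = "<p>An unclosed paragraph<b>with bold text"

# diagnose() prints its report to standard output, so capture it in a
# string that can be pasted into a message.
buffer = io.StringIO()
with redirect_stdout(buffer):
    diagnose(data)
report = buffer.getvalue()

print(report)
```

The report lists the Beautiful Soup version and shows how each installed parser handles the document, which is the information the discussion group usually needs.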
- 1. Quick Start
- 2. Installing Beautiful Soup
- 3. Making the soup
- 4. Kinds of objects
- 5. Navigating the tree
- 6. Searching the tree
- 6.1. Kinds of filters
- 6.2. find_all()
- 6.3. Calling a tag is like calling find_all()
- 6.4. find()
- 6.5. find_parents() and find_parent()
- 6.6. find_next_siblings() and find_next_sibling()
- 6.7. find_previous_siblings() and find_previous_sibling()
- 6.8. find_all_next() and find_next()
- 6.9. find_all_previous() and find_previous()
- 6.10. CSS selectors
- 7. Modifying the tree
- 8. Output
- 9. Specifying the parser to use
- 10. Encodings
- 11. Parsing only part of a document
- 12. Troubleshooting
- 13. Beautiful Soup 3