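.. _quick_start:

Quick Start
===========

Here's an HTML document I'll be using as an example throughout this
document. It's part of a story from `Alice in Wonderland`::

 html_doc = """
 <html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>

 <p class="story">Once upon a time there were three little sisters; and their names were
 <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
 and they lived at the bottom of a well.</p>

 <p class="story">...</p>
 """

Running the "three sisters" document through Beautiful Soup gives us a
``BeautifulSoup`` object, which represents the document as a nested
data structure::

 from bs4 import BeautifulSoup
 soup = BeautifulSoup(html_doc, 'html.parser')

 print(soup.prettify())
 # <html>
 #  <head>
 #   <title>
 #    The Dormouse's story
 #   </title>
 #  </head>
 #  <body>
 #   <p class="title">
 #    <b>
 #     The Dormouse's story
 #    </b>
 #   </p>
 #   <p class="story">
 #    Once upon a time there were three little sisters; and their names were
 #    <a href="http://example.com/elsie" class="sister" id="link1">
 #     Elsie
 #    </a>
 #    ,
 #    <a href="http://example.com/lacie" class="sister" id="link2">
 #     Lacie
 #    </a>
 #    and
 #    <a href="http://example.com/tillie" class="sister" id="link3">
 #     Tillie
 #    </a>
 #    ; and they lived at the bottom of a well.
 #   </p>
 #   <p class="story">
 #    ...
 #   </p>
 #  </body>
 # </html>

Here are some simple ways to navigate that data structure::

 soup.title
 # <title>The Dormouse's story</title>

 soup.p['class']
 # ['title']

 soup.a
 # <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>

 soup.find_all('a')
 # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
 #  <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>,
 #  <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>]

 soup.find(id="link3")
 # <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>

One common task is extracting all the URLs found within a page's <a> tags::

 for link in soup.find_all('a'):
     print(link.get('href'))
 # http://example.com/elsie
 # http://example.com/lacie
 # http://example.com/tillie

Another common task is extracting all the text from a page::

 print(soup.get_text())
 # The Dormouse's story
 #
 # The Dormouse's story
 #
 # Once upon a time there were three little sisters; and their names were
 # Elsie,
 # Lacie and
 # Tillie;
 # and they lived at the bottom of a well.
 #
 # ...

Does this look like what you need? If so, read on.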
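
If you'd like to try the link-extraction step as a standalone script, here's a minimal sketch. It makes a few assumptions: the markup is my reconstruction of the "three sisters" document (the ``sister`` and ``story`` class names in particular are assumed), the helper name ``extract_links`` is hypothetical and not part of Beautiful Soup, and I'm passing ``'html.parser'`` explicitly so the script doesn't depend on which parsers are installed.

```python
from bs4 import BeautifulSoup

# Assumed reconstruction of the "three sisters" example markup.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""


def extract_links(markup):
    """Hypothetical helper: return (link text, href) pairs for each <a> tag."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for a in soup.find_all("a"):
        href = a.get("href")  # .get() returns None when the attribute is missing
        if href is not None:
            pairs.append((a.get_text(), href))
    return pairs


links = extract_links(html_doc)
for text, href in links:
    print(f"{text}: {href}")
```

Using ``link.get('href')`` rather than ``link['href']`` means an <a> tag without an ``href`` attribute gives you ``None`` instead of raising ``KeyError``, which is usually what you want when walking over markup you don't control.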