 Extremely bold

.. _kind_of_obj attributes:

Attributes
^^^^^^^^^^

A tag may have any number of attributes. The tag ``<b
class="boldest">`` has an attribute "class" whose value is
"boldest". You can access a tag's attributes by treating the tag like
a dictionary::

 tag['class']
 # [u'boldest']

You can access that dictionary directly as ``.attrs``::

 tag.attrs
 # {u'class': [u'boldest']}

You can add, remove, and modify a tag's attributes. Again, this is
done by treating the tag as a dictionary::

 tag['class'] = 'verybold'
 tag['id'] = 1
 tag
 # <blockquote class="verybold" id="1">Extremely bold</blockquote>

 del tag['class']
 del tag['id']
 tag
 # <blockquote>Extremely bold</blockquote>

 tag['class']
 # KeyError: 'class'
 print(tag.get('class'))
 # None

.. _multivalue:

Multi-valued attributes
&&&&&&&&&&&&&&&&&&&&&&&

HTML 4 defines a few attributes that can have multiple values. HTML 5
removes a couple of them, but defines a few more. The most common
multi-valued attribute is ``class`` (that is, a tag can have more than
one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
``headers``, and ``accesskey``. Beautiful Soup presents the value(s)
of a multi-valued attribute as a list, even when there is only one
value::

 css_soup = BeautifulSoup('<p class="body strikeout"></p>')
 css_soup.p['class']
 # ["body", "strikeout"]

 css_soup = BeautifulSoup('<p class="body"></p>')
 css_soup.p['class']
 # ["body"]

If an attribute `looks` like it has more than one value, but it's not
a multi-valued attribute as defined by any version of the HTML
standard, Beautiful Soup will leave the attribute alone::

 id_soup = BeautifulSoup('<p id="my id"></p>')
 id_soup.p['id']
 # 'my id'

When you turn a tag back into a string, multiple attribute values are
consolidated into a single space-separated value::

 rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>')
 rel_soup.a['rel']
 # ['index']
 rel_soup.a['rel'] = ['index', 'contents']
 print(rel_soup.p)
 # <p>Back to the <a rel="index contents">homepage</a></p>
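The round trip described above (split into a list on parse, joined with spaces on output) can be modelled without Beautiful Soup at all. The following is a minimal Python 3 sketch built on the standard library's ``html.parser.HTMLParser``. ``MULTI_VALUED``, ``split_attrs``, ``join_attrs``, and ``first_tag_attrs`` are hypothetical names, and the sketch treats the listed attributes as multi-valued on every tag, which is a simplification rather than Beautiful Soup's actual rule:

```python
from html.parser import HTMLParser

# Hypothetical table for illustration only: the attribute names the text
# above calls multi-valued. This is NOT Beautiful Soup's internal table,
# and it ignores which tag the attribute appears on.
MULTI_VALUED = {"class", "rel", "rev", "accept-charset", "headers", "accesskey"}


def split_attrs(attrs):
    """Turn (name, value) pairs into a dict, splitting multi-valued ones into lists."""
    result = {}
    for name, value in attrs:
        value = value or ""  # html.parser reports valueless attributes as None
        if name in MULTI_VALUED:
            result[name] = value.split()  # split on any run of whitespace
        else:
            result[name] = value  # e.g. id="my id" stays one string
    return result


def join_attrs(attrs):
    """Serialize an attribute dict, joining list values with single spaces.

    Sketch only: quotes and ampersands in values are not escaped.
    """
    parts = []
    for name, value in attrs.items():
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f'{name}="{value}"')
    return " ".join(parts)


class FirstTagAttrs(HTMLParser):
    """Record the split attributes of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = split_attrs(attrs)


def first_tag_attrs(markup):
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs


print(first_tag_attrs('<p class="body strikeout"></p>'))  # {'class': ['body', 'strikeout']}
print(first_tag_attrs('<p id="my id"></p>'))  # {'id': 'my id'}
a_attrs = first_tag_attrs('<a rel="index">homepage</a>')
a_attrs["rel"] = ["index", "contents"]
print(join_attrs(a_attrs))  # rel="index contents"
```

Because ``html.parser`` already delivers attributes as ``(name, value)`` pairs, the only real decision is which names to split; anything outside the table, such as ``id``, is kept as a single string.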
If you parse a document as XML, there are no multi-valued attributes::

 xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
 xml_soup.p['class']
 # u'body strikeout'

``NavigableString``
-------------------

A string corresponds to a bit of text within a tag. Beautiful Soup
uses the ``NavigableString`` class to contain these bits of text::

 tag.string
 # u'Extremely bold'
 type(tag.string)
 # <class 'bs4.element.NavigableString'>

No longer bold

``NavigableString`` supports most of the features described in
:ref:`navigating_the_tree` and :ref:`searching_the_tree`, but not all
of them. In particular, since a string can't contain anything (the way
a tag may contain a string or another tag), strings don't support the
``.contents`` or ``.string`` attributes, or the ``find()`` method.

If you want to use a ``NavigableString`` outside of Beautiful Soup,
you should call ``unicode()`` on it to turn it into a normal Python
Unicode string. If you don't, your string will carry around a
reference to the entire Beautiful Soup parse tree, even when you're
done using Beautiful Soup. This is a big waste of memory.

``BeautifulSoup``
-----------------

The ``BeautifulSoup`` object itself represents the document as a
whole. For most purposes, you can treat it as a :ref:`Tag`
object. This means it supports most of the methods described in
:ref:`navigating_the_tree` and :ref:`searching_the_tree`.

Since the ``BeautifulSoup`` object doesn't correspond to an actual
HTML or XML tag, it has no name and no attributes. But sometimes it's
useful to look at its ``.name``, so it's been given the special
``.name`` "[document]"::

 soup.name
 # u'[document]'

Comments and other special strings
----------------------------------

``Tag``, ``NavigableString``, and ``BeautifulSoup`` cover almost
everything you'll see in an HTML or XML file, but there are a few
leftover bits. The only one you'll probably ever need to worry about
is the comment::

 markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
 soup = BeautifulSoup(markup)
 comment = soup.b.string
 type(comment)
 # <class 'bs4.element.Comment'>
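The distinction between a comment and ordinary text starts in the parser itself. As a rough, stdlib-only illustration (Python 3, not Beautiful Soup's own code), ``html.parser.HTMLParser`` reports comment text through its ``handle_comment`` hook and ordinary text through ``handle_data``; the sketch below stores each as a different ``str`` subclass. ``Text``, ``Comment``, ``StringCollector``, and ``collect_strings`` are hypothetical names chosen for this example:

```python
from html.parser import HTMLParser


class Text(str):
    """Hypothetical stand-in for a text node: a plain str subclass."""


class Comment(Text):
    """Hypothetical stand-in for a comment: a special kind of Text."""


class StringCollector(HTMLParser):
    """Collect text and comments in document order, keeping their types distinct."""

    def __init__(self):
        super().__init__()
        self.strings = []

    def handle_data(self, data):
        self.strings.append(Text(data))

    def handle_comment(self, data):
        # html.parser passes only the text between "<!--" and "-->"
        self.strings.append(Comment(data))


def collect_strings(markup):
    parser = StringCollector()
    parser.feed(markup)
    parser.close()
    return parser.strings


strings = collect_strings("<p>Some text<!--a comment--></p>")
print([type(s).__name__ for s in strings])  # ['Text', 'Comment']
print(isinstance(strings[1], str))  # True
```

Because both classes subclass ``str``, a comment still behaves like an ordinary string; only its type records where it came from.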