bs4.UnicodeDammit

class bs4.UnicodeDammit(markup, override_encodings=[], smart_quotes_to=None, is_html=False, exclude_encodings=[])

A class for detecting the encoding of an HTML or XML document and converting it to a Unicode string. If the source encoding is windows-1252, it can also replace Microsoft smart quotes with their HTML or XML equivalents.
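The smart-quote handling described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the sample markup is made up, and the `unicode_markup` and `original_encoding` attributes are not listed in this section; they are assumed here from the Beautiful Soup documentation. The encoding list is passed positionally as the second argument.

```python
from bs4 import UnicodeDammit

# Made-up sample: windows-1252 bytes containing Microsoft smart quotes
# (\x93 and \x94 are curly double quotes, \x92 is a curly apostrophe).
markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"

# The second positional argument lists encodings to try first.
as_html = UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="html")
as_xml = UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="xml")

# unicode_markup (assumed attribute) holds the converted string.
print(as_html.unicode_markup)
# <p>I just &ldquo;love&rdquo; Microsoft Word&rsquo;s smart quotes</p>
print(as_xml.unicode_markup)
# <p>I just &#x201C;love&#x201D; Microsoft Word&#x2019;s smart quotes</p>

# original_encoding (assumed attribute) reports the encoding that worked.
print(as_html.original_encoding)
```

With `smart_quotes_to=None`, the default, the smart quotes are expected to be converted to their Unicode characters rather than to entities.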

__init__(markup, override_encodings=[], smart_quotes_to=None, is_html=False, exclude_encodings=[])

Methods

__init__(markup[, override_encodings, ...])
detwingle(in_bytes[, main_encoding, ...]) Fix characters from one encoding embedded in some other encoding.
find_codec(charset)
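As a rough illustration of what `detwingle` is for, the sketch below builds a byte string that mixes UTF-8 with windows-1252 and then normalizes it. It relies only on the default values of the optional arguments, which are assumed here to treat UTF-8 as the main encoding and windows-1252 as the embedded one; the sample text is invented for the example.

```python
from bs4 import UnicodeDammit

# Three snowmen encoded as UTF-8 ...
snowmen = "\N{SNOWMAN}" * 3
# ... followed by a quotation whose curly quotes are encoded as windows-1252.
quote = "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}"
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# Decoding the mixed bytes directly as UTF-8 would fail on the
# windows-1252 quote bytes, so rewrite them as UTF-8 first.
new_doc = UnicodeDammit.detwingle(doc)

# The result is consistent UTF-8 and decodes cleanly.
print(new_doc.decode("utf8"))
```

Because `detwingle` works on bytes and returns bytes, it would normally be called before the data is handed to `UnicodeDammit` or to a parser.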