bs4.UnicodeDammit.detwingle

classmethod UnicodeDammit.detwingle(in_bytes, main_encoding='utf8', embedded_encoding='windows-1252')

Fix a bytestring in which characters encoded in one encoding are embedded in text written in another encoding.

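Currently the only supported combination is Windows-1252 (or ISO-8859-1, whose printable characters Windows-1252 includes) embedded in UTF-8, which matches the default values of main_encoding and embedded_encoding.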

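The input must be a bytestring, so call this method before decoding. If you’ve already converted the document to Unicode, you’re too late: the embedded bytes have already been decoded with the wrong encoding, and the information needed to fix them is gone.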
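To illustrate why decoding first is too late, the following sketch uses only Python's standard codecs on a hypothetical document that mixes UTF-8 snowmen with Windows-1252 curly quotes. Decoding the whole thing with either single encoding either fails, discards the stray bytes, or garbles the UTF-8 part. The document contents are an assumption chosen for illustration.

```python
# Hypothetical mixed document: three UTF-8 snowmen, then a quotation whose
# curly quotes (bytes 0x93 and 0x94) are encoded in Windows-1252.
doc = ("\N{SNOWMAN}" * 3).encode("utf-8") + "\u201cI like snowmen!\u201d".encode("windows-1252")

# Strict UTF-8 decoding rejects the stray Windows-1252 bytes.
try:
    doc.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient UTF-8 decoding replaces each stray byte with U+FFFD, so the
# original 0x93/0x94 values can no longer be recovered from the string.
as_utf8 = doc.decode("utf-8", errors="replace")

# Decoding as Windows-1252 succeeds, but turns each UTF-8 snowman into
# three unrelated characters (mojibake).
as_1252 = doc.decode("windows-1252")

print(strict_ok)  # False
```

Either way, the resulting str no longer tells you which bytes belonged to which encoding, which is why detwingle works on the raw bytes.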

The output is a bytestring in which embedded_encoding characters have been converted to their main_encoding equivalents, so the whole result can then be decoded as main_encoding.
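A minimal, stdlib-only sketch of the idea follows. `detwingle_sketch` is a hypothetical name, not part of bs4, and its heuristic is an assumption that may differ from bs4's implementation: a byte in the range 0xC2 to 0xF4 that starts a complete, valid UTF-8 sequence is kept unchanged, and any other byte at or above 0x80 is decoded as Windows-1252 and re-encoded as UTF-8. In real code, call `UnicodeDammit.detwingle(doc)` instead.

```python
def detwingle_sketch(in_bytes: bytes) -> bytes:
    """Hypothetical illustration (not bs4's code): rewrite stray
    Windows-1252 bytes in a mostly UTF-8 bytestring as UTF-8."""
    out = bytearray()
    pos = 0
    while pos < len(in_bytes):
        byte = in_bytes[pos]
        if byte < 0x80:
            # ASCII bytes mean the same thing in both encodings.
            out.append(byte)
            pos += 1
            continue
        # Length of the UTF-8 sequence this byte would start,
        # or 0 if it can never be a UTF-8 lead byte.
        if 0xC2 <= byte <= 0xDF:
            size = 2
        elif 0xE0 <= byte <= 0xEF:
            size = 3
        elif 0xF0 <= byte <= 0xF4:
            size = 4
        else:
            size = 0
        if size:
            seq = in_bytes[pos:pos + size]
            try:
                seq.decode("utf-8")
            except UnicodeDecodeError:
                pass  # Not a complete, valid UTF-8 character.
            else:
                # A valid UTF-8 character: copy it through unchanged.
                out += seq
                pos += size
                continue
        # Assumption: any other high byte is a Windows-1252 character.
        try:
            out += bytes([byte]).decode("windows-1252").encode("utf-8")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Python's
            # windows-1252 codec; leave them as they are.
            out.append(byte)
        pos += 1
    return bytes(out)


# Usage: UTF-8 snowmen followed by Windows-1252 curly quotes.
doc = ("\N{SNOWMAN}" * 3).encode("utf-8") + "\u201cI like snowmen!\u201d".encode("windows-1252")
fixed = detwingle_sketch(doc)
print(fixed.decode("utf-8") == "\u2603" * 3 + "\u201cI like snowmen!\u201d")  # True
```

Checking that the continuation bytes are valid before keeping a sequence is a deliberate choice in this sketch: it lets a lone Windows-1252 byte such as 0xE9 ("é") be converted even though it falls in the UTF-8 lead-byte range.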