nltk.guess_encoding()

nltk.guess_encoding(data)

Given a byte string, attempt to decode it. Tries the standard ‘UTF8’ and ‘latin-1’ encodings, plus several gathered from locale information.

The calling program must first call:

locale.setlocale(locale.LC_ALL, '')

If successful, it returns a tuple (decoded_unicode, successful_encoding). If unsuccessful, it raises a UnicodeError.
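The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not NLTK's actual implementation: the name guess_encoding_sketch and its optional encodings parameter are hypothetical, and the exact set and order of locale-derived candidates is an assumption.

```python
import locale


def guess_encoding_sketch(data, encodings=None):
    """Return (decoded_text, encoding_name), or raise UnicodeError.

    Hypothetical helper for illustration; the candidate list below is an
    assumption, not the list NLTK uses.
    """
    if encodings is None:
        encodings = [
            "utf-8",
            locale.getpreferredencoding(False),  # locale's preferred encoding
            locale.getlocale()[1],               # may be None
            "latin-1",                           # accepts any byte sequence
        ]
    for enc in encodings:
        if not enc:
            continue  # skip candidates the locale could not supply
        try:
            return data.decode(enc), enc
        except (UnicodeError, LookupError):
            continue  # wrong encoding, or a codec name Python does not know
    raise UnicodeError("none of the candidate encodings could decode the data")


# Initialise the locale first, as the documentation requires.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass  # fall back to the default "C" locale

text, encoding = guess_encoding_sketch(b"caf\xc3\xa9")
print(encoding)
```

Because ‘latin-1’ maps every possible byte to a character, a candidate list that ends with it will always decode something; in this sketch the UnicodeError path is only reachable when a caller passes a stricter list through the hypothetical encodings parameter.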