gensim.utils.decode_htmlentities()
gensim.utils.decode_htmlentities(text)

Decode HTML entities in text, coded as hex, decimal or named.
Adapted from http://github.com/sku/python-twitter-ircbot/blob/321d94e0e40d0acc92f5bf57d126b57369da70de/html_decode.py
>>> u = u'E tu vivrai nel terrore - L&#x27;aldil&#xE0; (1981)'
>>> print(decode_htmlentities(u).encode('UTF-8'))
E tu vivrai nel terrore - L'aldilà (1981)
>>> print(decode_htmlentities("l&#39;eau"))
l'eau
>>> print(decode_htmlentities("foo &lt; bar"))
foo < bar
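
To make the three entity forms from the description (hex, decimal, named) concrete, here is a minimal sketch of how such decoding can be done with only the Python 3 standard library. It is not gensim's implementation: the names `decode_entities_sketch` and `_ENTITY_RE` are hypothetical, and leaving unknown entities untouched is an assumption of this sketch rather than documented behaviour of `decode_htmlentities`.

```python
import re
from html.entities import name2codepoint

# Hypothetical helper pattern: matches &#xHH; (hex), &#DD; (decimal) or &name;
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities_sketch(text):
    """Replace hex, decimal and named HTML entities with their characters.

    Illustrative only; this is not the gensim function.
    """
    def substitute(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                # Hex form, e.g. &#xE0;
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                # Decimal form, e.g. &#39;
                return chr(int(body[1:]))
            # Named form, e.g. &lt;
            return chr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            # Assumption of this sketch: unknown or invalid entities stay as-is
            return match.group(0)

    return _ENTITY_RE.sub(substitute, text)


print(decode_entities_sketch("L&#x27;aldil&#xE0;"))
print(decode_entities_sketch("l&#39;eau"))
print(decode_entities_sketch("foo &lt; bar"))
```

On Python 3, the standard library's `html.unescape` covers the same three forms and may be a simpler choice when gensim is not otherwise needed.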