`gensim.utils.simple_preprocess()`

gensim.utils.simple_preprocess(doc, deacc=False, min_len=2, max_len=15)

Convert a document into a list of tokens.

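The document is lowercased, tokenized and, if `deacc=True`, stripped of accent marks. Tokens shorter than `min_len` or longer than `max_len` characters are discarded. The resulting tokens are final: unicode strings that receive no further processing.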
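The pipeline described above (lowercase, optionally de-accent, tokenize, filter by length) can be approximated with the standard library alone. This is a minimal sketch, not gensim's implementation. It assumes that a token is a maximal run of non-digit word characters and that tokens starting with an underscore are dropped, which matches how gensim's tokenizer behaves. The names `simple_preprocess_sketch` and `deaccent_sketch` are hypothetical.

```python
import re
import unicodedata
from typing import List

# Assumption: a token is a maximal run of word characters that are not digits,
# approximating the alphabetic pattern used by gensim's tokenizer.
_ALPHABETIC = re.compile(r"(((?![\d])\w)+)", re.UNICODE)


def deaccent_sketch(text: str) -> str:
    """Strip combining accent marks (e.g. 'é' -> 'e') via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def simple_preprocess_sketch(doc: str, deacc: bool = False,
                             min_len: int = 2, max_len: int = 15) -> List[str]:
    """Hypothetical stand-in for gensim.utils.simple_preprocess."""
    text = doc.lower()                      # 1. lowercase
    if deacc:
        text = deaccent_sketch(text)        # 2. optional de-accenting
    tokens = (m.group() for m in _ALPHABETIC.finditer(text))  # 3. tokenize
    # 4. keep tokens whose length lies in [min_len, max_len] (both inclusive)
    #    and, as assumed above, drop tokens starting with an underscore
    return [t for t in tokens
            if min_len <= len(t) <= max_len and not t.startswith("_")]


if __name__ == "__main__":
    print(simple_preprocess_sketch("Café au lait, 2 cups!", deacc=True))
    # → ['cafe', 'au', 'lait', 'cups']
```

The digit `2` never becomes a token, and `au` survives only because its length equals the default `min_len=2`. The real function is called as `gensim.utils.simple_preprocess(doc, deacc=True)`. Unlike this sketch, which accepts only `str`, gensim also decodes byte-string input before tokenizing.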