gensim.corpora.TextCorpus.get_texts

TextCorpus.get_texts()

Iterate over the collection, yielding one document at a time. A document is a sequence of words (strings) that can be fed into Dictionary.doc2bow.

Override this method to match your input: parse the input files and do any text preprocessing (lowercasing, tokenizing, etc.) here. The words yielded by this method receive no further preprocessing.
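As a rough illustration of the contract described above, the sketch below yields one document per input line, each as a list of lowercased tokens. The class name LineCorpusSketch, the lines argument, and the regex tokenizer are assumptions made for this example and are not part of gensim. In real use the class would subclass gensim.corpora.TextCorpus; the base class is left out here so that the sketch runs without gensim installed.

```python
import re

# Hypothetical tokenizer for this sketch: runs of word characters.
TOKEN_RE = re.compile(r"\w+")


class LineCorpusSketch:
    """Stand-in that mirrors the get_texts() contract, one document per line.

    Assumption: in real use this would inherit from gensim.corpora.TextCorpus.
    """

    def __init__(self, lines):
        self.lines = lines  # any iterable of raw text lines

    def get_texts(self):
        # All preprocessing happens here: lowercase, then tokenize.
        for line in self.lines:
            yield TOKEN_RE.findall(line.lower())


corpus = LineCorpusSketch(["Human machine interface", "Graph minors: a SURVEY"])
docs = list(corpus.get_texts())
print(docs)
# [['human', 'machine', 'interface'], ['graph', 'minors', 'a', 'survey']]
```

Each yielded list is already in the form that Dictionary.doc2bow accepts, so whatever lowercasing or tokenizing the corpus needs has to happen inside get_texts.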