gensim.corpora.TextCorpus
class gensim.corpora.TextCorpus(input=None)
Helper class to simplify the pipeline of getting bag-of-words vectors (= a gensim corpus) from plain text.
This is an abstract base class: override the get_texts() and __len__() methods to match your particular input.
Given a filename (or a file-like object) in the constructor, the corpus object is automatically initialized with a dictionary in self.dictionary and supports iteration over the corpus. You only need to provide a correct get_texts() implementation.
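The contract described above can be sketched in plain Python. The block below does not use gensim's code: MiniTextCorpus is a hypothetical stand-in for the base class, token2id is a simplified substitute for self.dictionary, and LineCorpus assumes an input with one document per line and whitespace-separated tokens. It only illustrates how get_texts(), __len__() and iteration fit together.

```python
import io
from collections import Counter


class MiniTextCorpus:
    """Hypothetical stand-in for the pattern described above; NOT gensim's code."""

    def __init__(self, input=None):
        self.input = input
        self.token2id = {}  # simplified substitute for self.dictionary
        if input is not None:
            # One pass over the texts to build the token -> id mapping.
            for tokens in self.get_texts():
                for token in tokens:
                    self.token2id.setdefault(token, len(self.token2id))

    def get_texts(self):
        raise NotImplementedError("subclasses yield one list of tokens per document")

    def __len__(self):
        raise NotImplementedError("subclasses return the number of documents")

    def __iter__(self):
        # Each document becomes a sorted list of (token_id, count) pairs.
        for tokens in self.get_texts():
            counts = Counter(self.token2id[t] for t in tokens)
            yield sorted(counts.items())


class LineCorpus(MiniTextCorpus):
    """Assumed input format: one document per line, whitespace-separated tokens."""

    def _lines(self):
        if isinstance(self.input, str):
            # A string is treated as a filename.
            with open(self.input, encoding="utf-8") as handle:
                yield from handle
        else:
            self.input.seek(0)  # rewind so the corpus can be iterated repeatedly
            yield from self.input

    def get_texts(self):
        for line in self._lines():
            yield line.lower().split()

    def __len__(self):
        return sum(1 for _ in self._lines())


corpus = LineCorpus(io.StringIO("human computer interface\ncomputer system computer\n"))
bows = list(corpus)
```

The subclass supplies only get_texts() and __len__(); building the token mapping and producing bag-of-words vectors are inherited. The input is read again on every pass, so the corpus can be iterated more than once without holding all documents in memory.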