gensim.corpora.TextCorpus

class gensim.corpora.TextCorpus(input=None)

Helper class that simplifies turning plain text into bag-of-words vectors (that is, a gensim corpus).
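A gensim corpus is an iterable of documents, each represented as a sparse list of (token_id, count) pairs. The pure-Python sketch below builds the bag-of-words vectors mentioned above by hand, to show what that representation looks like. It does not use gensim: build_vocab and doc2bow are hypothetical standalone helpers standing in for gensim's Dictionary (whose own Dictionary.doc2bow() may number tokens differently), and the lowercase whitespace tokenization is an assumption.

```python
from collections import Counter


def build_vocab(texts):
    """Map each distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for tokens in texts:
        for token in tokens:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def doc2bow(tokens, token2id):
    """Turn one tokenized document into sorted (token_id, count) pairs.

    Tokens missing from the vocabulary are silently skipped.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


raw = ["Human machine interface", "machine learning for human machine users"]
texts = [line.lower().split() for line in raw]  # naive lowercase whitespace tokenizer
token2id = build_vocab(texts)
corpus = [doc2bow(tokens, token2id) for tokens in texts]
print(corpus)
# [[(0, 1), (1, 1), (2, 1)], [(0, 1), (1, 2), (3, 1), (4, 1), (5, 1)]]
```

Note that "machine" appears twice in the second document, so its pair is (1, 2): only counts are kept, not word order.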

This is an abstract base class: override the get_texts() and __len__() methods to match your particular input.

Given a filename (or a file-like object) in the constructor, the corpus object is automatically initialized with a dictionary in self.dictionary and supports iteration over the corpus (the __iter__ method). You only need to provide a correct get_texts() implementation.
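The contract described above (the constructor builds a dictionary from get_texts(), iteration yields bag-of-words vectors, and subclasses supply get_texts() and __len__()) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not gensim's implementation: TextCorpusSketch and LineCorpus are hypothetical names, a plain dict stands in for gensim's Dictionary, and the input is assumed to be a file-like object with one whitespace-tokenized document per line. With gensim installed, you would subclass gensim.corpora.TextCorpus instead and override the same two methods.

```python
import io
from abc import ABC, abstractmethod
from collections import Counter


class TextCorpusSketch(ABC):
    """Hypothetical stand-in for the TextCorpus pattern (not gensim's code)."""

    def __init__(self, input=None):
        self.input = input
        self.token2id = {}  # plays the role of self.dictionary
        if input is not None:
            # Build the vocabulary with one full pass over get_texts().
            for tokens in self.get_texts():
                for token in tokens:
                    self.token2id.setdefault(token, len(self.token2id))

    def __iter__(self):
        # Each pass re-reads the texts and yields sorted (token_id, count) pairs.
        for tokens in self.get_texts():
            counts = Counter(self.token2id[t] for t in tokens if t in self.token2id)
            yield sorted(counts.items())

    @abstractmethod
    def get_texts(self):
        """Yield one tokenized document (a list of str) at a time."""

    @abstractmethod
    def __len__(self):
        """Return the number of documents."""


class LineCorpus(TextCorpusSketch):
    """Hypothetical subclass: one document per line, lowercase whitespace tokens."""

    def get_texts(self):
        self.input.seek(0)  # rewind so the corpus can be iterated repeatedly
        for line in self.input:
            yield line.lower().split()

    def __len__(self):
        self.input.seek(0)
        return sum(1 for _ in self.input)


corpus = LineCorpus(io.StringIO("Human machine interface\nmachine learning\n"))
vectors = list(corpus)
print(vectors)  # [[(0, 1), (1, 1), (2, 1)], [(1, 1), (3, 1)]]
print(len(corpus))  # 2
```

Rewinding the stream in get_texts() matters in this sketch because the constructor consumes get_texts() once to build the vocabulary, and every later iteration calls it again.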

Methods

__init__([input])
get_texts()
    Iterate over the collection, yielding one document at a time.
getstream()
load(fname[, mmap])
    Load a previously saved object from file (also see save).
save(*args, **kwargs)
save_corpus(fname, corpus[, id2word, metadata])
    Save an existing corpus to disk.
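save() and load() persist the corpus object itself. In gensim these come from utils.SaveLoad, which is pickle-based and also supports extras such as the mmap option for large arrays. The sketch below shows only the basic pickle round trip; SavableCorpus is a hypothetical class, and pickling the instance's attribute dict (rather than the instance) is a simplification chosen for this example, not gensim's exact behaviour.

```python
import os
import pickle
import tempfile


class SavableCorpus:
    """Hypothetical minimal object with pickle-based save/load (not gensim's SaveLoad)."""

    def __init__(self, token2id):
        self.token2id = token2id

    def save(self, fname):
        # Pickle the instance's attributes (its __dict__) to fname.
        with open(fname, "wb") as f:
            pickle.dump(self.__dict__, f, protocol=pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, fname):
        # Create an instance without calling __init__, then restore its attributes.
        obj = cls.__new__(cls)
        with open(fname, "rb") as f:
            obj.__dict__.update(pickle.load(f))
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.pkl")
    SavableCorpus({"human": 0, "machine": 1}).save(path)
    restored = SavableCorpus.load(path)

print(restored.token2id)  # {'human': 0, 'machine': 1}
```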
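save_corpus(fname, corpus[, id2word, metadata]) serves a different purpose than save(): it writes the document vectors to disk in some corpus serialization format instead of pickling a Python object. In gensim, concrete formats such as MmCorpus implement it; MmCorpus uses the Matrix Market coordinate format. The write_mm function below is a hypothetical minimal writer for that format, for illustration only: it is not gensim's code, writes to a text stream rather than a filename, and ignores id2word and metadata.

```python
import io


def write_mm(out, corpus, num_terms):
    """Write a bag-of-words corpus to a text stream in Matrix Market coordinate format.

    Rows are documents and columns are term ids; both are written 1-based, as the
    format requires. The header needs the non-zero count, so this sketch materializes
    the whole corpus first (a streaming writer would have to patch the header later).
    """
    docs = [list(doc) for doc in corpus]
    nnz = sum(len(doc) for doc in docs)
    out.write("%%MatrixMarket matrix coordinate real general\n")
    out.write(f"{len(docs)} {num_terms} {nnz}\n")
    for doc_id, doc in enumerate(docs):
        for term_id, weight in doc:
            out.write(f"{doc_id + 1} {term_id + 1} {weight}\n")


buf = io.StringIO()
write_mm(buf, [[(0, 1), (1, 1), (2, 1)], [(1, 1), (3, 1)]], num_terms=4)
print(buf.getvalue())
```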