gensim.corpora.IndexedCorpus.__init__

IndexedCorpus.__init__(fname, index_fname=None)

Initialize this abstract base class by loading a previously saved index from index_fname (or from fname.index if index_fname is not set). The index lets subclasses support the corpus[docno] syntax, i.e. random access to document number docno in O(1) time.

>>> import gensim
>>> # save corpus in SvmLightCorpus format with an index
>>> corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
>>> gensim.corpora.SvmLightCorpus.serialize('testfile.svmlight', corpus)
>>> # load back as a document stream (*not* plain Python list)
>>> corpus_with_random_access = gensim.corpora.SvmLightCorpus('testfile.svmlight')
>>> print(corpus_with_random_access[1])
[(0, 1.0), (1, 2.0)]
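
To illustrate why a saved index makes corpus[docno] an O(1) lookup, the following is a minimal, standard-library-only sketch of the general idea: record the byte offset of each document while writing, then seek straight to that offset when reading. It is not gensim's implementation. The names save_with_index and TinyIndexedCorpus, the one-JSON-document-per-line file format, and the JSON index file are all assumptions made for this example; only the fname / index_fname convention is taken from the signature above.

```python
import json
import os
import tempfile


def save_with_index(fname, docs):
    """Write one JSON document per line and record each line's byte offset.

    Hypothetical helper for illustration; not part of gensim.
    """
    offsets = []
    with open(fname, "wb") as out:
        for doc in docs:
            offsets.append(out.tell())  # byte position where this document starts
            out.write((json.dumps(doc) + "\n").encode("utf-8"))
    # Store the offsets next to the data file, mirroring the fname.index convention.
    with open(fname + ".index", "w") as idx:
        json.dump(offsets, idx)


class TinyIndexedCorpus:
    """Hypothetical stand-in for an indexed corpus; not gensim's class."""

    def __init__(self, fname, index_fname=None):
        self.fname = fname
        if index_fname is None:
            index_fname = fname + ".index"
        with open(index_fname) as idx:
            self.index = json.load(idx)  # list of byte offsets, one per document

    def __len__(self):
        return len(self.index)

    def __getitem__(self, docno):
        # Jump directly to the stored offset instead of scanning the whole file.
        with open(self.fname, "rb") as f:
            f.seek(self.index[docno])
            line = f.readline().decode("utf-8")
        return [tuple(pair) for pair in json.loads(line)]


corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
path = os.path.join(tempfile.mkdtemp(), "testfile.jsonl")
save_with_index(path, corpus)

loaded = TinyIndexedCorpus(path)
print(loaded[1])  # [(0, 1.0), (1, 2.0)]
```

Each lookup costs one seek and one line read regardless of how many documents precede it, which is the property the index is meant to provide.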