gensim.corpora.IndexedCorpus.__init__
IndexedCorpus.__init__(fname, index_fname=None)

Initialize this abstract base class by loading a previously saved index from index_fname (or from fname.index if index_fname is not set). The index lets subclasses support the corpus[docno] syntax, i.e. random access to document #`docno` in O(1) time.
>>> import gensim
>>> # save corpus in SvmLightCorpus format with an index
>>> corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
>>> gensim.corpora.SvmLightCorpus.serialize('testfile.svmlight', corpus)
>>> # load back as a document stream (*not* plain Python list)
>>> corpus_with_random_access = gensim.corpora.SvmLightCorpus('testfile.svmlight')
>>> print(corpus_with_random_access[1])
[(0, 1.0), (1, 2.0)]
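To show why a saved index makes corpus[docno] an O(1) lookup, here is a minimal sketch of the general byte-offset idea. It assumes a toy one-document-per-line text format. It is not gensim's actual index format or loading code, and the helpers write_with_index and read_doc are hypothetical names used only for illustration. While writing, the sketch records the byte offset where each document starts. Reading document #docno then takes a single seek to that offset, with no scan of the preceding documents.

```python
import os
import tempfile


def write_with_index(path, docs):
    """Hypothetical helper: write one document per line, return each line's start offset."""
    offsets = []
    with open(path, "wb") as f:  # binary mode so tell()/seek() use plain byte offsets
        for doc in docs:
            offsets.append(f.tell())  # remember where this document begins
            line = " ".join(f"{term_id}:{weight}" for term_id, weight in doc)
            f.write((line + "\n").encode("utf-8"))
    return offsets


def read_doc(path, offsets, docno):
    """Hypothetical helper: jump straight to document #docno using the saved offsets."""
    with open(path, "rb") as f:
        f.seek(offsets[docno])  # O(1) jump instead of reading earlier documents
        pairs = f.readline().decode("utf-8").split()
    return [(int(t), float(w)) for t, w in (pair.split(":") for pair in pairs)]


# Usage with the same toy corpus as the doctest above.
corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
path = os.path.join(tempfile.mkdtemp(), "toy_corpus.txt")
offsets = write_with_index(path, corpus)
print(read_doc(path, offsets, 1))  # [(0, 1.0), (1, 2.0)]
```

The offsets list plays the role of the index. Persisting it next to the data file, as fname.index does for IndexedCorpus, is what lets a later process load the corpus and still get random access.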