`gensim.corpora.BleiCorpus`

class gensim.corpora.BleiCorpus(fname, fname_vocab=None)

Corpus in Blei’s LDA-C format.

The corpus is represented as two files: one describing the documents, and another describing the mapping between words and their ids.

Each document is one line, where N is the number of fieldId:fieldValue pairs that follow:

N fieldId1:fieldValue1 fieldId2:fieldValue2 ... fieldIdN:fieldValueN
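As an illustration of the line layout above, the following is a minimal sketch of a parser written in plain Python. The function name `parse_ldac_line` is hypothetical and is not part of gensim; the sketch also assumes that field ids are integers and that values can be read as floats.

```python
def parse_ldac_line(line):
    """Parse one LDA-C style line into a list of (field_id, value) pairs.

    Hypothetical helper for illustration only; not gensim's implementation.
    """
    parts = line.split()
    if not parts:
        return []
    declared = int(parts[0])  # leading N: number of pairs that follow
    pairs = []
    for token in parts[1:]:
        field_id, value = token.split(":")
        pairs.append((int(field_id), float(value)))
    # Reject lines whose declared count does not match the actual pairs.
    if len(pairs) != declared:
        raise ValueError(f"expected {declared} fields, found {len(pairs)}")
    return pairs


doc = parse_ldac_line("3 0:2 4:1 7:5")
```

Checking the declared count against the parsed pairs catches truncated or malformed lines early, which is one reasonable design choice rather than a requirement of the format.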

The vocabulary file lists one word per line; the word on line K has the implicit id K.
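The implicit id can be recovered simply by counting lines while reading the vocabulary. The sketch below assumes that counting starts at zero, which is an assumption and not stated above; `read_vocab` is a hypothetical helper, and an in-memory stream stands in for the vocabulary file.

```python
import io


def read_vocab(stream):
    """Map implicit ids to words, one word per line.

    Assumes ids start at 0 (an assumption for this sketch).
    """
    return {word_id: line.strip() for word_id, line in enumerate(stream)}


# An in-memory stand-in for a vocabulary file.
vocab_file = io.StringIO("apple\nbanana\ncherry\n")
id2word = read_vocab(vocab_file)
```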

Methods

`__init__`(fname[, fname_vocab])	Initialize the corpus from a file.
`docbyoffset`(offset)	Return the document stored at file position offset.
`line2doc`(line)
`load`(fname[, mmap])	Load a previously saved object from file (also see save).
`save`(*args, **kwargs)
`save_corpus`(fname, corpus[, id2word, metadata])	Save a corpus in the LDA-C format.
`serialize`(serializer, fname, corpus[, ...])	Iterate through the document stream corpus, saving the documents to fname and recording byte offset of each document.
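
The idea behind `serialize` and `docbyoffset`, recording the byte offset of each document while writing so that a single document can be fetched later without scanning the file, can be sketched in plain Python. This is not gensim's implementation: `write_with_offsets` and `read_at_offset` are hypothetical helpers, an in-memory buffer stands in for the corpus file, and integer field values are assumed.

```python
import io


def write_with_offsets(stream, docs):
    """Write docs as 'N id:value ...' lines; return each line's byte offset."""
    offsets = []
    for doc in docs:
        offsets.append(stream.tell())  # position where this document starts
        fields = " ".join(f"{i}:{v}" for i, v in doc)
        stream.write(f"{len(doc)} {fields}\n".encode("utf-8"))
    return offsets


def read_at_offset(stream, offset):
    """Seek to a recorded offset and parse the single document stored there."""
    stream.seek(offset)
    parts = stream.readline().decode("utf-8").split()
    # Skip the leading count; assumes integer ids and values.
    return [(int(i), int(v)) for i, v in (p.split(":") for p in parts[1:])]


buffer = io.BytesIO()  # stand-in for a file opened in binary mode
docs = [[(0, 2), (3, 1)], [(1, 4)], [(0, 1), (2, 2), (5, 1)]]
offsets = write_with_offsets(buffer, docs)
second = read_at_offset(buffer, offsets[1])
```

Working in binary mode keeps `tell()` and `seek()` in plain byte positions, which is why the sketch encodes each line explicitly.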