gensim.corpora.BleiCorpus
class gensim.corpora.BleiCorpus(fname, fname_vocab=None)
Corpus in Blei’s LDA-C format.
The corpus is represented as two files: one describing the documents, and another describing the mapping between words and their ids.
Each document is one line:
N fieldId1:fieldValue1 fieldId2:fieldValue2 ... fieldIdN:fieldValueN
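As an illustration of the line layout above, here is a minimal pure-Python sketch that parses one document line into (id, value) pairs. The helper name `parse_ldac_line` is hypothetical and not part of gensim; converting values to floats and checking the leading count N against the number of pairs are assumptions made for this example.

```python
def parse_ldac_line(line):
    """Parse one LDA-C document line into a list of (field_id, value) pairs."""
    parts = line.split()
    declared = int(parts[0])  # leading N: how many id:value pairs should follow
    pairs = []
    for token in parts[1:]:
        field_id, value = token.rsplit(":", 1)
        pairs.append((int(field_id), float(value)))
    # Assumption for this sketch: treat a mismatch with N as an error.
    if len(pairs) != declared:
        raise ValueError(f"expected {declared} pairs, found {len(pairs)}")
    return pairs


doc = parse_ldac_line("3 0:2 4:1 7:5")
```

Here `doc` holds three pairs, one per `fieldId:fieldValue` token, in the order they appear on the line.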
The vocabulary is a file with words, one word per line; word at line K has an implicit id=K.
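The implicit word ids can be sketched by enumerating the lines of the vocabulary file. This example uses an in-memory stream in place of a real file; the helper name `read_vocab` and the sample words are hypothetical, and counting lines from zero is an assumption of this sketch.

```python
import io


def read_vocab(stream):
    """Map each line index K to the word found on line K."""
    # Assumption for this sketch: line numbering starts at 0.
    return {k: line.rstrip("\n") for k, line in enumerate(stream)}


vocab_file = io.StringIO("human\ninterface\ncomputer\n")
id2word = read_vocab(vocab_file)

# Translate the ids of a parsed document back into words.
document = [(0, 2.0), (2, 1.0)]
words = [id2word[field_id] for field_id, _ in document]
```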
Methods
__init__(fname[, fname_vocab]) | Initialize the corpus from a file.
docbyoffset(offset) | Return the document stored at file position offset.
line2doc(line) |
load(fname[, mmap]) | Load a previously saved object from file (also see save).
save(*args, **kwargs) |
save_corpus(fname, corpus[, id2word, metadata]) | Save a corpus in the LDA-C format.
serialize(serializer, fname, corpus[, ...]) | Iterate through the document stream corpus, saving the documents to fname and recording the byte offset of each document.
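The idea behind recording byte offsets while saving, and later fetching a document by its offset, can be sketched in plain Python. This is not gensim's implementation: the helper names `write_with_offsets` and `doc_by_offset`, the file name, and the `%g`-style value formatting are all assumptions made for illustration.

```python
import os
import tempfile


def write_with_offsets(fname, corpus):
    """Write documents in an LDA-C style layout and return each one's byte offset."""
    offsets = []
    with open(fname, "wb") as fout:
        for doc in corpus:
            offsets.append(fout.tell())  # position where this document starts
            pairs = " ".join(f"{i}:{v:g}" for i, v in doc)
            fout.write(f"{len(doc)} {pairs}\n".encode("utf8"))
    return offsets


def doc_by_offset(fname, offset):
    """Seek straight to a stored offset and parse the single line found there."""
    with open(fname, "rb") as fin:
        fin.seek(offset)
        parts = fin.readline().decode("utf8").split()
    # parts[0] is the pair count; the rest are id:value tokens.
    return [(int(i), float(v)) for i, v in (p.split(":") for p in parts[1:])]


corpus = [[(0, 1.0), (2, 3.0)], [(1, 2.0)], []]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.lda-c")
    offsets = write_with_offsets(path, corpus)
    second = doc_by_offset(path, offsets[1])
    restored = [doc_by_offset(path, off) for off in offsets]
```

Keeping the offsets makes random access cheap: reading one document takes a single seek instead of a scan over the whole file.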