`gensim.corpora.LowCorpus`

class gensim.corpora.LowCorpus(fname, id2word=None, line2words=<function split_on_space>)

A List_Of_Words corpus that handles input in the GibbsLda++ format.

Quoting http://gibbslda.sourceforge.net/#3.2_Input_Data_Format:

Both data for training/estimating the model and new data (i.e., previously
unseen data) have the same format as follows:

[M]
[document1]
[document2]
...
[documentM]

in which the first line is the total number of documents [M]. Each line
after that is one document. [document_i] is the i-th document of the dataset
that consists of a list of N_i words/terms.

[document_i] = [word_i1] [word_i2] ... [word_iN_i]

in which all [word_ij] (i=1..M, j=1..N_i) are text strings and they are separated
by the blank character.
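The input format above is simple enough to parse with the standard library alone. The following is a minimal sketch, not gensim's own reader: `read_gibbslda` is a hypothetical helper, and it assumes the file is already decoded to text and that words are separated by spaces (runs of spaces are tolerated; tabs are not treated as separators).

```python
import io
from typing import List, TextIO


def read_gibbslda(stream: TextIO) -> List[List[str]]:
    """Parse GibbsLda++ input: a document count M, then one document per line."""
    header = stream.readline()
    num_docs = int(header.strip())  # first line: total number of documents [M]
    docs = []
    for _ in range(num_docs):
        line = stream.readline()
        if not line:
            # the header promised more documents than the file contains
            raise ValueError(f"expected {num_docs} documents, found {len(docs)}")
        # [document_i] = [word_i1] ... [word_iN_i]; drop empty strings from repeated spaces
        docs.append([w for w in line.strip().split(" ") if w])
    return docs


sample = "3\nhuman interface computer\nsurvey user computer system\ngraph minors trees\n"
docs = read_gibbslda(io.StringIO(sample))
print(docs[1])  # ['survey', 'user', 'computer', 'system']
```

Checking the header against the number of lines actually read catches truncated files early. A streaming reader would instead yield one document at a time rather than building the whole list in memory.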

Methods

`__init__`(fname[, id2word, line2words])	Initialize the corpus from a file.
`docbyoffset`(offset)	Return the document stored at file position offset.
`line2doc`(line)
`load`(fname[, mmap])	Load a previously saved object from file (also see save).
`save`(*args, **kwargs)
`save_corpus`(fname, corpus[, id2word, metadata])	Save a corpus in the List-of-words format.
`serialize`(serializer, fname, corpus[, ...])	Iterate through the document stream corpus, saving the documents to fname and recording byte offset of each document.
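The table mixes line-level conversion (`line2doc`) with byte-offset bookkeeping (`serialize`, `docbyoffset`). The sketch below illustrates both ideas with the standard library only; it is not gensim's implementation, and `line_to_bow`, `write_with_offsets` and `doc_at_offset` are hypothetical helpers. As assumptions of this sketch, words missing from `word2id` are skipped, and the (word_id, count) pairs follow each word's first occurrence in the line.

```python
import io
from collections import Counter
from typing import BinaryIO, Dict, List, Tuple


def line_to_bow(line: str, word2id: Dict[str, int]) -> List[Tuple[int, int]]:
    """Turn one line into (word_id, count) pairs; unknown words are skipped (assumption)."""
    words = [w for w in line.strip().split(" ") if w]
    counts = Counter(w for w in words if w in word2id)
    # Counter keeps first-insertion order, so pairs follow each word's first occurrence
    return [(word2id[w], n) for w, n in counts.items()]


def write_with_offsets(fout: BinaryIO, docs: List[List[str]]) -> List[int]:
    """Write docs in List-of-words layout, returning each document's byte offset."""
    fout.write(f"{len(docs)}\n".encode("utf-8"))  # header line: number of documents
    offsets = []
    for words in docs:
        offsets.append(fout.tell())  # byte position where this document's line starts
        fout.write((" ".join(words) + "\n").encode("utf-8"))
    return offsets


def doc_at_offset(fin: BinaryIO, offset: int, word2id: Dict[str, int]) -> List[Tuple[int, int]]:
    """Seek to a recorded offset and convert the single line found there."""
    fin.seek(offset)
    return line_to_bow(fin.readline().decode("utf-8"), word2id)


docs = [["graph", "trees", "graph"], ["user", "survey"]]
# sorted vocabulary -> ids: graph=0, survey=1, trees=2, user=3
word2id = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
buf = io.BytesIO()
offsets = write_with_offsets(buf, docs)
print(doc_at_offset(buf, offsets[0], word2id))  # [(0, 2), (2, 1)]
```

Writing in binary mode keeps `tell()` and `seek()` in byte units, so a stored offset points exactly at the start of a document line. Random access then costs one seek plus one `readline()`, regardless of where the document sits in the file.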

Attributes