gensim.matutils.MmReader

class gensim.matutils.MmReader(input, transposed=True)

Wrap a term-document matrix stored on disk in Matrix Market format and present it as an object that supports iteration over its rows (i.e. documents).

Note that the file is read into memory one document at a time, not the whole matrix at once (unlike scipy.io.mmread). This allows us to process corpora which are larger than the available RAM.
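The streaming idea described above can be illustrated with a minimal standard-library sketch. This is not gensim's implementation: the helper name iter_mm_documents is hypothetical, and the sketch assumes a coordinate-format file whose lines read "doc_id term_id value" with 1-based ids, sorted by doc_id. Documents with no entries are simply not yielded here.

```python
import os
import tempfile
from itertools import groupby


def iter_mm_documents(path):
    """Yield (doc_id, [(term_id, value), ...]) one document at a time.

    Hypothetical helper for illustration only. Only the entries of the
    current document are held in memory at any point.
    """
    with open(path, "r", encoding="utf-8") as handle:
        # Skip the banner and comment lines, which all start with '%'.
        line = handle.readline()
        while line.startswith("%"):
            line = handle.readline()
        # `line` now holds the size line (rows, columns, non-zeros); unused here.
        entries = (ln.split() for ln in handle if ln.strip())
        # Consecutive lines sharing the same first column form one document.
        for doc_id, group in groupby(entries, key=lambda parts: int(parts[0])):
            # Convert the 1-based ids in the file to 0-based ids.
            yield doc_id - 1, [(int(t) - 1, float(v)) for _, t, v in group]


# Tiny made-up matrix: 3 documents, 4 terms, 5 non-zero entries.
sample = (
    "%%MatrixMarket matrix coordinate real general\n"
    "3 4 5\n"
    "1 1 1.0\n"
    "1 3 2.0\n"
    "2 2 0.5\n"
    "3 1 4.0\n"
    "3 4 1.5\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".mm", delete=False) as tmp:
    tmp.write(sample)

docs = list(iter_mm_documents(tmp.name))
os.remove(tmp.name)
```

Because the function is a generator, a caller that loops over it directly (rather than calling list) never holds more than one document in memory.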

Methods

__init__(input[, transposed]) Initialize the matrix reader.
docbyoffset(offset) Return the document located at the given file offset (in bytes).
skip_headers(input_file) Skip file headers that appear before the first document.
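
To illustrate what looking up a document by byte offset involves, here is a rough standard-library sketch. It is not gensim's code: the function doc_at_offset is hypothetical, and it assumes the same "doc_id term_id value" line layout with 1-based ids, with offset pointing at the first line of a document.

```python
import os
import tempfile


def doc_at_offset(path, offset):
    """Return [(term_id, value), ...] for the document starting at `offset`.

    Hypothetical helper for illustration only.
    """
    document, current_doc = [], None
    with open(path, "rb") as handle:
        # Jump straight to the document without reading anything before it.
        handle.seek(offset)
        for raw in handle:
            parts = raw.decode("ascii").split()
            if not parts:
                continue
            doc_id = int(parts[0])
            if current_doc is None:
                current_doc = doc_id
            elif doc_id != current_doc:
                # The next document has started, so stop reading.
                break
            document.append((int(parts[1]) - 1, float(parts[2])))
    return document


# Build a small made-up file and compute offsets from the byte lengths.
header = b"%%MatrixMarket matrix coordinate real general\n3 4 5\n"
doc1 = b"1 1 1.0\n1 3 2.0\n"
doc2 = b"2 2 0.5\n"
doc3 = b"3 1 4.0\n3 4 1.5\n"
with tempfile.NamedTemporaryFile("wb", suffix=".mm", delete=False) as tmp:
    tmp.write(header + doc1 + doc2 + doc3)

second_doc = doc_at_offset(tmp.name, len(header) + len(doc1))
third_doc = doc_at_offset(tmp.name, len(header) + len(doc1) + len(doc2))
os.remove(tmp.name)
```

Storing one offset per document in a separate index is what would make random access possible without scanning the file from the start.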