gensim.matutils.MmWriter

class gensim.matutils.MmWriter(fname)

Store a corpus in Matrix Market format.

Note that the output is written one document at a time, not the whole matrix at once (unlike scipy.io.mmwrite). This makes it possible to store corpora that are larger than the available RAM.
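A minimal sketch of the one-document-at-a-time idea, in plain Python rather than gensim's own code. The hypothetical generator `mm_entry_lines` turns each sparse document (a list of `(term_id, weight)` pairs, assumed 0-based as in gensim's bag-of-words format) into Matrix Market coordinate lines, shifting indices to the 1-based convention the format uses. Because it yields lines as it goes, only the current document is held in memory.

```python
from typing import Iterable, Iterator, List, Tuple

# A sparse document: (term_id, weight) pairs with 0-based term ids (assumed, as in gensim's BoW format).
SparseDoc = List[Tuple[int, float]]


def mm_entry_lines(corpus: Iterable[SparseDoc]) -> Iterator[str]:
    """Yield Matrix Market coordinate lines ("row col value"), one document at a time.

    Hypothetical helper: only the current document is in memory, so `corpus`
    may be a one-shot generator of any length.
    """
    for docno, vector in enumerate(corpus):
        for term_id, weight in vector:
            # Matrix Market indices are 1-based; the corpus ids are 0-based.
            yield f"{docno + 1} {term_id + 1} {weight}\n"
```

Passing the generator straight to a file's `writelines` keeps memory use independent of corpus size. The header still has to be handled separately, since its values are not known until the pass ends.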

NOTE: the output file is created in a single pass through the input corpus, so the input can be a once-only stream (iterator). To achieve this, a fake MM header is written first; statistics (the shape of the matrix and the number of non-zeroes) are collected during the pass; finally, the writer seeks back to the beginning of the file and overwrites the fake header with the proper values.
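The header trick can be sketched as follows. This is a simplified stand-in, not gensim's implementation; the function name `write_mm_single_pass` and the 50-byte placeholder width are assumptions for illustration. It writes the banner, reserves a blank size line, streams the entries while counting documents, the largest term id and the non-zeroes, then seeks back and overwrites the start of the reserved line.

```python
import io
from typing import BinaryIO, Iterable, List, Tuple

BANNER = b"%%MatrixMarket matrix coordinate real general\n"
STATS_WIDTH = 50  # assumed width of the placeholder size line (hypothetical choice)


def write_mm_single_pass(
    fout: BinaryIO, corpus: Iterable[List[Tuple[int, float]]]
) -> Tuple[int, int, int]:
    """Write `corpus` to `fout` in one pass, then patch the size line.

    `fout` must be seekable and opened for binary writing (e.g. mode 'wb+').
    Returns (num_docs, num_terms, num_nnz).
    """
    fout.write(BANNER)
    stats_offset = fout.tell()
    # Placeholder size line: blanks wide enough to hold the real numbers later.
    fout.write(b" " * STATS_WIDTH + b"\n")

    num_docs = num_terms = num_nnz = 0
    for docno, vector in enumerate(corpus):
        num_docs = docno + 1  # counts empty documents too
        for term_id, weight in vector:
            fout.write(f"{docno + 1} {term_id + 1} {weight}\n".encode("utf-8"))
            num_terms = max(num_terms, term_id + 1)
            num_nnz += 1

    stats = f"{num_docs} {num_terms} {num_nnz}".encode("utf-8")
    if len(stats) > STATS_WIDTH:
        raise ValueError("statistics do not fit in the placeholder line")
    # Seek back and overwrite the start of the placeholder; the remaining
    # blanks stay as trailing whitespace before the newline.
    fout.seek(stats_offset)
    fout.write(stats)
    fout.seek(0, io.SEEK_END)
    return num_docs, num_terms, num_nnz
```

The placeholder has to be reserved wide enough up front: the real numbers overwrite it in place, so a size line longer than the reservation would clobber the first data entry. That is why the sketch refuses statistics that do not fit, and why the file must be opened in a seekable binary mode such as `'wb+'`.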

Methods

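__init__(fname) Open the output file fname for writing.
close() Close the output file.
fake_headers(num_docs, num_terms, num_nnz) Overwrite the placeholder header with the final statistics collected during the pass.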
write_corpus(fname, corpus[, progress_cnt, ...]) Save the vector space representation of an entire corpus to disk.
write_headers(num_docs, num_terms, num_nnz) Write the Matrix Market header line and the size line (shape and number of non-zeroes).
write_vector(docno, vector) Write a single sparse vector to the file.

Attributes

HEADER_LINE The Matrix Market banner line written at the start of the file.
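`HEADER_LINE` is, to our knowledge, the banner `%%MatrixMarket matrix coordinate real general` in current gensim releases; treat that exact value as an assumption and check `MmWriter.HEADER_LINE` in your installed version. As a hedged sketch of what a finished file starts with, the hypothetical helper `read_mm_header` parses the banner and the size line that follows it (skipping `%` comment lines), and rejects a size line that is still the blank placeholder.

```python
from typing import Tuple


def read_mm_header(data: bytes) -> Tuple[str, Tuple[int, int, int]]:
    """Return the banner and (num_rows, num_cols, num_nnz) from the start of an MM file.

    Hypothetical helper for illustration; `data` only needs to contain the
    first few lines of the file.
    """
    lines = data.split(b"\n")
    banner = lines[0].decode("ascii")
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket banner")
    for line in lines[1:]:
        if line.startswith(b"%"):
            continue  # comment line between banner and size line
        fields = line.split()
        if len(fields) != 3:
            # e.g. an all-blank placeholder that was never overwritten
            raise ValueError(f"malformed size line: {line!r}")
        rows, cols, nnz = (int(tok) for tok in fields)
        return banner, (rows, cols, nnz)
    raise ValueError("no size line found")
```

For a large file, reading only the first few kilobytes (for example `f.read(4096)` on a file opened in `'rb'` mode) is usually enough, since the header sits at the very beginning.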