gensim.matutils.MmWriter
class gensim.matutils.MmWriter(fname)
Store a corpus in Matrix Market format.
Note that the output is written one document at a time, not as a whole matrix (unlike scipy.io.mmread, which loads the entire matrix into memory). This makes it possible to process corpora that are larger than the available RAM.
NOTE: the output file is created in a single pass through the input corpus, so the input can be a once-only stream (iterator). To achieve this, a fake (placeholder) MM header is written first, and statistics (the shape of the matrix and the number of non-zero entries) are collected during the pass. The writer then seeks back to the beginning of the file and overwrites the fake header with the proper values.
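The single-pass approach described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not gensim's actual implementation: the function name `write_mm_single_pass`, the 50-byte reserved width, and the sample documents are all hypothetical.

```python
import os
import tempfile

# Standard Matrix Market banner for a sparse real matrix.
HEADER_LINE = b"%%MatrixMarket matrix coordinate real general\n"
# Space reserved for the "rows cols nnz" line (an assumed width for this sketch).
STATS_WIDTH = 50


def write_mm_single_pass(fname, corpus):
    """Stream `corpus`, an iterable of [(term_id, weight), ...], to `fname`.

    Hypothetical helper: the corpus is consumed exactly once, so it may be
    a generator.
    """
    num_docs, num_terms, num_nnz = 0, 0, 0
    with open(fname, "wb") as fout:
        fout.write(HEADER_LINE)
        stats_pos = fout.tell()
        # Placeholder stats line: blanks that are overwritten at the end.
        fout.write(b" " * STATS_WIDTH + b"\n")
        for docno, vector in enumerate(corpus):
            for term_id, weight in sorted(vector):
                # Matrix Market indices are 1-based.
                line = f"{docno + 1} {term_id + 1} {weight}\n"
                fout.write(line.encode("ascii"))
                num_terms = max(num_terms, term_id + 1)
                num_nnz += 1
            num_docs = docno + 1
        stats = f"{num_docs} {num_terms} {num_nnz}".encode("ascii")
        if len(stats) > STATS_WIDTH:
            raise ValueError("statistics do not fit in the reserved header space")
        # Seek back to the placeholder and fill in the real values.
        fout.seek(stats_pos)
        fout.write(stats)
    return num_docs, num_terms, num_nnz


def stream():
    """A once-only stream of three made-up documents."""
    yield [(0, 1.0), (2, 0.5)]
    yield [(1, 2.0)]
    yield [(0, 3.0), (3, 1.5)]


path = os.path.join(tempfile.mkdtemp(), "sketch.mm")
shape = write_mm_single_pass(path, stream())
with open(path, "rb") as fin:
    lines = fin.read().decode("ascii").splitlines()
print(shape)
print(lines[:3])
```

Because the placeholder line has a fixed width, overwriting it never shifts the data that follows; the trade-off is that the statistics must fit in the reserved space.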
Methods
__init__(fname) |
close() |
fake_headers(num_docs, num_terms, num_nnz) |
write_corpus(fname, corpus[, progress_cnt, ...]) | Save the vector space representation of an entire corpus to disk.
write_headers(num_docs, num_terms, num_nnz) |
write_vector(docno, vector) | Write a single sparse vector to the file.
Attributes
HEADER_LINE |