gensim.similarities.SparseMatrixSimilarity

class gensim.similarities.SparseMatrixSimilarity(corpus, num_features=None, num_terms=None, num_docs=None, num_nnz=None, num_best=None, chunksize=500, dtype=numpy.float32, maintain_sparsity=False)

Compute similarity against a corpus of documents by storing the sparse index matrix in memory. The similarity measure used is cosine between two vectors.
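As an illustration of the measure itself, here is a minimal pure-Python sketch of cosine similarity between two sparse vectors in bag-of-words format (lists of (feature_id, weight) pairs). The helper name cosine and the sample vectors are hypothetical and are not part of gensim; the class itself works on a sparse matrix rather than on dictionaries.

```python
import math


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (feature_id, weight) pairs."""
    a = dict(vec_a)
    b = dict(vec_b)
    # Only features present in both vectors contribute to the dot product.
    dot = sum(weight * b[fid] for fid, weight in a.items() if fid in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an empty vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


doc = [(0, 1.0), (1, 1.0), (2, 1.0)]
query = [(1, 1.0), (3, 1.0)]
score = cosine(doc, query)  # one shared feature: 1 / (sqrt(3) * sqrt(2))
```

Because cosine similarity depends only on direction, scaling either vector by a positive constant leaves the score unchanged.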

Use this if your input corpus contains sparse vectors (such as documents in bag-of-words format) and fits into RAM.

The matrix is stored internally as a scipy.sparse.csr_matrix. If the entire matrix does not fit into main memory, use Similarity instead.

Takes an optional maintain_sparsity argument. Setting it to True causes get_similarities to return a sparse matrix instead of a dense representation, if possible.

See also Similarity and MatrixSimilarity in this module.

Methods

__init__(corpus[, num_features, num_terms, ...])
get_similarities(query) Return similarity of sparse vector query to all documents in the corpus, as a numpy array.
load(fname[, mmap]) Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).