gensim.similarities.Similarity

class gensim.similarities.Similarity(output_prefix, corpus, num_features, num_best=None, chunksize=256, shardsize=32768, norm='l2')

Compute cosine similarity of a dynamic query against a static corpus of documents (“the index”).

Scalability is achieved by sharding the index into smaller pieces, each of which fits into main memory (see the (Sparse)MatrixSimilarity classes in this module). The shards themselves are stored as files on disk and mmap’ed back as needed.
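
The sharding idea can be sketched in plain Python. This is only a conceptual illustration, not gensim's implementation: the helper names (unit, build_shards, query_shards_sketch) are hypothetical, shards are kept as in-memory lists rather than mmap'ed files, and vectors are dense lists of floats.

```python
import math


def unit(vec):
    """Scale a dense vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_shards(docs, shardsize):
    """Normalize every document once, then split the index into fixed-size shards."""
    normalized = [unit(doc) for doc in docs]
    return [normalized[i:i + shardsize] for i in range(0, len(normalized), shardsize)]


def query_shards_sketch(shards, query):
    """Query each shard in turn and concatenate the per-shard similarities."""
    q = unit(query)
    sims = []
    for shard in shards:
        # With unit-length vectors, the dot product equals the cosine similarity.
        sims.extend(sum(a * b for a, b in zip(doc, q)) for doc in shard)
    return sims


docs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
shards = build_shards(docs, shardsize=2)       # two shards: 2 documents + 1 document
sims = query_shards_sketch(shards, [1.0, 0.0])  # one similarity per indexed document
```

Because each shard is queried independently, only one shard needs to be resident in memory at a time; the result still has one entry per indexed document, in index order.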

Methods

__init__(output_prefix, corpus, num_features) Construct the index from corpus.
add_documents(corpus) Extend the index with new documents.
check_moved() Update shard locations, in case the server directory has moved on the filesystem.
close_shard() Force the latest shard to close (be converted to a matrix and stored to disk).
destroy() Delete all files under self.output_prefix.
get_similarities(doc)
iter_chunks([chunksize]) Iteratively yield the index as chunks of documents, each of size <= chunksize.
load(fname[, mmap]) Load a previously saved object from file (also see save).
query_shards(query) Return the result of applying shard[query] for each shard in self.shards, as a sequence.
reopen_shard()
save([fname]) Save the object via pickling (also see load) under the filename specified in the constructor.
shardid2filename(shardid)
similarity_by_id(docpos) Return similarity of the given document only.
vector_by_id(docpos) Return indexed vector corresponding to the document at position docpos.