class simserver.SimServer(basename, use_locks=True)

Top-level functionality for similarity services. A similarity server takes care of:

  1. creating semantic models
  2. indexing documents using these models
  3. finding the most similar documents in an index.
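The three responsibilities above can be illustrated with a minimal, stdlib-only sketch. This is not simserver's implementation. It swaps in plain TF-IDF weighting and cosine similarity purely for illustration, and every name in it (`train_idf`, `to_vector`, `cosine`, the sample corpus) is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy pipeline, not simserver's code: TF-IDF "model" + cosine search.


def train_idf(corpus):
    """Step 1 (toy 'model'): inverse document frequency of every term."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus.values():
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def to_vector(tokens, idf):
    """Map tokens to a unit-length sparse TF-IDF vector; unknown terms are dropped."""
    tf = Counter(tokens)
    vec = {t: count * idf[t] for t, count in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors have unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


corpus = {
    "doc_0": ["human", "computer", "interface"],
    "doc_1": ["graph", "trees", "minors"],
    "doc_2": ["computer", "interface"],
}
idf = train_idf(corpus)  # 1. create a model
index = {doc_id: to_vector(toks, idf) for doc_id, toks in corpus.items()}  # 2. index
query = to_vector(["computer", "interface"], idf)
# 3. rank indexed documents, most similar first
ranked = sorted(((cosine(query, vec), doc_id) for doc_id, vec in index.items()),
                reverse=True)
print([doc_id for _, doc_id in ranked])  # ['doc_2', 'doc_0', 'doc_1']
```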

An object of this class can be shared across the network via Pyro to answer remote client requests. It is thread-safe. Using a server concurrently from multiple processes is safe for reading, that is, for answering similarity queries. Modifications (training and indexing) are serialized internally via locking.
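The concurrency model described above (concurrent reads, serialized writes) can be sketched as follows. This is an assumption about the general pattern, not simserver's code: a hypothetical `synchronized` decorator holds a per-instance lock around mutating methods, and writes swap in a fresh dict (copy-on-write) so that lock-free readers always see a complete snapshot. A wrapper of the form `wrapper(self, *args, **kwargs)` may also explain why several methods in the listing are documented with the generic `(*args, **kwargs)` signature, though that is only a guess.

```python
import functools
import threading


def synchronized(method):
    """Hypothetical decorator: run a mutating method while holding the instance lock."""
    @functools.wraps(method)  # keep the wrapped method's name and docstring
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class ToyServer:
    """Hypothetical stand-in, not simserver's code: serialized writes, lock-free reads."""

    def __init__(self):
        # RLock so one synchronized method can safely call another.
        self._lock = threading.RLock()
        self._docs = {}

    @synchronized
    def index(self, doc_id, text):
        # Copy-on-write: build a new dict, then rebind the attribute in one step,
        # so readers never observe a half-updated mapping.
        updated = dict(self._docs)
        updated[doc_id] = text
        self._docs = updated

    def keys(self):
        # Reads grab the current snapshot without taking the lock.
        return sorted(self._docs)


def worker(server, n):
    for i in range(100):
        server.index(f"doc_{n}_{i}", "some text")


server = ToyServer()
threads = [threading.Thread(target=worker, args=(server, n)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(server.keys()))  # 400: the lock prevents lost updates
```

Without the lock, two threads could copy the same old dict and one thread's insert would be lost; holding the lock around the whole copy-and-swap is what makes the writes serialized.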


Methods:

  __init__(basename[, use_locks])
      All data will be stored under directory basename.
  buffer(*args, **kwargs)
      Add a sequence of documents to be processed (indexed or trained on).
  close()
      Explicitly close open file handles, databases etc.
  delete(*args, **kwargs)
      Delete specified documents from the index.
  drop_index(*args, **kwargs)
      Drop all indexed documents.
  find_similar(doc[, min_score, max_results])
      Find at most max_results most similar articles in the index, each having similarity score of at least min_score.
  flush(*args, **kwargs)
      Commit all changes, clear all caches.
  index(*args, **kwargs)
      Permanently index all documents previously added via buffer, or directly index documents from corpus, if specified.
  keys()
      Return ids of all indexed documents.
  optimize(*args, **kwargs)
      Precompute top similarities for all indexed documents.
  train(*args, **kwargs)
      Create an indexing model.
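The method listing above implies a buffer-then-index workflow and a find_similar contract: at most max_results hits, each scoring at least min_score. The toy class below mirrors a few of those method names purely for illustration. `ToyIndex`, its document format (dicts with hypothetical `id` and `vector` keys), its default parameter values, and its in-memory storage are all assumptions; the real server persists data under basename and its document format may differ. Here documents carry precomputed sparse vectors, where the real server would derive them from a trained model.

```python
import heapq
import math


class ToyIndex:
    """Hypothetical in-memory stand-in mirroring a few SimServer method names."""

    def __init__(self):
        self._buffer = []  # documents added but not yet indexed
        self._index = {}   # doc id -> unit-length sparse vector

    def buffer(self, docs):
        # Stage documents; nothing is searchable until index() runs.
        self._buffer.extend(docs)

    def index(self):
        # Normalize every buffered vector (assumed nonzero), store it, empty the buffer.
        for doc in self._buffer:
            norm = math.sqrt(sum(w * w for w in doc["vector"].values()))
            self._index[doc["id"]] = {t: w / norm for t, w in doc["vector"].items()}
        self._buffer = []

    def keys(self):
        return sorted(self._index)

    def delete(self, doc_ids):
        for doc_id in doc_ids:
            self._index.pop(doc_id, None)

    def drop_index(self):
        self._index = {}

    def find_similar(self, doc_id, min_score=0.0, max_results=100):
        # Defaults are hypothetical. Score every indexed document by cosine similarity.
        query = self._index[doc_id]
        hits = []
        for other, vec in self._index.items():
            score = sum(w * vec.get(t, 0.0) for t, w in query.items())
            if score >= min_score:
                hits.append((other, score))
        # Keep at most max_results hits, highest score first.
        return heapq.nlargest(max_results, hits, key=lambda hit: hit[1])


idx = ToyIndex()
idx.buffer([
    {"id": "a", "vector": {"x": 1.0, "y": 1.0}},
    {"id": "b", "vector": {"x": 1.0}},
    {"id": "c", "vector": {"z": 1.0}},
])
print(idx.keys())  # [] until index() is called
idx.index()
print([doc_id for doc_id, _ in idx.find_similar("a", min_score=0.5, max_results=2)])  # ['a', 'b']
idx.delete(["b"])
print(idx.keys())  # ['a', 'c']
```

In this toy, find_similar also returns the query document itself (with a score of about 1.0); whether the real server does so is not stated in this listing.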