gensim.similarities.WmdSimilarity

class gensim.similarities.WmdSimilarity(corpus, w2v_model, num_best=None, normalize_w2v_and_replace=True, chunksize=256)

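Document similarity index (like MatrixSimilarity) that derives its similarity measure from the Word Mover's Distance (WMD): the smaller the WMD between a corpus document and the query, the higher that document's similarity. See gensim.models.word2vec.Word2Vec.wmdistance for more information.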
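To make the distance-to-similarity idea concrete, here is a minimal pure-Python sketch. It is not gensim's implementation: the 2-D vectors are hypothetical toy embeddings standing in for a trained word2vec model, and WMD is computed exactly only for the special case of two equal-length documents. With every token weighted 1/n, an optimal transport plan between two such documents is a permutation, so brute force over assignments gives the exact WMD (feasible only for very short documents). The mapping 1 / (1 + d) is an assumed decreasing function that keeps scores in (0, 1]; it ranks documents exactly as the plain negation -d would.

```python
import itertools
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
EMBEDDINGS = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (1.0, 1.0),
}


def toy_wmd(doc1, doc2, vectors=EMBEDDINGS):
    """Exact WMD for two non-empty documents with the same number of tokens.

    Each token carries mass 1/n. Between two uniform distributions of equal
    size an optimal transport plan is a permutation, so trying every
    assignment of doc1 tokens to doc2 tokens yields the exact minimum cost.
    """
    if len(doc1) != len(doc2) or not doc1:
        raise ValueError("this sketch needs two non-empty documents of equal length")
    n = len(doc1)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Total Euclidean travel cost when token i of doc1 moves to token perm[i] of doc2.
        cost = sum(math.dist(vectors[doc1[i]], vectors[doc2[j]]) for i, j in enumerate(perm))
        best = min(best, cost / n)
    return best


def wmd_similarity(doc1, doc2):
    # Assumed mapping: distance 0 -> similarity 1, larger distance -> smaller similarity.
    return 1.0 / (1.0 + toy_wmd(doc1, doc2))
```

For example, `wmd_similarity(["a", "b"], ["c", "d"])` returns 0.5, because the cheapest plan moves "a" to "c" and "b" to "d" at a distance of 1.0 each.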

When num_best is provided, only the num_best most similar documents are returned, as (document index, similarity) pairs.
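The effect of num_best can be sketched as a top-n selection over one score per corpus document. This mirrors the behaviour described above rather than gensim's internal code; the function name `top_n` is hypothetical.

```python
import heapq


def top_n(similarities, num_best=None):
    """Hypothetical helper imitating num_best.

    With num_best=None every score is returned in corpus order; otherwise the
    num_best highest scores are returned as (doc_index, similarity) pairs,
    best first.
    """
    if num_best is None:
        return list(similarities)
    return heapq.nlargest(num_best, enumerate(similarities), key=lambda pair: pair[1])
```

`heapq.nlargest` avoids sorting the whole list, which matters when the corpus is large and num_best is small.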

When using this code, please consider citing the following papers:

Example:

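>>> # See the tutorial notebook for more examples:
>>> # https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/WMD_tutorial.ipynb
>>> from gensim.models import Word2Vec
>>> from gensim.similarities import WmdSimilarity
>>> from gensim.utils import simple_preprocess
>>>
>>> # Given a document collection "corpus" (a list of lists of tokens), train a word2vec model.
>>> model = Word2Vec(corpus)
>>> instance = WmdSimilarity(corpus, model, num_best=10)

>>> # Make a query; it must be tokenized the same way as the corpus.
>>> query = simple_preprocess('Very good, you should seat outdoor.')
>>> sims = instance[query]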

Methods

__init__(corpus, w2v_model[, num_best, ...]) Initialize the index. corpus: list of lists of strings (tokenized documents), as used in gensim.models.word2vec.
get_similarities(query) Do not use this function directly; use the self[query] syntax instead.
load(fname[, mmap]) Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).
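The split between get_similarities and the self[query] syntax can be illustrated with a small hypothetical class: get_similarities produces raw scores, and `__getitem__` is the public entry point that post-processes them (here, applying num_best). This is a sketch of the calling convention, not gensim's code; token-set Jaccard overlap stands in for the WMD-based score so the example runs without word vectors.

```python
def jaccard(doc, query):
    # Hypothetical stand-in for the WMD-based score: overlap of token sets.
    a, b = set(doc), set(query)
    return len(a & b) / len(a | b) if a | b else 0.0


class ToyIndex:
    """Hypothetical index mimicking the index[query] convention (not gensim code)."""

    def __init__(self, corpus, num_best=None):
        self.corpus = corpus  # list of lists of strings (tokenized documents)
        self.num_best = num_best

    def get_similarities(self, query):
        # Raw scores, one per corpus document, in corpus order.
        return [jaccard(doc, query) for doc in self.corpus]

    def __getitem__(self, query):
        # index[query] is the public entry point; it post-processes get_similarities.
        sims = self.get_similarities(query)
        if self.num_best is None:
            return sims
        ranked = sorted(enumerate(sims), key=lambda pair: pair[1], reverse=True)
        return ranked[: self.num_best]
```

With `corpus = [["good", "food"], ["bad", "service"], ["good", "service"]]`, `ToyIndex(corpus, num_best=2)[["good", "service"]]` ranks document 2 first with a score of 1.0.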