gensim.models.TfidfModel

class gensim.models.TfidfModel(corpus=None, id2word=None, dictionary=None, wlocal=<function identity>, wglobal=<function df2idf>, normalize=True)

Objects of this class transform a word-document co-occurrence matrix (integer counts) into a locally/globally weighted TF-IDF matrix (positive floats).

The main methods are:

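  1. the constructor, which calculates inverse document frequency weights for all terms in the training corpus.
  2. the [] method, which transforms a simple count representation into the TF-IDF space.

>>> from gensim.models import TfidfModel
>>> corpus = [[(0, 1), (1, 2)], [(1, 1), (2, 1)]]  # toy bag-of-words corpus: (term_id, count) pairs
>>> some_doc = corpus[0]
>>> tfidf = TfidfModel(corpus)
>>> print(tfidf[some_doc])
>>> tfidf.save('/tmp/foo.tfidf_model')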

Model persistence is achieved via the load/save methods.

Methods

__init__([corpus, id2word, dictionary, ...]) Compute tf-idf by multiplying a local component (term frequency) with a global component (inverse document frequency), and normalizing the resulting documents to unit length.
initialize(corpus) Compute inverse document weights, which will be used to modify term frequencies for documents.
load(fname[, mmap]) Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).
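
The weighting described for __init__ (local component times global component, then normalization to unit length) can be sketched in plain Python. This is an illustrative sketch, not gensim's implementation: the helper names fit_idfs and transform are hypothetical, and the base-2 logarithm in df2idf is an assumption about the default global weight.

```python
import math


def identity(count):
    # Local weight: use the raw term count unchanged.
    return count


def df2idf(docfreq, totaldocs, log_base=2.0):
    # Global weight: log of (number of documents / document frequency).
    # The base-2 logarithm is an assumption made for this sketch.
    return math.log(totaldocs / docfreq, log_base)


def fit_idfs(corpus, wglobal=df2idf):
    # Hypothetical helper: count the documents each term id occurs in,
    # then convert each document frequency into a global weight.
    dfs = {}
    for doc in corpus:
        for term_id, _count in doc:
            dfs[term_id] = dfs.get(term_id, 0) + 1
    return {term_id: wglobal(df, len(corpus)) for term_id, df in dfs.items()}


def transform(doc, idfs, wlocal=identity, normalize=True):
    # Hypothetical helper: multiply local and global weights per term,
    # dropping terms whose global weight is zero or unknown.
    weighted = [
        (term_id, wlocal(count) * idfs[term_id])
        for term_id, count in doc
        if idfs.get(term_id, 0.0) != 0.0
    ]
    if normalize:
        # Scale the document vector to unit Euclidean length.
        length = math.sqrt(sum(weight * weight for _, weight in weighted))
        if length > 0.0:
            weighted = [(term_id, weight / length) for term_id, weight in weighted]
    return weighted


# Toy bag-of-words corpus: each document is a list of (term_id, count) pairs.
corpus = [[(0, 1), (1, 2)], [(1, 1), (2, 1)], [(2, 3)]]
idfs = fit_idfs(corpus)
vec = transform(corpus[0], idfs)
print(vec)
```

In this toy corpus, term 0 occurs in one document while term 1 occurs in two, so term 0 receives the larger global weight. A term that occurs in every document would get a global weight of zero and be dropped from the output.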