gensim.models.LogEntropyModel

class gensim.models.LogEntropyModel(corpus, id2word=None, normalize=True)[source]

Objects of this class realize the transformation of a word-document co-occurrence matrix (integers) into a locally/globally weighted matrix (positive floats).

This is done by a log entropy normalization, optionally normalizing the resulting documents to unit length. The following formulas explain how to compute the log entropy weight for term i in document j:

local_weight_{i,j} = log(frequency_{i,j} + 1)

P_{i,j} = frequency_{i,j} / sum_j frequency_{i,j}

                      sum_j P_{i,j} * log(P_{i,j})
global_weight_i = 1 + ----------------------------
                      log(number_of_documents + 1)

final_weight_{i,j} = local_weight_{i,j} * global_weight_i
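The three formulas above can be sketched in plain Python. This is an illustrative reimplementation on a toy dense count matrix, not the model's actual code path (gensim operates on sparse bag-of-words corpora):

```python
import math

# Toy word-document count matrix: rows = terms i, columns = documents j.
counts = [
    [1, 0, 3],   # term 0: concentrated in document 2
    [2, 2, 2],   # term 1: spread evenly across all documents
]
n_docs = len(counts[0])

def log_entropy_weights(counts, n_docs):
    weights = []
    for row in counts:
        total = sum(row)
        # P_{i,j} = frequency_{i,j} / sum_j frequency_{i,j}
        # global_weight_i = 1 + sum_j P_{i,j} * log(P_{i,j}) / log(n_docs + 1)
        entropy = sum((f / total) * math.log(f / total) for f in row if f > 0)
        global_w = 1 + entropy / math.log(n_docs + 1)
        # final_weight_{i,j} = log(frequency_{i,j} + 1) * global_weight_i
        weights.append([math.log(f + 1) * global_w for f in row])
    return weights

w = log_entropy_weights(counts, n_docs)
```

A term spread evenly across all documents (term 1) has high entropy and therefore a small global weight; a term concentrated in few documents (term 0) keeps more of its weight.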

The main methods are:

  1. the constructor, which calculates the global weighting for all terms in a corpus;
  2. the [] method, which transforms a simple count representation into the log-entropy normalized space.

>>> log_ent = LogEntropyModel(corpus)
>>> print(log_ent[some_doc])
>>> log_ent.save('/tmp/foo.log_ent_model')

Model persistence is achieved via its load/save methods.

Methods

__init__(corpus[, id2word, normalize]) normalize dictates whether the resulting vectors will be set to unit length.
initialize(corpus) Initialize internal statistics based on a training corpus.
load(fname[, mmap]) Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).