gensim.models.LogEntropyModel
¶

class gensim.models.LogEntropyModel(corpus, id2word=None, normalize=True)[source]¶

Objects of this class realize the transformation of a word-document co-occurrence matrix (integers) into a locally/globally weighted matrix (positive floats).
This is done by a log entropy normalization, optionally normalizing the resulting documents to unit length. The following formulas explain how to compute the log entropy weight for term i in document j:
local_weight_{i,j} = log(frequency_{i,j} + 1)

P_{i,j} = frequency_{i,j} / sum_j frequency_{i,j}

                        sum_j P_{i,j} * log(P_{i,j})
global_weight_i = 1 + ------------------------------
                        log(number_of_documents + 1)

final_weight_{i,j} = local_weight_{i,j} * global_weight_i
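As an illustration only (not gensim's implementation), the formulas above can be computed by hand on a tiny bag-of-words corpus; the corpus layout of (term_id, count) pairs matches gensim's convention, but the variable names here are made up:

```python
import math

# Toy corpus: each document is a list of (term_id, count) pairs,
# the same bag-of-words format gensim uses.
corpus = [[(0, 2), (1, 1)], [(0, 1), (2, 3)]]
num_docs = len(corpus)

# sum_j frequency_{i,j}: total frequency of each term across all documents.
totals = {}
for doc in corpus:
    for term, freq in doc:
        totals[term] = totals.get(term, 0) + freq

# global_weight_i = 1 + (sum_j P_{i,j} * log(P_{i,j})) / log(num_docs + 1)
entropy = {t: 0.0 for t in totals}
for doc in corpus:
    for term, freq in doc:
        p = freq / totals[term]          # P_{i,j}
        entropy[term] += p * math.log(p)
glob = {t: 1.0 + entropy[t] / math.log(num_docs + 1) for t in totals}

# final_weight_{i,j} = local_weight_{i,j} * global_weight_i,
# with local_weight_{i,j} = log(frequency_{i,j} + 1).
transformed = [
    [(term, math.log(freq + 1) * glob[term]) for term, freq in doc]
    for doc in corpus
]
```

Note that a term occurring in only one document (term 1 above) has zero entropy, so its global weight is exactly 1.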
The main methods are:
- the constructor, which calculates the global weighting for all terms in a corpus,
- the [] method, which transforms a simple count representation into the log entropy normalized space.
>>> log_ent = LogEntropyModel(corpus)
>>> print(log_ent[some_doc])
>>> log_ent.save('/tmp/foo.log_ent_model')
Model persistence is achieved via its load/save methods.
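The save/load round trip follows the usual serialize-then-restore pattern. A minimal sketch of that pattern using the standard library's pickle module (the model state dict here is a hypothetical stand-in, not gensim's internal representation):

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for trained model state: global term weights
# plus the normalize flag.
model_state = {"entropy": {0: 0.42, 1: 1.0}, "normalize": True}

# Save the object to disk.
path = os.path.join(tempfile.mkdtemp(), "foo.log_ent_model")
with open(path, "wb") as f:
    pickle.dump(model_state, f)

# Load it back; the restored object compares equal to the original.
with open(path, "rb") as f:
    restored = pickle.load(f)
```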
Methods¶
__init__(corpus[, id2word, normalize])
    normalize dictates whether the resulting vectors will be normalized to unit length.
initialize(corpus)
    Initialize internal statistics based on a training corpus.
load(fname[, mmap])
    Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...])
    Save the object to file (also see load).