gensim.models.CoherenceModel.__init__

CoherenceModel.__init__(model=None, topics=None, texts=None, corpus=None, dictionary=None, window_size=None, coherence='c_v', topn=10)

model : Pre-trained topic model. Should be provided if topics is not provided.

topics : List of tokenized topics. If topics is passed instead of model, dictionary should also be provided. For example:

    topics = [['human', 'machine', 'computer', 'interface'],
              ['graph', 'trees', 'binary', 'widths']]

texts : Tokenized texts. Needed for coherence measures that use a sliding-window-based probability estimator.

corpus : Gensim document corpus.

dictionary : Gensim dictionary mapping ids to words, used to create the corpus. Not needed if model.id2word is present. If both are provided, dictionary will be used.

window_size : Size of the window used by coherence measures whose probability estimator is a boolean sliding window. It has no effect for 'u_mass'. If left as None, the default window sizes are used:

    'c_v' : 110
    'c_uci' : 10
    'c_npmi' : 10

coherence : Coherence measure to be used. Supported values are:

    'u_mass'
    'c_v'
    'c_uci' (also popularly known as c_pmi)
    'c_npmi'

For 'u_mass', corpus should be provided. If texts is provided instead, it will be converted to a corpus using the dictionary. For 'c_v', 'c_uci' and 'c_npmi', texts should be provided; corpus is not needed.
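To give a feel for what window_size changes, here is a minimal pure-Python sketch of a boolean sliding window estimator. It is an illustrative simplification, not gensim's implementation: the helper names virtual_documents and probability are hypothetical, and the toy texts are made up. Each window of consecutive tokens is treated as one "virtual document", and a probability is the fraction of windows in which the given words all occur.

```python
def virtual_documents(texts, window_size):
    """Collect every sliding window over every text as a set of tokens.

    A text no longer than `window_size` contributes a single window.
    (Hypothetical helper for illustration only.)
    """
    windows = []
    for tokens in texts:
        if len(tokens) <= window_size:
            windows.append(set(tokens))
            continue
        for start in range(len(tokens) - window_size + 1):
            windows.append(set(tokens[start:start + window_size]))
    return windows


def probability(windows, *words):
    """Fraction of windows that contain all of `words` (boolean counting)."""
    hits = sum(1 for window in windows if all(w in window for w in words))
    return hits / len(windows)


# Toy tokenized texts (assumed data).
texts = [
    ['human', 'machine', 'computer', 'interface'],
    ['graph', 'trees', 'binary', 'widths', 'graph'],
]

narrow = virtual_documents(texts, window_size=3)
wide = virtual_documents(texts, window_size=10)

# 'human' and 'interface' are 3 tokens apart, so a window of 3 never
# holds both, while a window of 10 covers the whole first text.
print(probability(narrow, 'human', 'interface'))
print(probability(wide, 'human', 'interface'))
```

A larger window lets more distant words count as co-occurring, which may be why the default differs per measure.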

topn : Integer corresponding to the number of top words to be extracted from each topic.
