gensim.models.LdaModel

class gensim.models.LdaModel(corpus=None, num_topics=100, id2word=None, distributed=False, chunksize=2000, passes=1, update_every=1, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, minimum_probability=0.01, random_state=None, ns_conf={})[source]

The constructor estimates Latent Dirichlet Allocation model parameters based on a training corpus:

>>> lda = LdaModel(corpus, num_topics=10)
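Here corpus is expected to be an iterable of documents in bag-of-words format. As a minimal sketch of how such a corpus can be prepared (the toy texts variable is an illustration, not part of this API):

>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> texts = [['human', 'computer', 'interaction'], ['graph', 'trees', 'minors']]  # toy tokenized documents
>>> dictionary = Dictionary(texts)  # maps each token to an integer id
>>> corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors
>>> lda = LdaModel(corpus, num_topics=2, id2word=dictionary)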

You can then infer topic distributions on new, unseen documents with:

>>> doc_lda = lda[doc_bow]
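The unseen document must be a bag-of-words vector over the same dictionary. Continuing the sketch above (doc_bow is an assumed example name):

>>> doc_bow = dictionary.doc2bow(['human', 'graph'])  # new document, same dictionary
>>> doc_lda = lda[doc_bow]  # list of (topic_id, topic_probability) 2-tuples
>>> doc_lda = lda.get_document_topics(doc_bow)  # equivalent explicit call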

The model can be updated (trained) with new documents via:

>>> lda.update(other_corpus)
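update() accepts most of the same training parameters as the constructor, so individual settings can be overridden per call. A small sketch, again assuming the dictionary from the constructor example:

>>> other_texts = [['graph', 'minors', 'survey']]  # assumed new tokenized documents
>>> other_corpus = [dictionary.doc2bow(text) for text in other_texts]
>>> lda.update(other_corpus, passes=2)  # extra passes over just the new documents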

Model persistence is achieved through the load() and save() methods.
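For example (the file name here is arbitrary):

>>> lda.save('lda.model')
>>> lda = LdaModel.load('lda.model')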

Methods

__init__([corpus, num_topics, id2word, ...]) Initialize the model; if corpus is given, start training from it straight away.
bound(corpus[, gamma, subsample_ratio]) Estimate the variational bound of documents from corpus: E_q[log p(corpus)] - E_q[log q(corpus)].
clear() Clear model state (free up some memory).
do_estep(chunk[, state]) Perform inference on a chunk of documents, and accumulate the collected sufficient statistics in state (or self.state if None).
do_mstep(rho, other[, extra_pass]) M step: use linear interpolation between the existing topics and collected sufficient statistics in other to update the topics.
get_document_topics(bow[, ...]) Return topic distribution for the given document bow, as a list of (topic_id, topic_probability) 2-tuples.
get_term_topics(word_id[, minimum_probability]) Return the most likely topics for a particular word in the vocabulary.
get_topic_terms(topicid[, topn]) Return a list of (word_id, probability) 2-tuples for the most probable words in topic topicid.
inference(chunk[, collect_sstats]) Given a chunk of sparse document vectors, estimate gamma (parameters controlling the topic weights) for each document in the chunk.
init_dir_prior(prior, name) Initialize the Dirichlet prior named name ('alpha' or 'eta') from a string, scalar, or array prior specification.
load(fname, *args, **kwargs) Load a previously saved object from file (also see save).
log_perplexity(chunk[, total_docs]) Calculate and return per-word likelihood bound, using the chunk of documents as evaluation corpus.
print_topic(topicid[, topn]) Return the result of show_topic, but formatted as a single string.
print_topics([num_topics, num_words]) Alias for show_topics() that returns the num_words most probable words for num_topics topics and also writes them to the log.
save(fname[, ignore]) Save the model to file.
show_topic(topicid[, topn]) Return a list of (word, probability) 2-tuples for the most probable words in topic topicid.
show_topics([num_topics, num_words, log, ...]) Return the num_words most significant words for num_topics topics (10 words per topic, by default).
sync_state() Synchronize the model's cached topic-word probabilities (expElogbeta) with the internal state.
top_topics(corpus[, num_words]) Calculate the UMass topic coherence for each topic.
update(corpus[, chunksize, decay, offset, ...]) Train the model with new documents, by EM-iterating over corpus until the topics converge (or until the maximum number of allowed iterations is reached).
update_alpha(gammat, rho) Update parameters for the Dirichlet prior on the per-document topic weights alpha given the last gammat.
update_eta(lambdat, rho) Update parameters for the Dirichlet prior on the per-topic word weights eta given the last lambdat.
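
As a brief illustration of the inspection methods above, continuing the toy lda and corpus from the constructor sketch (output omitted; exact values depend on the random state):

>>> lda.show_topics(num_topics=2, num_words=5)  # [(topic_id, 'prob*word + ...'), ...]
>>> lda.get_topic_terms(0, topn=5)  # [(word_id, probability), ...] for topic 0
>>> lda.print_topic(0, topn=5)  # show_topic(0) formatted as a single string
>>> lda.top_topics(corpus, num_words=5)  # topics ranked by UMass coherence
>>> lda.log_perplexity(corpus)  # per-word likelihood bound on the corpus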