gensim.models.LdaModel

class gensim.models.LdaModel(corpus=None, num_topics=100, id2word=None, distributed=False, chunksize=2000, passes=1, update_every=1, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, minimum_probability=0.01, random_state=None, ns_conf={})[source]

The constructor estimates Latent Dirichlet Allocation model parameters based on a training corpus:

>>> lda = LdaModel(corpus, num_topics=10)
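Here corpus is expected to be an iterable of documents in bag-of-words format. As a minimal sketch of how such a corpus can be prepared (the toy texts variable is an illustration, not part of this API):

>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> texts = [['human', 'computer', 'interaction'], ['graph', 'trees', 'minors']]  # toy tokenized documents
>>> dictionary = Dictionary(texts)  # maps each token to an integer id
>>> corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors
>>> lda = LdaModel(corpus, num_topics=2, id2word=dictionary)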

You can then infer topic distributions on new, unseen documents with:

>>> doc_lda = lda[doc_bow]
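The unseen document must be a bag-of-words vector over the same dictionary. Continuing the sketch above (doc_bow is an assumed example name):

>>> doc_bow = dictionary.doc2bow(['human', 'graph'])  # new document, same dictionary
>>> doc_lda = lda[doc_bow]  # list of (topic_id, topic_probability) 2-tuples
>>> doc_lda = lda.get_document_topics(doc_bow)  # equivalent explicit call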

The model can be updated (trained) with new documents via:

>>> lda.update(other_corpus)
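update() accepts most of the same training parameters as the constructor, so individual settings can be overridden per call. A small sketch, again assuming the dictionary from the constructor example:

>>> other_texts = [['graph', 'minors', 'survey']]  # assumed new tokenized documents
>>> other_corpus = [dictionary.doc2bow(text) for text in other_texts]
>>> lda.update(other_corpus, passes=2)  # extra passes over just the new documents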

Model persistence is achieved through the load() and save() methods.
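For example (the file name here is arbitrary):

>>> lda.save('lda.model')
>>> lda = LdaModel.load('lda.model')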

Methods

__init__([corpus, num_topics, id2word, ...]) Initialize the model; if corpus is given, start training from it straight away.
bound(corpus[, gamma, subsample_ratio]) Estimate the variational bound of documents from corpus: E_q[log p(corpus)] - E_q[log q(corpus)].
clear() Clear model state (free up some memory).
do_estep(chunk[, state]) Perform inference on a chunk of documents, and accumulate the collected sufficient statistics in state (or self.state if None).
do_mstep(rho, other[, extra_pass]) M step: use linear interpolation between the existing topics and collected sufficient statistics in other to update the topics.
get_document_topics(bow[, ...]) Return topic distribution for the given document bow, as a list of (topic_id, topic_probability) 2-tuples.
get_term_topics(word_id[, minimum_probability]) Return the most likely topics for a particular word in the vocabulary.
get_topic_terms(topicid[, topn]) Return a list of (word_id, probability) 2-tuples for the most probable words in topic topicid.
inference(chunk[, collect_sstats]) Given a chunk of sparse document vectors, estimate gamma (parameters controlling the topic weights) for each document in the chunk.
init_dir_prior(prior, name) Initialize the Dirichlet prior named name ('alpha' or 'eta') from a string, scalar, or array prior specification.
load(fname, *args, **kwargs) Load a previously saved object from file (also see save).
log_perplexity(chunk[, total_docs]) Calculate and return per-word likelihood bound, using the chunk of documents as evaluation corpus.
print_topic(topicid[, topn]) Return the result of show_topic, but formatted as a single string.
print_topics([num_topics, num_words]) Alias for show_topics() that returns the num_words most probable words for num_topics topics and also writes them to the log.
save(fname[, ignore]) Save the model to file.
show_topic(topicid[, topn]) Return a list of (word, probability) 2-tuples for the most probable words in topic topicid.
show_topics([num_topics, num_words, log, ...]) Return the num_words most significant words for num_topics topics (10 words per topic, by default).
sync_state() Synchronize the model's cached topic-word probabilities (expElogbeta) with the internal state.
top_topics(corpus[, num_words]) Calculate the UMass topic coherence for each topic.
update(corpus[, chunksize, decay, offset, ...]) Train the model with new documents, by EM-iterating over corpus until the topics converge (or until the maximum number of allowed iterations is reached).
update_alpha(gammat, rho) Update parameters for the Dirichlet prior on the per-document topic weights alpha given the last gammat.
update_eta(lambdat, rho) Update parameters for the Dirichlet prior on the per-topic word weights eta given the last lambdat.
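
As a brief illustration of the inspection methods above, continuing the toy lda and corpus from the constructor sketch (output omitted; exact values depend on the random state):

>>> lda.show_topics(num_topics=2, num_words=5)  # [(topic_id, 'prob*word + ...'), ...]
>>> lda.get_topic_terms(0, topn=5)  # [(word_id, probability), ...] for topic 0
>>> lda.print_topic(0, topn=5)  # show_topic(0) formatted as a single string
>>> lda.top_topics(corpus, num_words=5)  # topics ranked by UMass coherence
>>> lda.log_perplexity(corpus)  # per-word likelihood bound on the corpus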