gensim.models.LdaMulticore

class gensim.models.LdaMulticore(corpus=None, num_topics=100, id2word=None, workers=None, chunksize=2000, passes=1, batch=False, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, random_state=None)

The constructor estimates Latent Dirichlet Allocation model parameters based on a training corpus:
>>> lda = LdaMulticore(corpus, num_topics=10)
You can then infer topic distributions on new, unseen documents with:
>>> doc_lda = lda[doc_bow]
The model can be updated (trained) with new documents via:
>>> lda.update(other_corpus)
Model persistence is achieved through the load() and save() methods.
Methods

__init__([corpus, num_topics, id2word, ...])
    If given, start training from the iterable corpus straight away.

bound(corpus[, gamma, subsample_ratio])
    Estimate the variational bound of documents from the corpus.

clear()
    Clear the model state (free up some memory).

do_estep(chunk[, state])
    Perform inference on a chunk of documents, and accumulate the collected sufficient statistics in state (or self.state if None).

do_mstep(rho, other[, extra_pass])
    M step: use linear interpolation between the existing topics and the sufficient statistics collected in other to update the topics.

get_document_topics(bow[, ...])
    Return the topic distribution for the given document bow, as a list of (topic_id, topic_probability) 2-tuples.

get_term_topics(word_id[, minimum_probability])
    Return the most likely topics for a particular word in the vocabulary.

get_topic_terms(topicid[, topn])
    Return a list of (word_id, probability) 2-tuples for the most probable words in topic topicid.

inference(chunk[, collect_sstats])
    Given a chunk of sparse document vectors, estimate gamma (the parameters controlling the topic weights) for each document in the chunk.

init_dir_prior(prior, name)

load(fname, *args, **kwargs)
    Load a previously saved object from file (also see save).

log_perplexity(chunk[, total_docs])
    Calculate and return the per-word likelihood bound, using the chunk of documents as an evaluation corpus.

print_topic(topicid[, topn])
    Return the result of show_topic, but formatted as a single string.

print_topics([num_topics, num_words])

save(fname[, ignore])
    Save the model to file.

show_topic(topicid[, topn])
    Return a list of (word, probability) 2-tuples for the most probable words in topic topicid.

show_topics([num_topics, num_words, log, ...])
    For num_topics topics, return the num_words most significant words (10 words per topic, by default).

sync_state()

top_topics(corpus[, num_words])
    Calculate the UMass topic coherence for each topic.

update(corpus[, chunks_as_numpy])
    Train the model with new documents, by EM-iterating over the corpus until the topics converge (or until the maximum number of allowed iterations is reached).

update_alpha(gammat, rho)
    Update the parameters for the Dirichlet prior on the per-document topic weights alpha, given the last gammat.

update_eta(lambdat, rho)
    Update the parameters for the Dirichlet prior on the per-topic word weights eta, given the last lambdat.