gensim.models.LdaModel.update

LdaModel.update(corpus, chunksize=None, decay=None, offset=None, passes=None, update_every=None, eval_every=None, iterations=None, gamma_threshold=None, chunks_as_numpy=False)[source]

Train the model with new documents, by EM-iterating over the corpus until the topics converge (or until the maximum number of allowed iterations is reached). corpus must be an iterable, repeatable stream of documents.

In distributed mode, the E step is distributed over a cluster of machines.
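Distributed mode is enabled at construction time rather than per update() call. A minimal sketch, assuming a Pyro4 nameserver and gensim's lda_worker / lda_dispatcher processes are already running on the cluster (the toy corpus is invented for the example):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["human", "interface", "computer"],
        ["survey", "user", "computer", "system"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# distributed=True makes the E step of every subsequent update() call
# fan out over the registered workers (requires the Pyro4 services
# mentioned above to be reachable).
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, distributed=True)
```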

This update also supports updating an already trained model (self) with new documents from corpus; the two models are then merged in proportion to the number of old vs. new documents. This feature is still experimental for non-stationary input streams.

For stationary input (no topic drift in new documents), on the other hand, this equals the online update of Hoffman et al. and is guaranteed to converge for any decay in (0.5, 1.0]. Additionally, for smaller corpus sizes, an increasing offset may be beneficial (see Table 1 in Hoffman et al.).
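For illustration, a minimal sketch of folding a new batch of documents into an already trained model (the toy corpus and token lists are invented for the example):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# A toy initial corpus: each document is a list of tokens.
old_docs = [["human", "interface", "computer"],
            ["survey", "user", "computer", "system", "response", "time"],
            ["eps", "user", "interface", "system"]]
dictionary = Dictionary(old_docs)
old_corpus = [dictionary.doc2bow(doc) for doc in old_docs]

# Train an initial model.
lda = LdaModel(old_corpus, id2word=dictionary, num_topics=2)

# Later, a new batch arrives. Convert it with the *same* dictionary
# (doc2bow silently ignores tokens outside the known vocabulary) and
# fold it in; the old and new statistics are merged in proportion to
# the number of old vs. new documents.
new_docs = [["human", "system", "survey"],
            ["user", "response", "time"]]
new_corpus = [dictionary.doc2bow(doc) for doc in new_docs]
lda.update(new_corpus)
```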

Args:

corpus (gensim corpus): The corpus with which the LDA model should be updated.

chunks_as_numpy (bool): Whether each chunk passed to .inference should be a numpy array or not. Numpy can in some settings turn the term IDs into floats; these will be converted back into integers in inference, which incurs a performance hit. For distributed computing, it may be desirable to keep the chunks as numpy arrays.

For other parameter settings, see the LdaModel constructor.
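A minimal sketch of the interplay between constructor defaults and per-call overrides (the corpus is a toy example; keywords left at None in update() fall back to the values given to the constructor):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["graph", "minors", "trees"],
        ["graph", "minors", "survey"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Convergence-related settings are given to the constructor; decay
# must lie in (0.5, 1.0] for the convergence guarantee above.
lda = LdaModel(corpus, id2word=dictionary, num_topics=2,
               decay=0.7, offset=64.0, chunksize=2000, passes=1)

# Keywords left as None fall back to the constructor values; here
# passes and eval_every are overridden for this batch only.
lda.update(corpus, passes=2, eval_every=1)
```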