gensim.models.LsiModel.__init__
LsiModel.__init__(corpus=None, num_topics=200, id2word=None, chunksize=20000, decay=1.0, distributed=False, onepass=True, power_iters=2, extra_samples=100)
num_topics is the number of requested factors (latent dimensions).
After the model has been trained, you can estimate topics for an arbitrary, unseen document using the topics = self[document] dictionary notation. You can also add new training documents with self.add_documents, so that training can be stopped and resumed at any time, and the LSI transformation is available at any point.
If you specify a corpus, it will be used to train the model. See the method add_documents for a description of the chunksize and decay parameters.
Turn onepass off to force a multi-pass stochastic algorithm.
power_iters and extra_samples affect the accuracy of the stochastic multi-pass algorithm, which is used either internally (onepass=True) or as the front-end algorithm (onepass=False). Increasing the number of power iterations improves accuracy but slows training. See [R7] for some hard numbers.
Turn on distributed to enable distributed computing.
Example:
>>> lsi = LsiModel(corpus, num_topics=10)
>>> print(lsi[doc_tfidf])  # project some document into LSI space
>>> lsi.add_documents(corpus2)  # update LSI on additional documents
>>> print(lsi[doc_tfidf])
[R7] http://nlp.fi.muni.cz/~xrehurek/nips/rehurek_nips.pdf