gensim.models.LsiModel.__init__

LsiModel.__init__(corpus=None, num_topics=200, id2word=None, chunksize=20000, decay=1.0, distributed=False, onepass=True, power_iters=2, extra_samples=100)

num_topics is the number of requested factors (latent dimensions).

After the model has been trained, you can estimate topics for an arbitrary, unseen document using the topics = self[document] dictionary notation. You can also add new training documents with self.add_documents, so training can be stopped and resumed at any time, and the LSI transformation is available at any point.

If you specify a corpus, it will be used to train the model. See the method add_documents for a description of the chunksize and decay parameters.

Turn onepass off to force a multi-pass stochastic algorithm.

power_iters and extra_samples affect the accuracy of the stochastic multi-pass algorithm, which is used either internally (onepass=True) or as the front-end algorithm (onepass=False). Increasing the number of power iterations improves accuracy but makes training slower. See [R7] for some hard numbers.
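The accuracy gain from extra power iterations can be illustrated with a toy, pure-Python sketch. This is not gensim's implementation: the real algorithm operates on a whole block of random sample vectors rather than one fixed start vector. The helper names (matvec, transpose, norm, estimate_top_singular_value) and the matrix A are hypothetical, chosen only for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a dense matrix (a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a dense matrix as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def estimate_top_singular_value(matrix, start, power_iters):
    """Estimate the largest singular value of `matrix`.

    The start vector is refined by `power_iters` applications of A^T A,
    then the estimate is the length of A v.
    """
    matrix_t = transpose(matrix)
    scale = norm(start)
    v = [x / scale for x in start]
    for _ in range(power_iters):
        v = matvec(matrix_t, matvec(matrix, v))  # v <- A^T A v
        scale = norm(v)
        v = [x / scale for x in v]  # renormalize to keep the numbers bounded
    return norm(matvec(matrix, v))  # never exceeds the true largest singular value


# Toy 3x2 matrix (illustrative only) whose singular values are exactly 3 and 2.
A = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
estimates = [estimate_top_singular_value(A, [1.0, 1.0], q) for q in range(5)]
for q, value in enumerate(estimates):
    print(q, round(value, 4))
```

With zero iterations the estimate is well below the true value of 3; each additional iteration moves it closer, at the cost of two more matrix-vector products. This is the same accuracy versus speed trade-off that power_iters controls.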

Turn on distributed to enable distributed computing.

Example:

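>>> from gensim.models import LsiModel
>>> # corpus and corpus2 are assumed to be iterables of bag-of-words documents,
>>> # and doc_tfidf a single document in the same representation.
>>> lsi = LsiModel(corpus, num_topics=10)
>>> print(lsi[doc_tfidf])  # project some document into LSI space
>>> lsi.add_documents(corpus2)  # update LSI on additional documents
>>> print(lsi[doc_tfidf])  # project the same document with the updated model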

[R7] http://nlp.fi.muni.cz/~xrehurek/nips/rehurek_nips.pdf