gensim.models.LsiModel

class gensim.models.LsiModel(corpus=None, num_topics=200, id2word=None, chunksize=20000, decay=1.0, distributed=False, onepass=True, power_iters=2, extra_samples=100)

Objects of this class allow building and maintaining a model for Latent Semantic Indexing (also known as Latent Semantic Analysis).

The main methods are:

  1. the constructor, which initializes the projection into the latent topic space,
  2. the [] method, which returns the representation of any input document in the latent space,
  3. add_documents() for incrementally updating the model with new documents.

The left singular vectors are stored in lsi.projection.u, singular values in lsi.projection.s. Right singular vectors can be reconstructed from the output of lsi[training_corpus], if needed. See also FAQ [R4].

Model persistence is achieved via the load() and save() methods.

[R4] https://github.com/piskvorky/gensim/wiki/Recipes-&-FAQ#q4-how-do-you-output-the-u-s-vt-matrices-of-lsi

Methods

__init__([corpus, num_topics, id2word, ...]) num_topics is the number of requested factors (latent dimensions).
add_documents(corpus[, chunksize, decay]) Update singular value decomposition to take into account a new corpus of documents.
load(fname, *args, **kwargs) Load a previously saved object from file (also see save).
print_debug([num_topics, num_words]) Print (to log) the most salient words of the first num_topics topics.
print_topic(topicno[, topn]) Return a single topic as a formatted string.
print_topics([num_topics, num_words]) Alias for show_topics() which prints the top 5 topics to log.
save(fname, *args, **kwargs) Save the model to file.
show_topic(topicno[, topn]) Return a specified topic (=left singular vector), 0 <= topicno < self.num_topics, as a string.
show_topics([num_topics, num_words, log, ...]) Return num_topics most significant topics (return all by default).