gensim.corpora.WikiCorpus.get_texts

WikiCorpus.get_texts()

Iterate over the dump, returning text version of each article as a list of tokens.

Only articles of sufficient length are returned; short articles, redirects, and similar pages are ignored.

Note that this iterates over the texts; if you want vectors, just use the standard corpus interface instead of this function:

>>> for vec in wiki_corpus:
...     print(vec)
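
The difference between iterating over texts and iterating over vectors can be illustrated with a small, self-contained sketch. This is not gensim code: `ToyCorpus`, its `min_tokens` threshold, the whitespace tokenizer, and the `token2id` mapping are all hypothetical stand-ins chosen for illustration, and the real `WikiCorpus` parses a Wikipedia dump rather than a list of strings.

```python
from collections import Counter


class ToyCorpus:
    """Hypothetical stand-in that mimics the texts-versus-vectors split.

    This is NOT gensim's WikiCorpus; it only illustrates the pattern.
    """

    def __init__(self, articles, min_tokens=3):
        self.articles = articles
        self.min_tokens = min_tokens  # assumed length threshold, for illustration only
        # Build a token -> integer id mapping from the articles that are kept.
        vocab = sorted({tok for text in self.get_texts() for tok in text})
        self.token2id = {tok: i for i, tok in enumerate(vocab)}

    def get_texts(self):
        """Yield each sufficiently long article as a list of tokens."""
        for article in self.articles:
            tokens = article.lower().split()  # naive whitespace tokenizer
            if len(tokens) >= self.min_tokens:
                yield tokens

    def __iter__(self):
        """Yield each kept article as a sparse bag-of-words vector."""
        for tokens in self.get_texts():
            counts = Counter(self.token2id[tok] for tok in tokens)
            yield sorted(counts.items())


corpus = ToyCorpus(["the cat sat on the mat", "redirect", "dogs chase cats daily"])

texts = list(corpus.get_texts())  # token lists; the one-word article is skipped
vectors = list(corpus)            # (token_id, count) pairs for the same articles

for tokens, vec in zip(texts, vectors):
    print(tokens, vec)
```

In this sketch, `get_texts()` gives plain token lists, while iterating over the corpus object gives sparse `(token_id, count)` pairs for the same articles, which mirrors the split described above.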