gensim.corpora.WikiCorpus.get_texts
WikiCorpus.get_texts()

Iterate over the dump, yielding the text of each article as a list of tokens.
Only articles of sufficient length are returned; short articles, redirects, and similar pages are ignored.
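The filtering behaviour described above can be illustrated with a small pure-Python sketch. This is not gensim's implementation: the input format (an iterable of `(title, text)` pairs), the tokenizer, the `#redirect` check and the `MIN_TOKENS` cutoff are all assumptions chosen for illustration, and `get_texts_sketch` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical cutoff for this sketch; gensim's real threshold may differ.
MIN_TOKENS = 5


def tokenize(text: str) -> List[str]:
    # Simple lowercase alphabetic tokenizer (an assumption; gensim uses its own).
    return re.findall(r"[a-z]+", text.lower())


def get_texts_sketch(articles: Iterable[Tuple[str, str]]) -> Iterator[List[str]]:
    """Yield each article as a list of tokens, skipping redirects and short articles."""
    for _title, text in articles:
        if text.lstrip().lower().startswith("#redirect"):
            continue  # redirect pages carry no article text of their own
        tokens = tokenize(text)
        if len(tokens) < MIN_TOKENS:
            continue  # too short to be useful
        yield tokens


# Toy "dump": one real article, one redirect, one stub.
articles = [
    ("Anarchism", "Anarchism is a political philosophy and movement that is skeptical of authority."),
    ("AccessibleComputing", "#REDIRECT [[Computer accessibility]]"),
    ("Stub", "Too short."),
]
texts = list(get_texts_sketch(articles))
```

Only the first article survives; the redirect and the two-token stub are dropped, mirroring the "sufficient length" rule in spirit.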
Note that this iterates over the texts. If you want vectors, use the standard corpus interface instead of this function:

>>> for vec in wiki_corpus:
...     print(vec)
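The difference between texts and vectors is that the corpus interface yields sparse bag-of-words vectors, i.e. lists of `(token_id, count)` pairs, rather than token lists. The sketch below is a pure-Python stand-in for that conversion, roughly what gensim's `Dictionary.doc2bow` does; `build_vocab` and `to_bow` are hypothetical helpers, and the first-seen id assignment is an assumption (gensim's `Dictionary` may assign ids differently).

```python
from typing import Dict, List, Tuple


def build_vocab(texts: List[List[str]]) -> Dict[str, int]:
    # Assign an integer id to each distinct token, in first-seen order.
    vocab: Dict[str, int] = {}
    for tokens in texts:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(tokens: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    # Count known tokens and return (token_id, count) pairs sorted by id.
    counts: Dict[int, int] = {}
    for tok in tokens:
        if tok in vocab:
            tid = vocab[tok]
            counts[tid] = counts.get(tid, 0) + 1
    return sorted(counts.items())


texts = [["human", "computer", "interface", "computer"], ["graph", "trees"]]
vocab = build_vocab(texts)
vectors = [to_bow(t, vocab) for t in texts]
```

Here `vectors[0]` records that "computer" appears twice in the first text; unknown tokens are silently dropped, which is also how a fixed vocabulary typically behaves.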