gensim.corpora.WikiCorpus.__init__

WikiCorpus.__init__(fname, processes=None, lemmatize=True, dictionary=None, filter_namespaces=('0', ))

Initialize the corpus. Unless a dictionary is provided, initialization scans the corpus once to determine its vocabulary.

If the pattern package is installed, shallow parsing is used to obtain token lemmas. Otherwise, simple regexp tokenization is used. You can override this automatic choice by setting the lemmatize parameter explicitly.
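The fallback described above can be illustrated with a minimal sketch. This is not gensim's actual implementation: the helper names resolve_lemmatize and simple_tokenize, the token regexp, and the use of importlib.util.find_spec to detect pattern are all assumptions made for illustration.

```python
import importlib.util
import re

# Hypothetical regexp: runs of letters only (no digits, no underscores).
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def resolve_lemmatize(lemmatize=None):
    """Decide whether to lemmatize (hypothetical helper, not part of gensim).

    An explicit True/False wins. With None, lemmatize only if the optional
    `pattern` package can be found on the import path.
    """
    if lemmatize is not None:
        return bool(lemmatize)
    return importlib.util.find_spec("pattern") is not None


def simple_tokenize(text):
    """Fallback path: lowercase alphabetic tokens found by a simple regexp."""
    return [match.group().lower() for match in TOKEN_RE.finditer(text)]


# Forcing the parameter bypasses the automatic detection entirely.
use_lemmas = resolve_lemmatize(False)
tokens = simple_tokenize("Hello, World! 42")  # ['hello', 'world']
```

Passing an explicit value keeps preprocessing reproducible across machines, since the result no longer depends on which optional packages happen to be installed.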