gensim.corpora.Dictionary.add_documents¶
-
Dictionary.
add_documents
(documents, prune_at=2000000)[source]¶ Update dictionary from a collection of documents. Each document is a list of tokens = tokenized and normalized strings (either utf8 or unicode).
This is a convenience wrapper for calling doc2bow on each document with allow_update=True, which also prunes infrequent words, keeping the total number of unique words <= prune_at. This is to save memory on very large inputs. To disable this pruning, set prune_at=None.
>>> print(Dictionary(["máma mele maso".split(), "ema má máma".split()])) Dictionary(5 unique tokens)