gensim.corpora.Dictionary.add_documents¶

Dictionary.add_documents(documents, prune_at=2000000)[source]¶

Update dictionary from a collection of documents. Each document is a list of tokens = tokenized and normalized strings (either utf8 or unicode).

This is a convenience wrapper for calling doc2bow on each document with allow_update=True, which also prunes infrequent words, keeping the total number of unique words <= prune_at. This is to save memory on very large inputs. To disable this pruning, set prune_at=None.

>>>>>> print(Dictionary(["máma mele maso".split(), "ema má máma".split()]))
Dictionary(5 unique tokens)