`gensim.corpora.Dictionary`

class gensim.corpora.Dictionary(documents=None, prune_at=2000000)

Dictionary encapsulates the mapping between normalized words and their integer ids.

The main method is doc2bow, which converts a collection of words into its bag-of-words representation: a list of (word_id, word_frequency) 2-tuples.
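To make the mapping concrete, here is a minimal pure-Python sketch of what a bag-of-words conversion computes. It does not use gensim: `build_token2id` and `to_bow` are hypothetical helpers written for illustration, and the order in which the real class assigns ids may differ.

```python
from collections import Counter


def build_token2id(documents):
    """Assign each distinct token an integer id, in order of first appearance.

    Hypothetical helper; not gensim's implementation.
    """
    token2id = {}
    for document in documents:
        for token in document:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(document, token2id):
    """Return a sorted list of (token_id, token_count) 2-tuples.

    Tokens that have no id in the mapping are skipped.
    """
    counts = Counter(token2id[t] for t in document if t in token2id)
    return sorted(counts.items())


# Each document is a list of already-normalized words.
documents = [
    ["human", "computer", "interface"],
    ["computer", "system", "computer"],
]
token2id = build_token2id(documents)

# "unseen" has no id, so it does not appear in the result.
bow = to_bow(["computer", "system", "computer", "unseen"], token2id)
print(bow)
```

The result pairs each known token id with its count in the document, which is the (word_id, word_frequency) shape described above.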

Methods

`__init__`([documents, prune_at])	If documents are given, use them to initialize Dictionary (see add_documents()).
`add_documents`(documents[, prune_at])	Update dictionary from a collection of documents.
`compactify`()	Assign new word ids to all words.
`doc2bow`(document[, allow_update, return_missing])	Convert document (a list of words) into bag-of-words format: a list of (token_id, token_count) 2-tuples.
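`filter_extremes`([no_below, no_above, keep_n])	Filter out tokens that appear in fewer than no_below documents or in more than no_above of the documents (a fraction of the corpus), then keep only the keep_n most frequent remaining tokens.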
`filter_n_most_frequent`(remove_n)	Filter out the ‘remove_n’ most frequent tokens that appear in the documents.
`filter_tokens`([bad_ids, good_ids])	Remove the selected bad_ids tokens from all dictionary mappings, or keep the selected good_ids tokens in the mapping and remove the rest.
`from_corpus`(corpus[, id2word])	Create Dictionary from an existing corpus.
`from_documents`(documents)
`get`(k[, d])	D[k] if k in D, ...
`items`()	List of D’s (key, value) pairs, ...
`iteritems`()	An iterator over the (key, ...
`iterkeys`()	An iterator over the keys of D.
`itervalues`(...)
`keys`()	Return a list of all token ids.
`load`(fname[, mmap])	Load a previously saved object from file (also see save).
`load_from_text`(fname)	Load a previously stored Dictionary from a text file.
`merge_with`(other)	Merge another dictionary into this dictionary, mapping same tokens to the same ids and new tokens to new ids.
`save`(fname_or_handle[, separately, ...])	Save the object to file (also see load).
`save_as_text`(fname[, sort_by_word])	Save this Dictionary to a text file, in format: id[TAB]word_utf8[TAB]document frequency[NEWLINE].
`values`()	List of D’s values.
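The text format listed for save_as_text, id[TAB]word_utf8[TAB]document frequency[NEWLINE], can be sketched with plain Python as follows. This is an illustration of the line format only, not gensim's implementation: `write_text` and `read_text` are hypothetical helpers, the sample ids and frequencies are made up, and a real saved file may contain additional lines that this sketch does not handle.

```python
import io


def write_text(token2id, doc_freqs, handle):
    """Write one 'id<TAB>word<TAB>document frequency' line per token, ordered by id."""
    for token, token_id in sorted(token2id.items(), key=lambda kv: kv[1]):
        handle.write(f"{token_id}\t{token}\t{doc_freqs[token_id]}\n")


def read_text(handle):
    """Parse lines in the same three-column format back into two dicts."""
    token2id, doc_freqs = {}, {}
    for line in handle:
        token_id, token, freq = line.rstrip("\n").split("\t")
        token2id[token] = int(token_id)
        doc_freqs[int(token_id)] = int(freq)
    return token2id, doc_freqs


# Illustrative data: token -> id, and id -> number of documents containing it.
token2id = {"human": 0, "computer": 1, "system": 2}
doc_freqs = {0: 1, 1: 2, 2: 1}

# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_text(token2id, doc_freqs, buffer)
text = buffer.getvalue()

buffer.seek(0)
loaded_token2id, loaded_doc_freqs = read_text(buffer)
print(text, end="")
```

A tab-separated layout like this keeps the file readable and editable by hand, at the cost of assuming that tokens contain no tab or newline characters.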