gensim.corpora.Dictionary
class gensim.corpora.Dictionary(documents=None, prune_at=2000000)
Dictionary encapsulates the mapping between normalized words and their integer ids.
The main function is doc2bow, which converts a collection of words to its bag-of-words representation: a list of (word_id, word_frequency) 2-tuples.
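The following is a minimal pure-Python model of the word-to-id mapping and of what doc2bow returns. It is not gensim's implementation. The class name ToyDictionary is hypothetical, its attributes are only named after gensim's token2id and dfs, and the choice to give unseen words new ids in sorted order is an assumption made so the output is deterministic.

```python
from collections import Counter


class ToyDictionary:
    """Hypothetical, simplified model of a word <-> id mapping (not gensim's code)."""

    def __init__(self, documents=None):
        self.token2id = {}  # word -> integer id
        self.dfs = {}       # id -> number of documents containing that word
        self.num_docs = 0
        if documents is not None:
            for document in documents:
                self.doc2bow(document, allow_update=True)

    def doc2bow(self, document, allow_update=False, return_missing=False):
        counts = Counter(document)
        # Words not yet in the mapping, sorted so that new ids are deterministic.
        missing = {w: c for w, c in sorted(counts.items()) if w not in self.token2id}
        if allow_update:
            # Assumes ids are never removed, so len(token2id) is always a free id.
            for word in missing:
                self.token2id[word] = len(self.token2id)
            self.num_docs += 1
            for word in counts:
                word_id = self.token2id[word]
                self.dfs[word_id] = self.dfs.get(word_id, 0) + 1
        # Bag of words: (token_id, token_count) for every known word, sorted by id.
        bow = sorted((self.token2id[w], c) for w, c in counts.items() if w in self.token2id)
        if return_missing:
            return bow, missing
        return bow


texts = [["human", "computer", "interface"], ["computer", "system", "computer"]]
d = ToyDictionary(texts)
print(d.token2id)  # {'computer': 0, 'human': 1, 'interface': 2, 'system': 3}
print(d.doc2bow(["computer", "computer", "graph"]))  # [(0, 2)]
print(d.doc2bow(["computer", "graph"], return_missing=True))  # ([(0, 1)], {'graph': 1})
```

In this sketch, words the mapping has never seen are silently dropped from the bag of words unless allow_update=True; with return_missing=True they are reported separately as a {word: count} dict.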
Methods
__init__([documents, prune_at]) | If documents are given, use them to initialize the Dictionary (see add_documents()).
add_documents(documents[, prune_at]) | Update the dictionary from a collection of documents.
compactify() | Assign new word ids to all words, closing any gaps in the id sequence (e.g. after tokens have been filtered out).
doc2bow(document[, allow_update, return_missing]) | Convert a document (a list of words) into bag-of-words format: a list of (token_id, token_count) 2-tuples.
filter_extremes([no_below, no_above, keep_n]) | Filter out tokens that appear in fewer than no_below documents (absolute number) or in more than no_above documents (fraction of the total corpus size, not an absolute number); then keep only the keep_n most frequent tokens (or all of them, if keep_n is None).
filter_n_most_frequent(remove_n) | Filter out the remove_n most frequent tokens that appear in the documents.
filter_tokens([bad_ids, good_ids]) | Remove the selected bad_ids tokens from all dictionary mappings, or keep the selected good_ids in the mapping and remove the rest.
from_corpus(corpus[, id2word]) | Create a Dictionary from an existing corpus.
from_documents(documents) | Create a Dictionary from a collection of documents (equivalent to Dictionary(documents=documents)).
get(k[, d]) | Return the token with id k if k is in the dictionary, else d (which defaults to None).
items() | Return a list of the dictionary's (token_id, token) pairs as 2-tuples.
iteritems() | Return an iterator over the dictionary's (token_id, token) pairs.
iterkeys() | Return an iterator over the token ids.
itervalues() | Return an iterator over the tokens.
keys() | Return a list of all token ids.
load(fname[, mmap]) | Load a previously saved object from file (see also save).
load_from_text(fname) | Load a previously stored Dictionary from a text file.
merge_with(other) | Merge another dictionary into this dictionary, mapping the same tokens to the same ids and new tokens to new ids.
save(fname_or_handle[, separately, ...]) | Save the object to file (see also load).
save_as_text(fname[, sort_by_word]) | Save this Dictionary to a text file, in the format id[TAB]word_utf8[TAB]document frequency[NEWLINE].
values() | Return a list of all tokens.
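The filter_extremes and compactify entries describe a two-step clean-up: drop tokens by document frequency, then renumber the survivors so the ids are contiguous again. The functions below are a toy, standalone model of those rules written against plain dicts (token2id: word -> id, dfs: id -> document frequency). They are not gensim's code and their signatures are hypothetical; the defaults no_below=5, no_above=0.5, keep_n=100000 follow gensim's documented defaults but are worth checking against the installed version.

```python
def compactify(token2id, dfs):
    """Toy model: reassign ids 0..n-1 in the order of the old ids, closing gaps."""
    old_to_new = {old: new for new, old in enumerate(sorted(token2id.values()))}
    new_token2id = {word: old_to_new[i] for word, i in token2id.items()}
    new_dfs = {old_to_new[i]: dfs[i] for i in token2id.values()}
    return new_token2id, new_dfs


def filter_extremes(token2id, dfs, num_docs, no_below=5, no_above=0.5, keep_n=100000):
    """Toy model of the filter_extremes rules (hypothetical signature, not gensim's code).

    Returns a new, compactified word -> id mapping and its document frequencies.
    """
    max_docs = no_above * num_docs  # no_above is a fraction of the corpus, not a count
    # (1) and (2): keep ids seen in at least no_below and at most no_above * num_docs docs.
    kept = [i for i in token2id.values() if no_below <= dfs[i] <= max_docs]
    # (3): keep only the keep_n most frequent survivors; ties keep their original order.
    kept.sort(key=lambda i: dfs[i], reverse=True)
    if keep_n is not None:
        kept = kept[:keep_n]
    kept_set = set(kept)
    filtered = {word: i for word, i in token2id.items() if i in kept_set}
    return compactify(filtered, dfs)


token2id = {"computer": 0, "human": 1, "interface": 2, "system": 3, "the": 4}
dfs = {0: 3, 1: 2, 2: 1, 3: 2, 4: 4}  # document frequencies over a 4-document corpus
new_ids, new_dfs = filter_extremes(token2id, dfs, num_docs=4, no_below=2, no_above=0.75)
print(new_ids)  # {'computer': 0, 'human': 1, 'system': 2}
print(new_dfs)  # {0: 3, 1: 2, 2: 2}
```

Here "interface" is dropped for appearing in fewer than 2 documents, "the" for appearing in more than 75% of them, and "system" moves from id 3 to id 2 once the gap is closed.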