gensim.corpora.HashDictionary
¶
-
class
gensim.corpora.
HashDictionary
(documents=None, id_range=32000, myhash=<built-in function adler32>, debug=True)[source]¶ HashDictionary encapsulates the mapping between normalized words and their integer ids.
Unlike Dictionary, building a HashDictionary before using it is not a necessary step. The documents can be computed immediately, from an uninitialized HashDictionary, without seeing the rest of the corpus first.
The main function is doc2bow, which converts a collection of words to its bag-of-words representation: a list of (word_id, word_frequency) 2-tuples.
Methods¶
__init__ ([documents, id_range, myhash, debug]) |
By default, keep track of debug statistics and mappings. |
add_documents (documents) |
Build dictionary from a collection of documents. |
clear (() -> None. Remove all items from D.) |
|
copy (() -> a shallow copy of D) |
|
doc2bow (document[, allow_update, return_missing]) |
Convert document (a list of words) into the bag-of-words format = list of (token_id, token_count) 2-tuples. |
filter_extremes ([no_below, no_above, keep_n]) |
Remove document frequency statistics for tokens that appear in |
from_documents (*args, **kwargs) |
|
fromkeys (...) |
v defaults to None. |
get ((k[,d]) -> D[k] if k in D, ...) |
|
has_key ((k) -> True if D has a key k, else False) |
|
items (() -> list of D’s (key, value) pairs, ...) |
|
iteritems (() -> an iterator over the (key, ...) |
|
iterkeys (() -> an iterator over the keys of D) |
|
itervalues (...) |
|
keys () |
Return a list of all token ids. |
load (fname[, mmap]) |
Load a previously saved object from file (also see save). |
pop ((k[,d]) -> v, ...) |
If key is not found, d is returned if given, otherwise KeyError is raised |
popitem (() -> (k, v), ...) |
2-tuple; but raise KeyError if D is empty. |
restricted_hash (token) |
Calculate id of the given token. |
save (fname_or_handle[, separately, ...]) |
Save the object to file (also see load). |
save_as_text (fname) |
Save this HashDictionary to a text file, for easier debugging. |
setdefault ((k[,d]) -> D.get(k,d), ...) |
|
update (([E, ...) |
If E present and has a .keys() method, does: for k in E: D[k] = E[k] |
values (() -> list of D’s values) |
|
viewitems (...) |
|
viewkeys (...) |
|
viewvalues (...) |