gensim.corpora.HashDictionary

class gensim.corpora.HashDictionary(documents=None, id_range=32000, myhash=zlib.adler32, debug=True)

HashDictionary encapsulates the mapping between normalized words and their integer ids.

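Unlike Dictionary, a HashDictionary does not have to be built before it is used. Because token ids are computed by hashing each token (see myhash and id_range) rather than looked up in a vocabulary collected beforehand, documents can be converted immediately with an uninitialized HashDictionary, without seeing the rest of the corpus first.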
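The idea behind this is the "hashing trick". The sketch below illustrates it in plain Python; it is not gensim's code. It assumes, based on the constructor defaults, that a token's id is myhash applied to the UTF-8 encoded token, reduced modulo id_range. The function name hashed_token_id is hypothetical.

```python
import zlib


def hashed_token_id(token, id_range=32000, myhash=zlib.adler32):
    """Map a token to an integer id in [0, id_range) without any vocabulary.

    Assumption: id = myhash(utf-8 bytes of token) % id_range, mirroring the
    documented defaults (zlib.adler32, id_range=32000).
    """
    return myhash(token.encode("utf-8")) % id_range


# No pass over the corpus is needed: every token gets an id the first
# time it is seen, and the same token always gets the same id.
for word in ["human", "interface", "computer"]:
    print(word, hashed_token_id(word))
```

The trade-off is that the id space is fixed in advance, so distinct tokens can collide on the same id, especially with a small id_range.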

The main method is doc2bow, which converts a collection of words into its bag-of-words representation: a list of (word_id, word_frequency) 2-tuples.
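To make the output format concrete, here is a minimal stand-in for doc2bow built on the same hashing assumption (id = myhash(token) % id_range). The function hashed_doc2bow is hypothetical, and it returns pairs sorted by id. Because ids come from a hash, counts for tokens that collide on one id are summed in this sketch.

```python
import zlib
from collections import Counter


def hashed_doc2bow(document, id_range=32000, myhash=zlib.adler32):
    """Convert a list of tokens into sorted (token_id, token_count) 2-tuples.

    Hypothetical illustration only: each token is hashed to an id, and
    occurrences are counted per id (colliding tokens share one count).
    """
    counts = Counter(myhash(tok.encode("utf-8")) % id_range for tok in document)
    return sorted(counts.items())


bow = hashed_doc2bow(["human", "computer", "human"])
print(bow)  # two (id, count) pairs; "human" contributes a count of 2
```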

Methods

__init__([documents, id_range, myhash, debug]) By default, keep track of debug statistics and mappings.
add_documents(documents) Build dictionary from a collection of documents.
clear() -> None. Remove all items from D.
copy() -> a shallow copy of D.
doc2bow(document[, allow_update, return_missing]) Convert document (a list of words) into the bag-of-words format = list of (token_id, token_count) 2-tuples.
filter_extremes([no_below, no_above, keep_n]) Remove document frequency statistics for tokens that appear in ...
from_documents(*args, **kwargs)
fromkeys(...) v defaults to None.
get(k[, d]) -> D[k] if k in D, ...
has_key(k) -> True if D has a key k, else False.
items() -> list of D’s (key, value) pairs, ...
iteritems() -> an iterator over the (key, ...
iterkeys() -> an iterator over the keys of D.
itervalues(...)
keys() Return a list of all token ids.
load(fname[, mmap]) Load a previously saved object from file (also see save).
pop(k[, d]) -> v, ... If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() -> (k, v), ... 2-tuple; but raise KeyError if D is empty.
restricted_hash(token) Calculate id of the given token.
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).
save_as_text(fname) Save this HashDictionary to a text file, for easier debugging.
setdefault(k[, d]) -> D.get(k, d), ...
update([E, ...]) If E present and has a .keys() method, does: for k in E: D[k] = E[k]
values() -> list of D’s values.
viewitems(...)
viewkeys(...)
viewvalues(...)
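The debug option of __init__ keeps track of debug statistics and mappings, and save_as_text writes them out for easier debugging. The sketch below shows why such a mapping is useful with hashed ids: it records which tokens landed on each id so collisions can be inspected. The class DebugHashMapper and its methods are hypothetical and only loosely in the spirit of debug=True; they do not reproduce gensim's internal data structures.

```python
import zlib
from collections import defaultdict


class DebugHashMapper:
    """Hypothetical illustration: remember which tokens hashed to each id."""

    def __init__(self, id_range=32000, myhash=zlib.adler32, debug=True):
        self.id_range = id_range
        self.myhash = myhash
        self.debug = debug
        self.id2tokens = defaultdict(set)  # filled only when debug is True

    def token_id(self, token):
        """Hash a token to an id (assumed: myhash(utf-8 bytes) % id_range)."""
        tid = self.myhash(token.encode("utf-8")) % self.id_range
        if self.debug:
            self.id2tokens[tid].add(token)
        return tid

    def collisions(self):
        """Return {id: tokens} for every id shared by more than one token."""
        return {tid: toks for tid, toks in self.id2tokens.items() if len(toks) > 1}


# Five distinct words and only four possible ids: at least one collision
# is guaranteed, and the debug mapping makes it visible.
mapper = DebugHashMapper(id_range=4)
for word in ["human", "interface", "computer", "survey", "system"]:
    mapper.token_id(word)
print(mapper.collisions())
```

Turning the bookkeeping off (debug=False in this sketch) saves memory, at the cost of no longer knowing which tokens share an id.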