any2unicode (text[, encoding, errors]) |
Convert a string (a bytestring in encoding, or unicode) to unicode. |
any2utf8 (text[, errors, encoding]) |
Convert a string (unicode, or a bytestring in encoding) to a UTF-8 bytestring. |
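A minimal Python 3 sketch of what these two conversions amount to. It is not gensim's source; it assumes UTF-8 and "strict" as the defaults for encoding and errors.

```python
def any2unicode(text, encoding="utf8", errors="strict"):
    """Return text as str, decoding bytes with `encoding` when needed."""
    if isinstance(text, str):
        return text  # already unicode: return it unchanged
    return str(text, encoding, errors=errors)


def any2utf8(text, errors="strict", encoding="utf8"):
    """Return text as UTF-8 encoded bytes."""
    if isinstance(text, str):
        return text.encode("utf8")
    # bytes in some (possibly non-UTF-8) encoding: decode first, then re-encode
    return str(text, encoding, errors=errors).encode("utf8")
```

For example, any2utf8(b"caf\xe9", encoding="latin1") decodes the Latin-1 input and returns b"caf\xc3\xa9". to_unicode and to_utf8, listed further down with the same descriptions, would behave the same way.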
check_output (*popenargs, **kwargs) |
Run command with arguments and return its output as a byte string. |
chunkize (corpus, chunksize[, maxsize, as_numpy]) |
Split a stream of values into smaller chunks. |
chunkize_serial (iterable, chunksize[, as_numpy]) |
Return elements from the iterable in lists of chunksize elements each (the last list may be shorter). |
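A minimal sketch of serial chunking with itertools.islice, assuming a plain iterable as input. It deliberately leaves out the as_numpy option, so it should not be read as gensim's full implementation.

```python
import itertools


def chunkize_serial(iterable, chunksize):
    """Yield successive lists of up to `chunksize` items from `iterable`."""
    it = iter(iterable)  # a single iterator, so each islice resumes where the last stopped
    while True:
        chunk = list(itertools.islice(it, chunksize))
        if not chunk:  # iterator exhausted
            break
        yield chunk
```

For example, list(chunkize_serial(range(7), 3)) gives [[0, 1, 2], [3, 4, 5], [6]]. grouper, listed below with the same description, follows the same pattern.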
contextmanager (func) |
The @contextmanager decorator, which turns a generator function into a context manager. |
copytree_hardlink (source, dest) |
Recursively copy a directory à la shutil.copytree, but hard-link files instead of copying them. |
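On Python 3 the same effect can be sketched by passing os.link as the copy_function of shutil.copytree. This is an assumed approach rather than gensim's code. It requires a filesystem that supports hard links, and dest must not exist yet.

```python
import os
import shutil


def copytree_hardlink(source, dest):
    """Recreate the directory tree of `source` under `dest`, hard-linking every file."""
    # shutil calls copy_function(src, dst) for each file; os.link has that signature,
    # so the files in dest share their inodes with the originals instead of being copied
    shutil.copytree(source, dest, copy_function=os.link)
```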
deaccent (text) |
Remove accentuation from the given string. |
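A common way to strip accents is Unicode decomposition. This sketch assumes, without confirming it here, that gensim's deaccent works roughly along these lines: decompose with NFD, drop the combining marks (category "Mn"), then recompose with NFC.

```python
import unicodedata


def deaccent(text):
    """Return `text` with combining accent marks removed."""
    # NFD splits e.g. "é" into "e" followed by a combining acute accent
    norm = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in norm if unicodedata.category(ch) != "Mn")
    # recompose whatever characters remain into canonical form
    return unicodedata.normalize("NFC", stripped)
```

For example, deaccent("café naïve") returns "cafe naive".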
decode_htmlentities (text) |
Decode HTML entities in text, coded as hex, decimal or named. |
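To show what decoding all three entity forms means, the sketch below uses Python's standard html.unescape. This is a stand-in chosen for illustration and does not reproduce gensim's own implementation.

```python
import html


def decode_htmlentities(text):
    """Decode named (&amp;), decimal (&#233;) and hex (&#xe9;) character references."""
    return html.unescape(text)
```

For example, decode_htmlentities("caf&#233; &amp; cr&#xe8;me") returns "café & crème".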
dict_from_corpus (corpus) |
Scan corpus for all word ids that appear in it, then construct and return a mapping which maps each wordId -> str(wordId). |
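A minimal sketch that follows the description literally. It assumes a corpus is an iterable of documents, each a list of (word_id, weight) pairs, and maps exactly the ids that occur. Whether gensim also fills in ids that never appear is not settled here.

```python
def dict_from_corpus(corpus):
    """Map every word id that appears in `corpus` to its string form."""
    return {word_id: str(word_id) for document in corpus for word_id, _ in document}
```

For example, dict_from_corpus([[(0, 1.0), (3, 2.0)], [(3, 1.0)]]) returns {0: "0", 3: "3"}.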
file_or_filename (*args, **kwds) |
Return a file-like object ready to be read from the beginning. |
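A sketch of the "file or filename" pattern as a context manager. It assumes a str argument is a local path to open in binary mode, and anything else is an already-open file-like object that only needs rewinding. The real function may open paths differently (for example through smart_open).

```python
from contextlib import contextmanager


@contextmanager
def file_or_filename(source):
    """Yield a binary file object positioned at the start."""
    if isinstance(source, str):
        # a path: open it here and close it when the with-block ends
        with open(source, "rb") as fin:
            yield fin
    else:
        # already file-like: rewind it and leave closing to the caller
        source.seek(0)
        yield source
```

Callers can then write with file_or_filename(x) as f: ... without caring which kind of argument they received.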
getNS ([host, port, broadcast, hmac_key]) |
Return a Pyro name server proxy. |
get_max_id (corpus) |
Return the highest feature id that appears in the corpus. |
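A minimal sketch, assuming documents are lists of (feature_id, weight) pairs and assuming -1 is returned for an empty corpus. That sentinel is this sketch's choice.

```python
def get_max_id(corpus):
    """Return the highest feature id in `corpus`, or -1 if there are none."""
    maxid = -1  # sentinel for an empty corpus or all-empty documents
    for document in corpus:
        for feature_id, _ in document:
            maxid = max(maxid, feature_id)
    return maxid
```

The number of features is then usually 1 + get_max_id(corpus), because ids are zero-based.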
get_my_ip () |
Try to obtain our external IP address (from the Pyro name server’s point of view). |
grouper (iterable, chunksize[, as_numpy]) |
Return elements from the iterable in lists of chunksize elements each (the last list may be shorter). |
has_pattern () |
Check whether the pattern library is installed. |
identity (p) |
Identity function, for flows that don’t accept a lambda (pickling, etc.). |
is_corpus (obj) |
Check whether obj is a corpus. |
iteritems (d, **kw) |
Return an iterator over the (key, value) pairs of a dictionary. |
keep_vocab_item (word, count, min_count[, ...]) |
|
lemmatize (content[, allowed_tags, light, ...]) |
This function is only available when the optional ‘pattern’ package is installed. |
mock_data ([n_items, dim, prob_nnz, lam]) |
Create a random gensim-style corpus, as a list of lists of (int, float) tuples, to be used as a mock corpus. |
mock_data_row ([dim, prob_nnz, lam]) |
Create a random gensim sparse vector. |
pickle (obj, fname[, protocol]) |
Pickle object obj to file fname. |
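A sketch of pickle together with its counterpart unpickle (listed below), using only the standard pickle module and local files. The default protocol=2 is an assumption, and the real functions may open files through smart_open rather than open.

```python
import pickle as _pickle  # aliased so the function below can be named `pickle`


def pickle(obj, fname, protocol=2):
    """Serialize `obj` to the file `fname` (protocol=2 is an assumed default)."""
    with open(fname, "wb") as fout:
        _pickle.dump(obj, fout, protocol=protocol)


def unpickle(fname):
    """Load and return the object pickled in `fname`."""
    with open(fname, "rb") as fin:
        return _pickle.load(fin)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.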
prune_vocab (vocab, min_reduce[, trim_rule]) |
Remove all entries from the vocab dictionary with count smaller than min_reduce. |
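A minimal sketch of in-place pruning. It assumes vocab maps words to integer counts, leaves out trim_rule, and returns the number of entries removed. That return value is this sketch's choice and may differ from gensim's.

```python
def prune_vocab(vocab, min_reduce):
    """Delete words whose count is below `min_reduce`, in place; return how many were removed."""
    removed = 0
    for word in list(vocab):  # iterate over a copy of the keys because we delete as we go
        if vocab[word] < min_reduce:
            del vocab[word]
            removed += 1
    return removed
```

For example, with vocab = {"a": 5, "b": 1, "c": 2}, prune_vocab(vocab, 2) leaves {"a": 5, "c": 2}.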
pyro_daemon (name, obj[, random_suffix, ip, ...]) |
Register object with name server (starting the name server if not running yet) and block until the daemon is terminated. |
qsize (queue) |
Return the (approximate) queue size where available; -1 where not (OS X). |
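The -1 fallback can be sketched as follows. Python's documentation notes that multiprocessing.Queue.qsize() may raise NotImplementedError on platforms such as macOS, where sem_getvalue() is not implemented. The parameter is renamed to q here to avoid shadowing the queue module.

```python
def qsize(q):
    """Return the approximate size of queue `q`, or -1 if the platform cannot report it."""
    try:
        return q.qsize()
    except NotImplementedError:
        # e.g. multiprocessing.Queue on macOS
        return -1
```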
randfname ([prefix]) |
|
revdict (d) |
Reverse a dictionary mapping. |
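A one-line sketch of reversing a mapping. It assumes the values are hashable. If several keys share a value, only one of those keys survives (here, the last one seen).

```python
def revdict(d):
    """Return a new dict with the keys and values of `d` swapped."""
    return {value: key for key, value in d.items()}
```

For example, revdict({"a": 1, "b": 2}) returns {1: "a", 2: "b"}.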
safe_unichr (intval) |
|
simple_preprocess (doc[, deacc, min_len, max_len]) |
Convert a document into a list of tokens. |
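An approximate sketch of this kind of preprocessing. The token pattern (runs of non-digit word characters), the lowercasing, the min_len=2 / max_len=15 defaults, and the rule of dropping tokens that start with an underscore are assumptions meant to resemble gensim's behaviour, not its exact code. _deaccent is a hypothetical helper that uses the same decomposition approach as deaccent.

```python
import re
import unicodedata

# Assumed token pattern: one or more word characters, none of them digits
TOKEN_RE = re.compile(r"((?![\d])\w)+", re.UNICODE)


def _deaccent(text):
    """Hypothetical helper: strip combining accent marks via NFD/NFC."""
    norm = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", "".join(c for c in norm if unicodedata.category(c) != "Mn"))


def simple_preprocess(doc, deacc=False, min_len=2, max_len=15):
    """Lowercase `doc`, optionally remove accents, and return tokens within the length limits."""
    text = doc.lower()
    if deacc:
        text = _deaccent(text)
    tokens = (m.group() for m in TOKEN_RE.finditer(text))
    return [t for t in tokens if min_len <= len(t) <= max_len and not t.startswith("_")]
```

For example, simple_preprocess("Hello, World! 42 a café", deacc=True) gives ["hello", "world", "cafe"]. "42" is never matched, and "a" is shorter than min_len.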
smart_extension (fname, ext) |
|
smart_open (uri[, mode]) |
Open the given S3 / HDFS / filesystem file pointed to by uri for reading or writing. |
synchronous (tlockname) |
A decorator to place an instance-based lock around a method. |
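A sketch of an instance-based lock decorator. It assumes tlockname is the name of an attribute on self that holds a threading.Lock (or another context-manager lock). Counter and lock_update are hypothetical names used only for the usage example.

```python
import threading
from functools import wraps


def synchronous(tlockname):
    """Decorator factory: run the method while holding self.<tlockname>."""
    def decorator(func):
        @wraps(func)  # keep the wrapped method's name and docstring
        def wrapper(self, *args, **kwargs):
            lock = getattr(self, tlockname)  # look up this instance's lock by name
            with lock:
                return func(self, *args, **kwargs)
        return wrapper
    return decorator


class Counter:
    """Hypothetical example class whose updates must not interleave."""

    def __init__(self):
        self.lock_update = threading.Lock()
        self.value = 0

    @synchronous("lock_update")
    def increment(self):
        self.value += 1
```

Because the lock is looked up per instance, two Counter objects never block each other. Only concurrent calls on the same object are serialized.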
to_unicode (text[, encoding, errors]) |
Convert a string (a bytestring in encoding, or unicode) to unicode. |
to_utf8 (text[, errors, encoding]) |
Convert a string (unicode, or a bytestring in encoding) to a UTF-8 bytestring. |
tokenize (text[, lowercase, deacc, errors, ...]) |
Iteratively yield tokens as unicode strings, optionally removing accent marks (deacc) and optionally lowercasing the unicode string by setting one of the parameters lowercase, to_lower, or lower to True. |
toptexts (query, texts, index[, n]) |
Debug function to help inspect the top n most similar documents (according to the similarity index index), to see whether they are actually related to the query. |
u (s) |
Text literal |
unichr (i) |
Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff. |
unpickle (fname) |
Load a pickled object from fname. |
upload_chunked (server, docs[, chunksize, ...]) |
Memory-friendly upload of documents to a SimServer (or Pyro SimServer proxy). |
wraps (wrapped[, assigned, updated]) |
Decorator factory to apply update_wrapper() to a wrapper function. |