gensim.matutils

This module contains math helper functions.

Functions

any2sparse(vec[, eps]) Convert a numpy/scipy vector into gensim document format (=list of 2-tuples).
argsort(x[, topn, reverse]) Return indices of the topn smallest elements in array x, in ascending order.
blas(name, ndarray)
corpus2csc(corpus[, num_terms, dtype, ...]) Convert a streamed corpus into a sparse matrix, in scipy.sparse.csc_matrix format, with documents as columns.
corpus2dense(corpus, num_terms[, num_docs, ...]) Convert corpus into a dense numpy array (documents will be columns).
cossim(vec1, vec2) Return cosine similarity between two sparse vectors.
dense2vec(vec[, eps]) Convert a dense numpy array into the sparse document format (sequence of 2-tuples).
entropy(pk[, qk, base]) Calculate the entropy of a distribution for given probability values.
full2sparse(vec[, eps]) Convert a dense numpy array into the sparse document format (sequence of 2-tuples).
full2sparse_clipped(vec, topn[, eps]) Like full2sparse, but only return the topn elements of the greatest magnitude (abs).
get_lapack_funcs(names[, arrays, dtype]) Return available LAPACK function objects from names.
hellinger(vec1, vec2) Return the Hellinger distance between two probability distributions, a metric of how dissimilar they are.
isbow(vec) Check whether the vector passed is in bag-of-words representation.
ismatrix(m)
iteritems(d, **kw) Return an iterator over the (key, value) pairs of a dictionary.
itervalues(d, **kw) Return an iterator over the values of a dictionary.
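jaccard(vec1, vec2) Return the Jaccard distance between two documents in bag-of-words representation.
kullback_leibler(vec1, vec2[, num_features]) Return the Kullback-Leibler divergence between two probability distributions. The divergence is not symmetric, so it is not a true distance metric.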
pad(mat, padrow, padcol) Add additional rows/columns to a numpy.matrix mat.
qr_destroy(la) Return the QR decomposition of la[0].
ret_normalized_vec(vec, length)
scipy2sparse(vec[, eps]) Convert a scipy.sparse vector into gensim document format (=list of 2-tuples).
sparse2full(doc, length) Convert a document in sparse document format (=sequence of 2-tuples) into a dense numpy array (of size length).
triu(m[, k]) Make a copy of a matrix with elements below the k-th diagonal zeroed.
triu_indices(n[, k, m]) Return the indices for the upper-triangle of an (n, m) array.
unitvec(vec[, norm]) Scale a vector to unit length.
veclen(vec)
zeros_aligned(shape, dtype[, order, align]) Like numpy.zeros(), but the array is aligned on an align-byte boundary.
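The conversion helpers above (full2sparse/dense2vec, sparse2full, full2sparse_clipped) move between a dense vector and gensim's sparse document format, a list of (feature_id, weight) 2-tuples. The following is a minimal pure-Python sketch of those round trips, assuming plain Python lists instead of numpy arrays. The names dense_to_sparse, sparse_to_dense and dense_to_sparse_clipped are hypothetical; this is not gensim's implementation.

```python
from typing import List, Sequence, Tuple

# gensim-style sparse document: a list of (feature_id, weight) 2-tuples
SparseDoc = List[Tuple[int, float]]


def dense_to_sparse(vec: Sequence[float], eps: float = 1e-9) -> SparseDoc:
    """Keep (index, value) for every entry with |value| > eps (cf. full2sparse)."""
    return [(i, float(x)) for i, x in enumerate(vec) if abs(x) > eps]


def sparse_to_dense(doc: SparseDoc, length: int) -> List[float]:
    """Scatter (index, value) pairs into a zero-filled list of size length (cf. sparse2full)."""
    result = [0.0] * length
    for idx, value in doc:
        result[idx] = float(value)
    return result


def dense_to_sparse_clipped(vec: Sequence[float], topn: int, eps: float = 1e-9) -> SparseDoc:
    """Keep at most topn non-negligible entries, largest magnitude first (cf. full2sparse_clipped)."""
    if topn <= 0:
        return []
    entries = dense_to_sparse(vec, eps)
    # Python's sort is stable, so equal magnitudes keep ascending index order.
    entries.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return entries[:topn]


if __name__ == "__main__":
    doc = dense_to_sparse([0.0, 0.5, 0.0, -2.0])
    print(doc)                      # [(1, 0.5), (3, -2.0)]
    print(sparse_to_dense(doc, 4))  # [0.0, 0.5, 0.0, -2.0]
```

Entries whose magnitude does not exceed eps are dropped, which is why near-zero floating-point noise does not bloat the sparse form.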
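cossim, hellinger, jaccard and kullback_leibler each compare two vectors. Below is a minimal pure-Python sketch of each measure, assuming both inputs are sparse documents (lists of (id, weight) 2-tuples) and, for the probability-based measures, that weights are non-negative and already sum to 1. The function names are hypothetical. The Jaccard variant shown is the textbook weighted form 1 - sum(min)/sum(max), which may not match the exact normalization gensim uses.

```python
import math
from typing import Iterable, Tuple

SparseDoc = Iterable[Tuple[int, float]]


def cosine_similarity(vec1: SparseDoc, vec2: SparseDoc) -> float:
    """Cosine of the angle between two sparse vectors (cf. cossim); 0.0 if either is all-zero."""
    d1, d2 = dict(vec1), dict(vec2)
    len1 = math.sqrt(sum(w * w for w in d1.values()))
    len2 = math.sqrt(sum(w * w for w in d2.values()))
    if len1 == 0.0 or len2 == 0.0:
        return 0.0
    dot = sum(w * d2.get(i, 0.0) for i, w in d1.items())
    return dot / (len1 * len2)


def hellinger_distance(p: SparseDoc, q: SparseDoc) -> float:
    """sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2) over the union of ids."""
    dp, dq = dict(p), dict(q)
    ids = dp.keys() | dq.keys()
    total = sum((math.sqrt(dp.get(i, 0.0)) - math.sqrt(dq.get(i, 0.0))) ** 2 for i in ids)
    return math.sqrt(0.5 * total)


def weighted_jaccard_distance(vec1: SparseDoc, vec2: SparseDoc) -> float:
    """1 - sum(min)/sum(max) over the union of ids; assumes non-negative weights."""
    d1, d2 = dict(vec1), dict(vec2)
    ids = d1.keys() | d2.keys()
    union = sum(max(d1.get(i, 0.0), d2.get(i, 0.0)) for i in ids)
    if union == 0.0:
        return 0.0  # two empty documents are treated as identical
    intersection = sum(min(d1.get(i, 0.0), d2.get(i, 0.0)) for i in ids)
    return 1.0 - intersection / union


def kl_divergence(p: SparseDoc, q: SparseDoc) -> float:
    """sum_i p_i * log(p_i / q_i); infinite if q_i == 0 where p_i > 0."""
    dp, dq = dict(p), dict(q)
    total = 0.0
    for i, p_i in dp.items():
        if p_i <= 0.0:
            continue  # the term 0 * log(0 / q_i) is taken as 0
        q_i = dq.get(i, 0.0)
        if q_i <= 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total
```

Swapping the arguments of kl_divergence can change the result, even to infinity when the second distribution assigns zero mass to an id the first one uses. Cosine similarity, Hellinger distance and the weighted Jaccard distance are all symmetric.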
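argsort is useful when only the topn indices are needed, so a full sort is unnecessary, and unitvec rescales a vector to unit length. The sketch below illustrates both ideas in pure Python, using heapq for the partial selection and assuming the Euclidean (L2) norm; the norm parameter of unitvec is not modeled. The names argsort_topn and unit_vector are hypothetical, and gensim itself operates on numpy arrays.

```python
import heapq
import math
from typing import List, Optional, Sequence


def argsort_topn(x: Sequence[float], topn: Optional[int] = None, reverse: bool = False) -> List[int]:
    """Indices of the topn smallest values in ascending order, or of the topn
    largest values in descending order when reverse=True (cf. argsort)."""
    indices = range(len(x))
    if topn is None:
        return sorted(indices, key=x.__getitem__, reverse=reverse)
    # A heap selects topn items in O(n log topn) instead of sorting everything.
    if reverse:
        return heapq.nlargest(topn, indices, key=x.__getitem__)
    return heapq.nsmallest(topn, indices, key=x.__getitem__)


def unit_vector(vec: Sequence[float]) -> List[float]:
    """Scale vec to unit L2 length (cf. unitvec); a zero vector is returned unchanged."""
    length = math.sqrt(sum(v * v for v in vec))
    if length == 0.0:
        return list(vec)
    return [v / length for v in vec]


if __name__ == "__main__":
    scores = [3.0, 1.0, 2.0, 5.0]
    print(argsort_topn(scores, 2))                # [1, 2]
    print(argsort_topn(scores, 2, reverse=True))  # [3, 0]
    print(unit_vector([3.0, 4.0]))                # [0.6, 0.8]
```

The heap-based selection pays off when topn is much smaller than len(x), which is the typical case when retrieving the few most similar documents.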

Classes

Dense2Corpus(dense[, documents_columns]) Treat a dense numpy array as a sparse, streamed gensim corpus.
MmReader(input[, transposed]) Wrap a term-document matrix on disk (in Matrix Market format) and present it as an object that supports iteration over its rows (documents).
MmWriter(fname) Store a corpus in Matrix Market format.
Scipy2Corpus(vecs) Convert a sequence of dense/sparse vectors into a streamed gensim corpus object.
Sparse2Corpus(sparse[, documents_columns]) Convert a matrix in scipy.sparse format into a streaming gensim corpus.
izip izip(iter1 [,iter2 [...]]) -> izip object
xrange xrange(start, stop[, step]) -> xrange object
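Dense2Corpus and Sparse2Corpus expose a matrix as a stream of sparse documents, and MmWriter and MmReader store such a stream in Matrix Market format. The sketch below illustrates both ideas with plain Python lists and the Matrix Market coordinate text layout (a header line, a "rows columns non-zeros" size line, then 1-based "row column value" entries). DenseColumnsCorpus and write_matrix_market are hypothetical names, and the exact file layout gensim writes may differ from this minimal version.

```python
import io
from typing import Iterable, Iterator, List, Sequence, TextIO, Tuple

SparseDoc = List[Tuple[int, float]]


class DenseColumnsCorpus:
    """Hypothetical sketch of the Dense2Corpus idea: stream a dense matrix,
    given as a list of rows, as sparse documents. With documents_columns=True
    each column is one document; otherwise each row is."""

    def __init__(self, rows: Sequence[Sequence[float]], documents_columns: bool = True):
        self.rows = rows
        self.documents_columns = documents_columns

    def __iter__(self) -> Iterator[SparseDoc]:
        if not self.documents_columns:
            for row in self.rows:
                yield [(t, float(v)) for t, v in enumerate(row) if v != 0]
            return
        num_docs = len(self.rows[0]) if self.rows else 0
        for d in range(num_docs):
            # Term t of document d is the entry in row t, column d.
            yield [(t, float(row[d])) for t, row in enumerate(self.rows) if row[d] != 0]


def write_matrix_market(corpus: Iterable[SparseDoc], num_terms: int, out: TextIO) -> None:
    """Write documents as rows of a Matrix Market coordinate matrix (1-based indices)."""
    docs = list(corpus)  # materialized so the size line can be written before the entries
    nnz = sum(len(doc) for doc in docs)
    out.write("%%MatrixMarket matrix coordinate real general\n")
    out.write(f"{len(docs)} {num_terms} {nnz}\n")
    for doc_id, doc in enumerate(docs, start=1):
        for term_id, weight in doc:
            out.write(f"{doc_id} {term_id + 1} {weight}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    # 3 terms x 2 documents, with documents stored as columns
    corpus = DenseColumnsCorpus([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
    write_matrix_market(corpus, num_terms=3, out=buf)
    print(buf.getvalue())
```

Materializing the corpus keeps the sketch short but defeats streaming. A writer for large corpora would instead make a single pass over the documents and fill in the size line once the counts are known.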