# gensim.matutils

This module contains math helper functions.

## Functions

| Function | Description |
| --- | --- |
| `any2sparse(vec[, eps])` | Convert a numpy/scipy vector into gensim document format (a list of 2-tuples). |
| `argsort(x[, topn, reverse])` | Return indices of the `topn` smallest elements in array `x`, in ascending order. |
| `blas(name, ndarray)` | |
| `corpus2csc(corpus[, num_terms, dtype, ...])` | Convert a streamed corpus into a sparse matrix, in `scipy.sparse.csc_matrix` format, with documents as columns. |
| `corpus2dense(corpus, num_terms[, num_docs, ...])` | Convert a corpus into a dense numpy array (documents will be columns). |
| `cossim(vec1, vec2)` | Return the cosine similarity between two sparse vectors. |
| `dense2vec(vec[, eps])` | Convert a dense numpy array into the sparse document format (a sequence of 2-tuples). |
| `entropy(pk[, qk, base])` | Calculate the entropy of a distribution for given probability values. |
| `full2sparse(vec[, eps])` | Convert a dense numpy array into the sparse document format (a sequence of 2-tuples). |
| `full2sparse_clipped(vec, topn[, eps])` | Like `full2sparse`, but return only the `topn` elements of the greatest magnitude (abs). |
| `get_lapack_funcs(names[, arrays, dtype])` | Return available LAPACK function objects from names. |
| `hellinger(vec1, vec2)` | Return the Hellinger distance, a metric that quantifies the similarity between two probability distributions. |
| `isbow(vec)` | Check whether the given vector is in bag-of-words representation. |
| `ismatrix(m)` | |
| `iteritems(d, **kw)` | Return an iterator over the (key, value) pairs of a dictionary. |
| `itervalues(d, **kw)` | Return an iterator over the values of a dictionary. |
| `jaccard(vec1, vec2)` | A distance metric between two bag-of-words representations. |
| `kullback_leibler(vec1, vec2[, num_features])` | A distance metric between two probability distributions. |
| `pad(mat, padrow, padcol)` | Add additional rows/columns to a `numpy.matrix` `mat`. |
| `qr_destroy(la)` | Return the QR decomposition of `la[0]`. |
| `ret_normalized_vec(vec, length)` | |
| `scipy2sparse(vec[, eps])` | Convert a `scipy.sparse` vector into gensim document format (a list of 2-tuples). |
| `sparse2full(doc, length)` | Convert a document in sparse document format (a sequence of 2-tuples) into a dense numpy array (of size `length`). |
| `triu(m[, k])` | Make a copy of a matrix with elements below the k-th diagonal zeroed. |
| `triu_indices(n[, k, m])` | Return the indices for the upper triangle of an (n, m) array. |
| `unitvec(vec[, norm])` | Scale a vector to unit length. |
| `veclen(vec)` | |
| `zeros_aligned(shape, dtype[, order, align])` | Like `numpy.zeros()`, but the array will be aligned at an `align`-byte boundary. |
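
Most of the conversion helpers listed above revolve around one representation: a document stored as a sequence of `(index, value)` 2-tuples. The block below is a minimal pure-Python sketch of the round trip between that sparse format and a dense vector, plus cosine similarity over two sparse documents. The names `sparse_to_full`, `full_to_sparse`, and `cosine_similarity` are hypothetical stand-ins written for illustration; they are not gensim's implementation. The real `sparse2full`, `full2sparse`, and `cossim` work with numpy arrays and may differ in details such as dtype and the default `eps`, which is an assumption here.

```python
import math


def sparse_to_full(doc, length):
    """Expand a sequence of (index, value) 2-tuples into a dense list of size `length`."""
    dense = [0.0] * length
    for index, value in doc:
        dense[index] = float(value)
    return dense


def full_to_sparse(vec, eps=1e-9):
    """Keep only entries whose magnitude exceeds `eps` (the default is an assumption)."""
    return [(i, v) for i, v in enumerate(vec) if abs(v) > eps]


def cosine_similarity(vec1, vec2):
    """Cosine similarity between two sparse documents; 0.0 if either has zero length."""
    d1, d2 = dict(vec1), dict(vec2)
    norm1 = math.sqrt(sum(v * v for v in d1.values()))
    norm2 = math.sqrt(sum(v * v for v in d2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    # Iterate over the shorter document and look up matches in the longer one.
    if len(d2) < len(d1):
        d1, d2 = d2, d1
    dot = sum(v * d2.get(i, 0.0) for i, v in d1.items())
    return dot / (norm1 * norm2)


doc = [(0, 1.0), (3, 2.0)]
dense = sparse_to_full(doc, 5)        # dense vector with 5 entries
assert full_to_sparse(dense) == doc   # the round trip preserves the document
```

Zero entries are simply absent from the sparse form, which is why the dense length has to be passed explicitly when converting back.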

## Classes

| Class | Description |
| --- | --- |
| `Dense2Corpus(dense[, documents_columns])` | Treat dense numpy array as a sparse, streamed gensim corpus. |
| `MmReader(input[, transposed])` | Wrap a term-document matrix on disk (in matrix-market format), and present it as an object which supports iteration over the rows (~documents). |
| `MmWriter(fname)` | Store a corpus in Matrix Market format. |
| `Scipy2Corpus(vecs)` | Convert a sequence of dense/sparse vectors into a streamed gensim corpus object. |
| `Sparse2Corpus(sparse[, documents_columns])` | Convert a matrix in scipy.sparse format into a streaming gensim corpus. |
| `izip` | `izip(iter1 [,iter2 [...]]) --> izip object` |
| `xrange` | `xrange(start, stop[, step]) -> xrange object` |
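
The corpus wrappers above present a matrix as a stream of sparse documents. As a rough illustration of that idea, the following pure-Python sketch streams the columns (or rows) of a dense matrix, given as a list of rows, as `(index, value)` documents. `DenseCorpusSketch` is a hypothetical class written for this example and is not gensim's `Dense2Corpus`. The `documents_columns` flag mirrors the parameter in the signature listed above; the `eps` threshold and its default are assumptions.

```python
class DenseCorpusSketch:
    """Hypothetical stand-in: stream a dense matrix as sparse (index, value) documents."""

    def __init__(self, dense, documents_columns=True, eps=1e-9):
        # `dense` is a list of rows. When documents are columns, transpose
        # so that each stored entry is one document.
        if documents_columns:
            self.docs = [list(col) for col in zip(*dense)]
        else:
            self.docs = [list(row) for row in dense]
        self.eps = eps  # assumed cutoff below which a value counts as zero

    def __iter__(self):
        # A fresh generator per call, so the corpus can be iterated repeatedly.
        for doc in self.docs:
            yield [(i, v) for i, v in enumerate(doc) if abs(v) > self.eps]

    def __len__(self):
        return len(self.docs)


# 3 terms x 2 documents, with documents stored as columns.
term_doc = [
    [1.0, 0.0],
    [0.0, 0.0],
    [2.0, 3.0],
]
corpus = DenseCorpusSketch(term_doc)
for document in corpus:
    print(document)
```

Because `__iter__` builds a new generator on every call, the object can be passed to code that makes several passes over the corpus.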