gensim.models.Doc2Vec

class gensim.models.Doc2Vec(documents=None, size=300, alpha=0.025, window=8, min_count=5, max_vocab_size=None, sample=0, seed=1, workers=1, min_alpha=0.0001, dm=1, hs=1, negative=0, dbow_words=0, dm_mean=0, dm_concat=0, dm_tag_count=1, docvecs=None, docvecs_mapfile=None, comment=None, trim_rule=None, **kwargs)[source]

Class for training, using and evaluating the "Paragraph Vector" (Doc2Vec) neural networks described in http://arxiv.org/pdf/1405.4053v2.pdf (Le & Mikolov, "Distributed Representations of Sentences and Documents").
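
As a quick orientation, here is a minimal training sketch (the toy corpus and tag names are hypothetical; parameter names follow the signature above, values are only illustrative):

    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import TaggedDocument

    # A tiny hypothetical corpus: each document is a TaggedDocument holding
    # a list of word tokens and a list of tags (here one unique string tag).
    raw_texts = [
        "the quick brown fox jumps over the lazy dog",
        "machine learning with paragraph vectors",
        "gensim makes topic modelling straightforward",
    ]
    documents = [
        TaggedDocument(words=text.split(), tags=["doc_%d" % i])
        for i, text in enumerate(raw_texts)
    ]

    # Passing documents to the constructor scans the vocabulary and trains
    # in one step; min_count=1 only because the toy corpus is so small.
    model = Doc2Vec(documents, size=100, window=8, min_count=1, workers=4)

    # Trained per-document vectors live in model.docvecs, keyed by tag.
    vector = model.docvecs["doc_0"]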

Methods

__init__([documents, size, alpha, window, ...]) Initialize the model from an iterable of documents.
accuracy(questions[, restrict_vocab, ...]) Compute accuracy of the model.
build_vocab(sentences[, keep_raw_vocab, ...]) Build vocabulary from a sequence of sentences (can be a once-only generator stream).
clear_sims() Discard the precomputed L2-normalized vectors (see init_sims); they are recomputed when next needed.
create_binary_tree() Create a binary Huffman tree using stored vocabulary word counts.
doesnt_match(words) Which word from the given list doesn’t go with the others?
estimate_memory([vocab_size, report]) Estimate required memory for a model using current settings.
finalize_vocab() Build tables and model weights based on final vocabulary settings.
infer_vector(doc_words[, alpha, min_alpha, ...]) Infer a vector for a given document after bulk training has completed (see the sketch following this list).
init_sims([replace]) Precompute L2-normalized vectors.
intersect_word2vec_format(fname[, lockf, ...]) Merge the input-hidden weight matrix from the original C word2vec-tool format given, where it intersects with the current vocabulary.
load(*args, **kwargs) Load a previously saved object from file (also see save).
load_word2vec_format(fname[, fvocab, ...]) Load the input-hidden weight matrix from the original C word2vec-tool format.
log_accuracy(section)
make_cum_table([power, domain]) Create a cumulative-distribution table using stored vocabulary word counts for drawing random words in the negative-sampling training routines.
most_similar([positive, negative, topn, ...]) Find the top-N most similar words.
most_similar_cosmul([positive, negative, topn]) Find the top-N most similar words, using the multiplicative combination objective proposed by Omer Levy and Yoav Goldberg in [R4].
n_similarity(ws1, ws2) Compute cosine similarity between two sets of words.
reset_from(other_model) Reuse shareable structures from other_model.
reset_weights() Reset all projection weights to an initial (untrained) state, but keep the existing vocabulary.
save(*args, **kwargs) Save the object to file (also see load).
save_word2vec_format(fname[, fvocab, binary]) Store the input-hidden weight matrix in the same format used by the original C word2vec-tool, for compatibility.
scale_vocab([min_count, sample, dry_run, ...]) Apply vocabulary settings for min_count (discarding less-frequent words) and sample (controlling the downsampling of more-frequent words).
scan_vocab(documents[, progress_per, trim_rule]) Do an initial pass over documents, collecting word and document-tag counts.
score(sentences[, total_sentences, ...]) Score the log probability for a sequence of sentences (can be a once-only generator stream).
seeded_vector(seed_string) Create one ‘random’ vector (but deterministic by seed_string)
similar_by_vector(vector[, topn, restrict_vocab]) Find the top-N most similar words by vector.
similar_by_word(word[, topn, restrict_vocab]) Find the top-N most similar words.
similarity(w1, w2) Compute cosine similarity between two words.
sort_vocab() Sort the vocabulary so the most frequent words have the lowest indexes.
train(sentences[, total_words, word_count, ...]) Update the model’s neural weights from a sequence of sentences (can be a once-only generator stream).
wmdistance(document1, document2) Compute the Word Mover’s Distance between two documents.
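
Most of the word-level query methods above (most_similar, similarity, doesnt_match, ...) are inherited from Word2Vec; infer_vector and the docvecs attribute are specific to Doc2Vec. A minimal post-training sketch, assuming the toy model trained in the example above:

    # Infer a vector for an unseen document (post-bulk-training inference);
    # the token list is hypothetical.
    new_vector = model.infer_vector(["paragraph", "vectors", "are", "useful"])

    # Document-level similarity queries go through model.docvecs ...
    print(model.docvecs.most_similar([new_vector], topn=2))

    # ... while word-level queries use the inherited methods directly.
    print(model.most_similar(positive=["learning"], topn=2))

    # Persist and reload the full model (see save / load above).
    model.save("doc2vec.model")
    model = Doc2Vec.load("doc2vec.model")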

Attributes

dbow Indicates whether 'distributed bag of words' (PV-DBOW) will be used, else 'distributed memory' (PV-DM) is employed.
dm Indicates whether 'distributed memory' (PV-DM) will be used, else 'distributed bag of words' (PV-DBOW) is employed.
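
Both attributes report which training algorithm the dm constructor flag selected. A small sketch (untrained throw-away models, for illustration only):

    from gensim.models import Doc2Vec

    pv_dm = Doc2Vec(dm=1)    # default: PV-DM ("distributed memory")
    pv_dbow = Doc2Vec(dm=0)  # PV-DBOW ("distributed bag of words")

    print(bool(pv_dm.dm), bool(pv_dm.dbow))      # True False
    print(bool(pv_dbow.dm), bool(pv_dbow.dbow))  # False True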