gensim.models.Doc2Vec

class gensim.models.Doc2Vec(documents=None, size=300, alpha=0.025, window=8, min_count=5, max_vocab_size=None, sample=0, seed=1, workers=1, min_alpha=0.0001, dm=1, hs=1, negative=0, dbow_words=0, dm_mean=0, dm_concat=0, dm_tag_count=1, docvecs=None, docvecs_mapfile=None, comment=None, trim_rule=None, **kwargs)[source]

Class for training, using and evaluating the "Paragraph Vector" (Doc2Vec) neural networks described in http://arxiv.org/pdf/1405.4053v2.pdf (Le & Mikolov, "Distributed Representations of Sentences and Documents").
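
As a quick orientation, here is a minimal training sketch (the toy corpus and tag names are hypothetical; parameter names follow the signature above, values are only illustrative):

    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import TaggedDocument

    # A tiny hypothetical corpus: each document is a TaggedDocument holding
    # a list of word tokens and a list of tags (here one unique string tag).
    raw_texts = [
        "the quick brown fox jumps over the lazy dog",
        "machine learning with paragraph vectors",
        "gensim makes topic modelling straightforward",
    ]
    documents = [
        TaggedDocument(words=text.split(), tags=["doc_%d" % i])
        for i, text in enumerate(raw_texts)
    ]

    # Passing documents to the constructor scans the vocabulary and trains
    # in one step; min_count=1 only because the toy corpus is so small.
    model = Doc2Vec(documents, size=100, window=8, min_count=1, workers=4)

    # Trained per-document vectors live in model.docvecs, keyed by tag.
    vector = model.docvecs["doc_0"]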

Methods

__init__([documents, size, alpha, window, ...]) Initialize the model from an iterable of documents.
accuracy(questions[, restrict_vocab, ...]) Compute accuracy of the model.
build_vocab(sentences[, keep_raw_vocab, ...]) Build vocabulary from a sequence of sentences (can be a once-only generator stream).
clear_sims() Discard the precomputed L2-normalized vectors (see init_sims); they are recomputed when next needed.
create_binary_tree() Create a binary Huffman tree using stored vocabulary word counts.
doesnt_match(words) Which word from the given list doesn’t go with the others?
estimate_memory([vocab_size, report]) Estimate required memory for a model using current settings.
finalize_vocab() Build tables and model weights based on final vocabulary settings.
infer_vector(doc_words[, alpha, min_alpha, ...]) Infer a vector for a given document after bulk training has completed (see the sketch following this list).
init_sims([replace]) Precompute L2-normalized vectors.
intersect_word2vec_format(fname[, lockf, ...]) Merge the input-hidden weight matrix from the original C word2vec-tool format given, where it intersects with the current vocabulary.
load(*args, **kwargs) Load a previously saved object from file (also see save).
load_word2vec_format(fname[, fvocab, ...]) Load the input-hidden weight matrix from the original C word2vec-tool format.
log_accuracy(section)
make_cum_table([power, domain]) Create a cumulative-distribution table using stored vocabulary word counts for drawing random words in the negative-sampling training routines.
most_similar([positive, negative, topn, ...]) Find the top-N most similar words.
most_similar_cosmul([positive, negative, topn]) Find the top-N most similar words, using the multiplicative combination objective proposed by Omer Levy and Yoav Goldberg in [R4].
n_similarity(ws1, ws2) Compute cosine similarity between two sets of words.
reset_from(other_model) Reuse shareable structures from other_model.
reset_weights() Reset all projection weights to an initial (untrained) state, but keep the existing vocabulary.
save(*args, **kwargs) Save the object to file (also see load).
save_word2vec_format(fname[, fvocab, binary]) Store the input-hidden weight matrix in the same format used by the original C word2vec-tool, for compatibility.
scale_vocab([min_count, sample, dry_run, ...]) Apply vocabulary settings for min_count (discarding less-frequent words) and sample (controlling the downsampling of more-frequent words).
scan_vocab(documents[, progress_per, trim_rule]) Do an initial pass over documents, collecting word and document-tag counts.
score(sentences[, total_sentences, ...]) Score the log probability for a sequence of sentences (can be a once-only generator stream).
seeded_vector(seed_string) Create one ‘random’ vector (but deterministic by seed_string)
similar_by_vector(vector[, topn, restrict_vocab]) Find the top-N most similar words by vector.
similar_by_word(word[, topn, restrict_vocab]) Find the top-N most similar words.
similarity(w1, w2) Compute cosine similarity between two words.
sort_vocab() Sort the vocabulary so the most frequent words have the lowest indexes.
train(sentences[, total_words, word_count, ...]) Update the model’s neural weights from a sequence of sentences (can be a once-only generator stream).
wmdistance(document1, document2) Compute the Word Mover’s Distance between two documents.
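
Most of the word-level query methods above (most_similar, similarity, doesnt_match, ...) are inherited from Word2Vec; infer_vector and the docvecs attribute are specific to Doc2Vec. A minimal post-training sketch, assuming the toy model trained in the example above:

    # Infer a vector for an unseen document (post-bulk-training inference);
    # the token list is hypothetical.
    new_vector = model.infer_vector(["paragraph", "vectors", "are", "useful"])

    # Document-level similarity queries go through model.docvecs ...
    print(model.docvecs.most_similar([new_vector], topn=2))

    # ... while word-level queries use the inherited methods directly.
    print(model.most_similar(positive=["learning"], topn=2))

    # Persist and reload the full model (see save / load above).
    model.save("doc2vec.model")
    model = Doc2Vec.load("doc2vec.model")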

Attributes

dbow Indicates whether 'distributed bag of words' (PV-DBOW) will be used, else 'distributed memory' (PV-DM) is employed.
dm Indicates whether 'distributed memory' (PV-DM) will be used, else 'distributed bag of words' (PV-DBOW) is employed.
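
Both attributes report which training algorithm the dm constructor flag selected. A small sketch (untrained throw-away models, for illustration only):

    from gensim.models import Doc2Vec

    pv_dm = Doc2Vec(dm=1)    # default: PV-DM ("distributed memory")
    pv_dbow = Doc2Vec(dm=0)  # PV-DBOW ("distributed bag of words")

    print(bool(pv_dm.dm), bool(pv_dm.dbow))      # True False
    print(bool(pv_dbow.dm), bool(pv_dbow.dbow))  # False True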