gensim.models.Doc2Vec

class gensim.models.Doc2Vec(documents=None, size=300, alpha=0.025, window=8, min_count=5, max_vocab_size=None, sample=0, seed=1, workers=1, min_alpha=0.0001, dm=1, hs=1, negative=0, dbow_words=0, dm_mean=0, dm_concat=0, dm_tag_count=1, docvecs=None, docvecs_mapfile=None, comment=None, trim_rule=None, **kwargs)

Class for training, using and evaluating the neural networks described in "Distributed Representations of Sentences and Documents" (Le & Mikolov, 2014): http://arxiv.org/pdf/1405.4053v2.pdf
Methods

__init__([documents, size, alpha, window, ...])
    Initialize the model from an iterable of documents.
accuracy(questions[, restrict_vocab, ...])
    Compute accuracy of the model.
build_vocab(sentences[, keep_raw_vocab, ...])
    Build vocabulary from a sequence of sentences (can be a once-only generator stream).
clear_sims()
create_binary_tree()
    Create a binary Huffman tree using stored vocabulary word counts.
doesnt_match(words)
    Which word from the given list doesn’t go with the others?
estimate_memory([vocab_size, report])
    Estimate required memory for a model using current settings.
finalize_vocab()
    Build tables and model weights based on final vocabulary settings.
infer_vector(doc_words[, alpha, min_alpha, ...])
    Infer a vector for the given post-bulk-training document.
init_sims([replace])
    Precompute L2-normalized vectors.
intersect_word2vec_format(fname[, lockf, ...])
    Merge the input-hidden weight matrix from the given original C word2vec-tool format file, where it intersects with the current vocabulary.
load(*args, **kwargs)
load_word2vec_format(fname[, fvocab, ...])
    Load the input-hidden weight matrix from the original C word2vec-tool format.
log_accuracy(section)
make_cum_table([power, domain])
    Create a cumulative-distribution table using stored vocabulary word counts, for drawing random words in the negative-sampling training routines.
most_similar([positive, negative, topn, ...])
    Find the top-N most similar words.
most_similar_cosmul([positive, negative, topn])
    Find the top-N most similar words, using the multiplicative combination objective proposed by Omer Levy and Yoav Goldberg in [R4].
n_similarity(ws1, ws2)
    Compute cosine similarity between two sets of words.
reset_from(other_model)
    Reuse shareable structures from other_model.
reset_weights()
save(*args, **kwargs)
    Save the object to file (also see load).
save_word2vec_format(fname[, fvocab, binary])
    Store the input-hidden weight matrix in the same format used by the original C word2vec-tool, for compatibility.
scale_vocab([min_count, sample, dry_run, ...])
    Apply vocabulary settings for min_count (discarding less-frequent words) and sample (controlling the downsampling of more-frequent words).
scan_vocab(documents[, progress_per, trim_rule])
score(sentences[, total_sentences, ...])
    Score the log probability for a sequence of sentences (can be a once-only generator stream).
seeded_vector(seed_string)
    Create one ‘random’ vector, deterministic for a given seed_string.
similar_by_vector(vector[, topn, restrict_vocab])
    Find the top-N most similar words by vector.
similar_by_word(word[, topn, restrict_vocab])
    Find the top-N most similar words to the given word.
similarity(w1, w2)
    Compute cosine similarity between two words.
sort_vocab()
    Sort the vocabulary so the most frequent words have the lowest indexes.
train(sentences[, total_words, word_count, ...])
    Update the model’s neural weights from a sequence of sentences (can be a once-only generator stream).
wmdistance(document1, document2)
    Compute the Word Mover’s Distance between two documents.