gensim.models.Phrases

class gensim.models.Phrases(sentences=None, min_count=5, threshold=10.0, max_vocab_size=40000000, delimiter='_')

Detect phrases based on collected collocation counts. Adjacent words that appear together more often than expected are joined with the delimiter character (_ by default).
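To make "more often than expected" concrete, here is a minimal pure-Python sketch of one common collocation score, (count(ab) - min_count) * N / (count(a) * count(b)), following Mikolov et al. (2013). This page does not state the formula, so treat it as an assumption about how min_count and threshold interact, not as the library's exact implementation. The helper name score_bigrams is hypothetical, and so is the choice to count unigrams and bigrams together in the vocabulary size N.

```python
from collections import Counter


def score_bigrams(sentences, min_count=5, delimiter="_"):
    """Return {joined_bigram: score}; higher scores mean stronger collocations."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))  # adjacent pairs only

    # Assumption: the vocabulary size counts unigrams and bigrams together.
    vocab_size = len(unigrams) + len(bigrams)

    scores = {}
    for (a, b), n_ab in bigrams.items():
        # Discount rare pairs by min_count, then normalise by the unigram counts.
        scores[a + delimiter + b] = (
            (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        )
    return scores


sentences = [
    ["i", "live", "in", "new", "york"],
    ["new", "york", "is", "large"],
    ["she", "moved", "to", "new", "york"],
    ["they", "love", "new", "york"],
]

threshold = 1.0
scores = score_bigrams(sentences, min_count=1)
# Keep only the pairs whose score exceeds the threshold.
detected = sorted(p for p, s in scores.items() if s > threshold)
print(detected)  # ['new_york']
```

In this toy corpus only "new york" occurs more than once, so it is the only pair whose score clears the threshold. Raising min_count or threshold makes detection stricter.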

Once it has collected counts, the model can generate phrases on the fly: phrases[sentence] transforms a single tokenized sentence, and phrases[corpus] transforms an iterable of sentences.

Methods

__init__([sentences, min_count, threshold, ...]) Initialize the model from an iterable of sentences.
add_vocab(sentences) Merge the counts collected from sentences into this phrase detector's vocabulary.
export_phrases(sentences) Generate an iterator over all phrases found in the given sentences.
learn_vocab(sentences, max_vocab_size[, ...]) Collect unigram/bigram counts from the sentences iterable.
load(fname[, mmap]) Load a previously saved object from file (also see save).
save(fname_or_handle[, separately, ...]) Save the object to file (also see load).