gensim.models.Word2Vec.scale_vocab

Word2Vec.scale_vocab(min_count=None, sample=None, dry_run=False, keep_raw_vocab=False, trim_rule=None)

Apply vocabulary settings for min_count (discarding less-frequent words) and sample (controlling the downsampling of more-frequent words).

Calling with dry_run=True will only simulate the provided settings and report the size of the retained vocabulary, effective corpus length, and estimated memory requirements. Results are both printed via logging and returned as a dict.
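To make the effect of min_count and sample concrete, here is a minimal, self-contained sketch of the kind of estimate a dry run produces. It is not gensim's implementation: simulate_scale_vocab, its dict keys, and the toy counts are hypothetical names chosen for illustration. The keep probability uses the word2vec-style subsampling heuristic and assumes 0 < sample < 1. Memory estimation is omitted.

```python
from collections import Counter
from math import sqrt


def simulate_scale_vocab(raw_vocab, min_count=5, sample=1e-3):
    """Estimate the effect of min_count and sample on raw word counts.

    Hypothetical helper for illustration only; not part of gensim.
    Assumes 0 < sample < 1.
    """
    # Step 1: discard words that occur fewer than min_count times.
    retained = {w: c for w, c in raw_vocab.items() if c >= min_count}
    retain_total = sum(retained.values())

    # Step 2: convert `sample` into a count threshold over the retained corpus.
    threshold = sample * retain_total

    keep_prob = {}
    downsample_total = 0.0
    for word, count in retained.items():
        # word2vec-style heuristic: frequent words get a keep probability
        # below 1.0; rarer words are always kept (probability capped at 1.0).
        p = min(1.0, (sqrt(count / threshold) + 1) * (threshold / count))
        keep_prob[word] = p
        downsample_total += p * count

    return {
        "retain_unique": len(retained),              # retained vocabulary size
        "retain_total": retain_total,                # corpus length after min_count
        "downsample_total": int(downsample_total),   # effective corpus length
        "keep_prob": keep_prob,
    }


# Toy raw vocabulary (made-up counts).
raw_vocab = Counter({"the": 1000, "cat": 40, "sat": 30, "zyzzyva": 1})
report = simulate_scale_vocab(raw_vocab, min_count=5, sample=0.01)
print(report["retain_unique"], report["retain_total"], report["downsample_total"])
```

In this sketch the rare word is dropped by min_count, and the most frequent word receives the lowest keep probability, so the effective corpus length is smaller than the retained total.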

The raw vocabulary is deleted after scaling is done to free up RAM, unless keep_raw_vocab is set.