gensim.corpora.Dictionary.filter_extremes

Dictionary.filter_extremes(no_below=5, no_above=0.5, keep_n=100000)

Filter out tokens that appear in

1. fewer than no_below documents (absolute number), or
2. more than no_above documents (fraction of total corpus size, not absolute number).
3. After (1) and (2), keep only the first keep_n most frequent tokens (or keep all if keep_n is None).
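
The three rules above can be illustrated with a minimal pure-Python sketch. This is not gensim's implementation: the function name `filter_extremes_sketch`, the toy `docs` corpus, the exact boundary handling for `no_above`, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from collections import Counter


def filter_extremes_sketch(docs, no_below=5, no_above=0.5, keep_n=100000):
    """Return the tokens that survive the three rules (hypothetical helper)."""
    num_docs = len(docs)
    # Document frequency: the number of documents each token appears in.
    dfs = Counter(token for doc in docs for token in set(doc))
    # Rules 1 and 2: drop tokens that are too rare or too common.
    # Assumption: a token at exactly no_above * num_docs documents is kept.
    kept = [t for t, df in dfs.items() if no_below <= df <= no_above * num_docs]
    # Rule 3: keep only the keep_n most frequent survivors (all if None).
    # Assumption: ties are broken alphabetically.
    kept.sort(key=lambda t: (-dfs[t], t))
    if keep_n is not None:
        kept = kept[:keep_n]
    return kept


# Toy corpus: "the" is in all 4 documents, "emu" in only 1.
docs = [
    ["cat", "dog", "the"],
    ["cat", "the"],
    ["dog", "the", "emu"],
    ["cat", "dog", "the"],
]
survivors = filter_extremes_sketch(docs, no_below=2, no_above=0.75, keep_n=None)
print(survivors)  # ['cat', 'dog']
```

Here "emu" is dropped by rule 1 (it appears in 1 document, below no_below=2) and "the" by rule 2 (it appears in 4 of 4 documents, above the 0.75 fraction).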

After the pruning, shrink resulting gaps in word ids.

Note: Due to the gap shrinking, the same word may have a different word id before and after the call to this function!
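
To see why ids can change, the following sketch compacts a token-to-id mapping after some tokens have been removed. The helper `compact_ids` and the ordering it uses (surviving tokens keep their relative order by old id) are assumptions for illustration; the source does not specify how gensim assigns the new ids.

```python
def compact_ids(token2id, kept_tokens):
    """Reassign consecutive ids to surviving tokens (hypothetical helper)."""
    # Assumption: survivors keep their relative order by old id.
    survivors = sorted(
        (old_id, token) for token, old_id in token2id.items() if token in kept_tokens
    )
    return {token: new_id for new_id, (_, token) in enumerate(survivors)}


before = {"cat": 0, "the": 1, "dog": 2, "emu": 3}
after = compact_ids(before, kept_tokens={"cat", "dog"})
print(after)  # {'cat': 0, 'dog': 1}
```

In this example "dog" moves from id 2 to id 1, so any ids stored before the call (for example in an already converted bag-of-words corpus) would no longer refer to the same words.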