gensim.parsing

This package contains functions to preprocess raw text.

Functions

preprocess_documents(docs)
preprocess_string(s[, filters])
read_file(path)
read_files(pattern)
remove_stopwords(s)
split_alphanum(s)
stem(text) Return a lowercased, Porter-stemmed version of the string text.
stem_text(text) Return a lowercased, Porter-stemmed version of the string text.
strip_multiple_whitespaces(s)
strip_non_alphanum(s)
strip_numeric(s)
strip_punctuation(s)
strip_punctuation2(s)
strip_short(s[, minsize])
strip_tags(s)
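
The signature preprocess_string(s[, filters]) suggests that a string is passed through a sequence of filter functions and then tokenized. The following is a minimal, standard-library-only sketch of that idea. It is not gensim's implementation: every name prefixed with sketch_ is a hypothetical stand-in written for illustration, and the exact behavior of the real functions listed above may differ.

```python
import re
import string

# Hypothetical stand-ins for illustration only; NOT gensim's own code.

def sketch_strip_tags(s):
    """Remove anything that looks like an HTML/XML tag."""
    return re.sub(r"<[^>]*>", "", s)

def sketch_strip_punctuation(s):
    """Replace each ASCII punctuation character with a space."""
    return re.sub("[%s]" % re.escape(string.punctuation), " ", s)

def sketch_strip_short(s, minsize=3):
    """Drop whitespace-separated tokens shorter than minsize characters."""
    return " ".join(tok for tok in s.split() if len(tok) >= minsize)

def sketch_preprocess_string(s, filters):
    """Apply each filter in order, then split the result on whitespace."""
    for f in filters:
        s = f(s)
    return s.split()

# Each filter takes a string and returns a string, so filters compose freely.
filters = [str.lower, sketch_strip_tags, sketch_strip_punctuation, sketch_strip_short]
tokens = sketch_preprocess_string("<b>Hello</b>, World! It is a test.", filters)
print(tokens)  # ['hello', 'world', 'test']
```

Because every filter maps a string to a string, the order of the list matters: in this sketch, tags are removed before punctuation so that the angle brackets are still intact when the tag pattern runs.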

Classes

PorterStemmer()