gensim.parsing
This package contains functions for preprocessing raw text.
Functions
preprocess_documents(docs) |
preprocess_string(s[, filters]) |
read_file(path) |
read_files(pattern) |
remove_stopwords(s) |
split_alphanum(s) |
stem(text) | Return lowercase and (porter-)stemmed version of string text.
stem_text(text) | Return lowercase and (porter-)stemmed version of string text.
strip_multiple_whitespaces(s) |
strip_non_alphanum(s) |
strip_numeric(s) |
strip_punctuation(s) |
strip_punctuation2(s) |
strip_short(s[, minsize]) |
strip_tags(s) |
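
The signature preprocess_string(s[, filters]) suggests that a string is passed through a sequence of single-argument filters such as the strip_* functions listed above. The following is a minimal, standard-library-only sketch of that idea, not gensim's implementation. Every name ending in _sketch is hypothetical, and the regular expressions, the filter order, the default minsize of 3, and the token-list return value are all assumptions made for illustration.

```python
import re
import string

# Hypothetical stand-ins for some of the filters listed above.
# These only approximate what the names suggest; they are NOT gensim's code.

def strip_tags_sketch(s):
    """Remove anything that looks like an HTML/XML tag."""
    return re.sub(r"<[^>]*>", "", s)

def strip_punctuation_sketch(s):
    """Replace each ASCII punctuation character with a space."""
    return re.sub("[%s]" % re.escape(string.punctuation), " ", s)

def strip_numeric_sketch(s):
    """Remove runs of ASCII digits."""
    return re.sub(r"[0-9]+", "", s)

def strip_multiple_whitespaces_sketch(s):
    """Collapse runs of whitespace into a single space."""
    return re.sub(r"\s+", " ", s)

def strip_short_sketch(s, minsize=3):
    """Keep only whitespace-separated tokens of at least minsize characters.

    The default of 3 is an assumption, not taken from the listing above.
    """
    return " ".join(tok for tok in s.split() if len(tok) >= minsize)

# Assumed default order: lowercase first, then strip markup and noise.
DEFAULT_FILTERS_SKETCH = [
    str.lower,
    strip_tags_sketch,
    strip_punctuation_sketch,
    strip_numeric_sketch,
    strip_multiple_whitespaces_sketch,
    strip_short_sketch,
]

def preprocess_string_sketch(s, filters=DEFAULT_FILTERS_SKETCH):
    """Apply each filter to s in order, then split the result on whitespace."""
    for f in filters:
        s = f(s)
    return s.split()

if __name__ == "__main__":
    print(preprocess_string_sketch("<b>Hello</b>, World 42!  It is 2 late."))
```

Because each filter takes a string and returns a string, callers can pass their own list, for example filters=[strip_punctuation_sketch], to run only part of the chain.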