gensim.parsing
This package contains functions to preprocess raw text.
Functions
preprocess_documents(docs) |
preprocess_string(s[, filters]) |
read_file(path) |
read_files(pattern) |
remove_stopwords(s) |
split_alphanum(s) |
stem(text) | Return lowercase and (porter-)stemmed version of string text.
stem_text(text) | Return lowercase and (porter-)stemmed version of string text.
strip_multiple_whitespaces(s) |
strip_non_alphanum(s) |
strip_numeric(s) |
strip_punctuation(s) |
strip_punctuation2(s) |
strip_short(s[, minsize]) |
strip_tags(s) |
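
Most of the functions listed above take a string and return a string, and preprocess_string accepts an optional list of filters. The sketch below illustrates that filter-chain idea using only the Python standard library. It is not gensim's code: every name ending in `_sketch`, the `apply_filters` helper, the regular expressions, and the `minsize=3` default are assumptions made for illustration, and the real functions may behave differently in edge cases.

```python
import re
import string

# Hypothetical stand-ins for illustration only; NOT gensim's implementations.
_PUNCT = re.compile("[%s]" % re.escape(string.punctuation))
_TAGS = re.compile(r"<[^>]*>")
_SPACES = re.compile(r"\s+")


def strip_tags_sketch(s):
    """Drop anything that looks like an HTML/XML tag."""
    return _TAGS.sub("", s)


def strip_punctuation_sketch(s):
    """Replace ASCII punctuation characters with spaces."""
    return _PUNCT.sub(" ", s)


def strip_multiple_whitespaces_sketch(s):
    """Collapse runs of whitespace into a single space."""
    return _SPACES.sub(" ", s)


def strip_short_sketch(s, minsize=3):
    """Keep only whitespace-separated tokens of at least `minsize` characters."""
    return " ".join(tok for tok in s.split() if len(tok) >= minsize)


def apply_filters(s, filters):
    """Apply each string -> string filter in order, then split into tokens."""
    for f in filters:
        s = f(s)
    return s.split()


filters = [
    str.lower,
    strip_tags_sketch,
    strip_punctuation_sketch,
    strip_multiple_whitespaces_sketch,
    strip_short_sketch,
]
tokens = apply_filters("<b>Hello</b>,   raw text: to be cleaned!", filters)
print(tokens)
```

The order of the filters matters: in this sketch tags are removed before punctuation, because stripping punctuation first would destroy the angle brackets that the tag pattern relies on.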