languages
Module for organizing language dependencies for text cleaning.
The following languages have been selected because their stopwords can be removed via https://github.com/stopwords-iso/stopwords-iso/tree/master/python.
- Contents:
lem_abbr_dict, stem_abbr_dict, sw_abbr_dict
- wikirec.languages.lem_abbr_dict()
Returns a dictionary of languages and their abbreviations for lemmatization.
- Returns:
- lem_abbr_dict : dict
A dictionary with languages as keys and their abbreviations as values.
Notes
These languages can be lemmatized via https://spacy.io/usage/models.
They are also the languages whose words can be ordered by part of speech.
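As a minimal sketch of how such a language-to-abbreviation dictionary might be queried, the helper below normalizes a language name and looks up its abbreviation. The helper `language_to_abbr` and the dictionary `example_lem_abbr` are hypothetical stand-ins for illustration; the entries shown are assumptions and not necessarily what `lem_abbr_dict()` returns.

```python
from typing import Dict, Optional


def language_to_abbr(language: str, abbr_dict: Dict[str, str]) -> Optional[str]:
    """Return the abbreviation for a language name, or None if it is absent.

    Assumes the dictionary keys are lowercase language names.
    """
    return abbr_dict.get(language.strip().lower())


# Hypothetical stand-in for the dictionary returned by lem_abbr_dict().
example_lem_abbr = {"english": "en", "german": "de"}

print(language_to_abbr("English", example_lem_abbr))
print(language_to_abbr("klingon", example_lem_abbr))
```

Returning `None` for an unknown language lets the caller decide whether to fall back to another cleaning method or raise an error.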
- wikirec.languages.stem_abbr_dict()
Returns a dictionary of languages and their abbreviations for stemming.
- Returns:
- stem_abbr_dict : dict
A dictionary with languages as keys and their abbreviations as values.
Notes
These languages lack good lemmatizers and are therefore stemmed via https://www.nltk.org/api/nltk.stem.html.
- wikirec.languages.sw_abbr_dict()
Returns a dictionary of languages and their abbreviations for stop word removal.
- Returns:
- sw_abbr_dict : dict
A dictionary with languages as keys and their abbreviations as values.
Notes
For these languages, only stop word removal is supported, via https://github.com/stopwords-iso/stopwords-iso.
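The notes for the three functions suggest a fallback order: lemmatize where a lemmatizer exists, otherwise stem, otherwise only remove stop words. The sketch below illustrates that selection under stated assumptions. The function `choose_normalization` and the three example dictionaries are hypothetical; they are small placeholders rather than the actual contents of `lem_abbr_dict()`, `stem_abbr_dict()`, or `sw_abbr_dict()`, and this is not necessarily how wikirec selects a method internally.

```python
from typing import Dict


def choose_normalization(
    language: str,
    lem_dict: Dict[str, str],
    stem_dict: Dict[str, str],
    sw_dict: Dict[str, str],
) -> str:
    """Pick a text-normalization strategy for a language.

    Checks the dictionaries in order of preference and returns one of
    "lemmatize", "stem", "stopwords_only", or "unsupported".
    """
    key = language.strip().lower()
    if key in lem_dict:
        return "lemmatize"
    if key in stem_dict:
        return "stem"
    if key in sw_dict:
        return "stopwords_only"
    return "unsupported"


# Hypothetical placeholder dictionaries (illustrative entries only).
example_lem = {"english": "en"}
example_stem = {"arabic": "ar"}
example_sw = {"afrikaans": "af"}

for lang in ("English", "Arabic", "Afrikaans", "Klingon"):
    print(lang, choose_normalization(lang, example_lem, example_stem, example_sw))
```

Checking the dictionaries in a fixed order keeps the preference explicit, so a language that appeared in more than one dictionary would always receive the most thorough treatment available.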