utils

The utils module provides functions for data cleaning, argument checking, and model selection.

Functions

wikirec.utils._check_str_similarity(str_1, str_2)

Checks the similarity of two strings.
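The description above does not say which similarity measure is used. As a minimal sketch, assuming a ratio-based measure from the standard library's difflib, the idea could look like the following. The name str_similarity_sketch is hypothetical and is not part of wikirec; the library's own implementation may differ.

```python
from difflib import SequenceMatcher


def str_similarity_sketch(str_1, str_2):
    """Return a similarity score between 0 and 1 for two strings.

    Hypothetical stand-in for illustration only; wikirec's actual
    measure is not documented in this section.
    """
    # ratio() is 2 * matches / total characters in both strings
    return SequenceMatcher(None, str_1, str_2).ratio()


identical = str_similarity_sketch("coherence", "coherence")
close = str_similarity_sketch("coherence", "coherance")  # one-letter typo
distant = str_similarity_sketch("coherence", "stability")
```

Under this assumption, identical strings score 1.0 and a near-miss spelling scores higher than an unrelated word, which is the property needed to suggest corrections.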

wikirec.utils._check_str_args(arguments, valid_args)

Checks whether a str argument is valid, and makes suggestions if not.
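A rough sketch of this kind of check is shown below. It assumes that invalid arguments raise a ValueError and that suggestions come from difflib.get_close_matches; neither detail is stated in this section, and check_str_args_sketch is a hypothetical name rather than the wikirec function itself.

```python
from difflib import get_close_matches


def check_str_args_sketch(arguments, valid_args):
    """Validate one or more string arguments against valid options.

    Hypothetical illustration: raises ValueError with a suggestion
    when an argument is not in valid_args. The real function may
    report problems differently.
    """
    # Accept a single string as well as a list of strings
    if isinstance(arguments, str):
        arguments = [arguments]

    for arg in arguments:
        if arg not in valid_args:
            # Look for the closest valid option to offer as a hint
            suggestions = get_close_matches(arg, valid_args, n=1)
            hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
            raise ValueError(f"'{arg}' is not a valid argument.{hint}")

    return arguments
```

With valid options such as "stability" and "coherence", a misspelling like "coherance" would be rejected with a hint pointing at "coherence".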

wikirec.utils.graph_lda_topic_evals(corpus=None, num_topic_words=10, topic_nums_to_compare=None, metrics=True, verbose=True, **kwargs)

Graphs metrics for the given models over the given number of topics.

Parameters:
corpus : list of lists (default=None)

The text corpus over which analysis should be done.

num_topic_words : int (default=10)

The number of keywords that should be extracted.

topic_nums_to_compare : list (default=None)

The number of topics to compare metrics over.

Note: None selects all numbers from 1 to num_topic_words.

metrics : str or bool (default=True: all metrics)

The metrics to include.

Options:

stability: model stability based on Jaccard similarity.

coherence: how much the words associated with model topics co-occur.

verbose : bool (default=True)

Whether to show a tqdm progress bar while the models are evaluated.

**kwargs : keyword arguments

Arguments corresponding to gensim.models.ldamulticore.LdaMulticore.

Returns:
ax : matplotlib axis

A graph of the given metrics for each of the given models based on each topic number.
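The stability option is described as being based on Jaccard similarity. How wikirec aggregates that similarity across topics and model runs is not specified here, so the following is only a sketch of the underlying measure, applied to two hypothetical lists of topic keywords. The function name and sample words are illustrative assumptions.

```python
def jaccard_similarity(words_a, words_b):
    """Return |A intersect B| / |A union B| for two collections of words."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical keywords for the same topic from two separate model runs
topic_run_1 = ["film", "director", "actor", "scene"]
topic_run_2 = ["film", "director", "novel", "author"]

# 2 shared words out of 6 distinct words
overlap = jaccard_similarity(topic_run_1, topic_run_2)
```

A higher overlap between runs would suggest that the topics a model finds are less sensitive to retraining, which is one plausible reading of "model stability".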