License¶
BSD 3-Clause License
Copyright (c) 2020, the wikirec developers. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Change log¶
wikirec tries to follow semantic versioning, a MAJOR.MINOR.PATCH scheme in which we increment the:
MAJOR version when we make incompatible API changes
MINOR version when we add functionality in a backwards compatible manner
PATCH version when we make backwards compatible bug fixes
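The three rules above can be sketched in a few lines of Python. This is only an illustration of the scheme, not code from wikirec: the helper names `parse_version` and `bump` are hypothetical, and the sketch assumes plain three-part version strings with no pre-release or build suffixes.

```python
import re

# Matches plain MAJOR.MINOR.PATCH strings such as "1.0.0" (assumption: no suffixes).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version):
    """Return (major, minor, patch) as integers (hypothetical helper)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def bump(version, change):
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":
        # Incompatible API change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backwards compatible functionality: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backwards compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Four-part versions from before the switch, such as 0.1.1.7, do not fit this pattern and are rejected by `parse_version`.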
wikirec 1.0.0 (December 28th, 2021)¶
This release switches wikirec over to semantic versioning and indicates that the package is stable
wikirec 0.2.2 (May 20th, 2021)¶
Changes include:
The WikilinkNN model has been added, allowing users to derive recommendations based on which articles are linked to the same other Wikipedia articles
Examples have been updated to reflect this new model
books_embedding_model.h5 is provided for quick experimentation
enwiki_books.ndjson has been updated with a more recent dump
Function docstring grammar fixes
Baseline testing for the new model has been added to the CI
wikirec 0.2.1 (April 29th, 2021)¶
Changes include:
Support has been added for gensim 3.8.x and 4.x
Wikipedia links are now an output of data_utils.parse_to_ndjson
Dependencies in requirement and environment files are now condensed
wikirec 0.2.0 (April 16th, 2021)¶
Changes include:
Users can now input ratings to weigh recommendations
Fixes for how recommendations for multiple inputs were being calculated
Switching over to an src structure
Code quality is now checked with Codacy
Extensive code formatting to improve quality and style
Bug fixes and a more explicit use of exceptions
More extensive contributing guidelines
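As a rough illustration of the rating-weighted, multiple-input recommendations mentioned in the 0.2.0 notes above, the sketch below scales each input's similarity row by its rating and averages the results. It is an assumption about how such weighting could work, not wikirec's implementation: the function name `weighted_scores`, its parameters, and the 0 to 10 rating scale are all hypothetical.

```python
def weighted_scores(sim_matrix, titles, inputs, ratings=None, max_rating=10):
    """Rank titles by rating-weighted average similarity to the inputs.

    Hypothetical sketch: sim_matrix[i][j] is the similarity between
    titles[i] and titles[j]; ratings are assumed to lie in [0, max_rating].
    """
    if not inputs:
        raise ValueError("at least one input title is required")
    if ratings is None:
        # Without ratings, every input counts fully.
        ratings = [max_rating] * len(inputs)
    if len(ratings) != len(inputs):
        raise ValueError("ratings must match inputs in length")

    index = {title: i for i, title in enumerate(titles)}
    totals = [0.0] * len(titles)
    for title, rating in zip(inputs, ratings):
        weight = rating / max_rating
        for j, sim in enumerate(sim_matrix[index[title]]):
            totals[j] += weight * sim

    # Average over the inputs and leave the inputs themselves out of the result.
    scores = {
        title: totals[j] / len(inputs)
        for j, title in enumerate(titles)
        if title not in inputs
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this design a low rating shrinks an input's influence on the ranking but does not push away the titles similar to it. Penalizing titles close to disliked inputs would need a different weighting.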
wikirec 0.1.1.7 (March 14th, 2021)¶
Changes include:
Multiple infobox topics can be subset at the same time
Users have greater control of the cleaning process
The cleaning process is verbose and uses multiprocessing
The workflow for all models has been improved and explained
Methods have been developed to combine modeling techniques for better results
wikirec 0.1.0 (March 8th, 2021)¶
First stable release of wikirec
Functions to subset Wikipedia in any language by infobox topics have been provided
A multilingual cleaning process that can clean texts of any language to varying degrees of efficacy is included
Similarity matrices can be generated from embeddings using the following models:
BERT
Doc2vec
LDA
TFIDF
Similarity matrices can be created using either cosine or Euclidean relations
Usage examples have been provided for multiple input types
Optimal LDA topic numbers can be inferred graphically
The package is fully documented
Virtual environment files are provided
Extensive testing of all modules with GH Actions and Codecov has been performed
A code of conduct and contribution guidelines are included
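For readers unfamiliar with the cosine and Euclidean options mentioned in the 0.1.0 notes above, the following pure-Python sketch builds a similarity matrix from a list of embedding vectors under either metric. It is not wikirec's code: the function names are hypothetical, and the `1 / (1 + distance)` conversion from Euclidean distance to a similarity score is an assumption chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0.

    The 1 / (1 + distance) mapping is an assumption made for this sketch.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + distance)


def similarity_matrix(embeddings, metric="cosine"):
    """Return a square list-of-lists of pairwise similarities."""
    if metric == "cosine":
        score = cosine_similarity
    elif metric == "euclidean":
        score = euclidean_similarity
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return [[score(u, v) for v in embeddings] for u in embeddings]
```

Cosine similarity ignores vector length and compares direction only, while the Euclidean variant is sensitive to magnitude, so the two can rank the same embeddings differently.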