License

BSD 3-Clause License

Copyright (c) 2020, the wikirec developers. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Change log

wikirec tries to follow semantic versioning, a MAJOR.MINOR.PATCH scheme in which we increment the:

  • MAJOR version when we make incompatible API changes

  • MINOR version when we add functionality in a backwards compatible manner

  • PATCH version when we make backwards compatible bug fixes
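The increment rules above can be sketched in a few lines of Python. This is only an illustration of the scheme, not part of wikirec: the function name bump and its part argument are hypothetical, and the sketch assumes a version string with exactly three numeric parts.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with `part` incremented and all lower parts reset to 0.

    Hypothetical helper for illustration; assumes a MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Under these assumptions, bump("0.2.1", "patch") gives "0.2.2" and bump("0.2.2", "major") gives "1.0.0".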

wikirec 1.0.0 (December 28th, 2021)

wikirec 0.2.2 (May 20th, 2021)

Changes include:

  • The WikilinkNN model has been added, allowing users to derive recommendations based on which articles are linked to the same other Wikipedia articles

  • Examples have been updated to reflect this new model

  • books_embedding_model.h5 is provided for quick experimentation

  • enwiki_books.ndjson has been updated with a more recent dump

  • Function docstring grammar fixes

  • Baseline testing for the new model has been added to the CI

wikirec 0.2.1 (April 29th, 2021)

Changes include:

  • Support has been added for gensim 3.8.x and 4.x

  • Wikipedia links are now an output of data_utils.parse_to_ndjson

  • Dependencies in requirement and environment files are now condensed

wikirec 0.2.0 (April 16th, 2021)

Changes include:

  • Users can now input ratings to weigh recommendations

  • Fixes for how recommendations for multiple inputs were being calculated

  • The package has switched over to a src structure

  • Code quality is now checked with Codacy

  • Extensive code formatting to improve quality and style

  • Bug fixes and a more explicit use of exceptions

  • More extensive contributing guidelines

wikirec 0.1.1.7 (March 14th, 2021)

Changes include:

  • Multiple infobox topics can be subset at the same time

  • Users have greater control of the cleaning process

  • The cleaning process is verbose and uses multiprocessing

  • The workflow for all models has been improved and explained

  • Methods have been developed to combine modeling techniques for better results

wikirec 0.1.0 (March 8th, 2021)

First stable release of wikirec

  • Functions to subset Wikipedia in any language by infobox topics have been provided

  • A multilingual cleaning process that can clean texts of any language to varying degrees of efficacy is included

  • Similarity matrices can be generated from embeddings using the following models:

    • BERT

    • Doc2vec

    • LDA

    • TFIDF

  • Similarity matrices can be created using either cosine or Euclidean relations

  • Usage examples have been provided for multiple input types

  • Optimal LDA topic numbers can be inferred graphically

  • The package is fully documented

  • Virtual environment files are provided

  • Extensive testing of all modules with GH Actions and Codecov has been performed

  • A code of conduct and contribution guidelines are included
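As a rough illustration of the cosine and Euclidean similarity matrices mentioned in the list above, the sketch below builds both from a small list of embedding vectors. It is not wikirec's implementation: the function names are hypothetical, and mapping a Euclidean distance d to the similarity 1 / (1 + d) is one common convention assumed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_similarity(a, b):
    """Similarity from Euclidean distance d, assumed here to be 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(a, b))


def similarity_matrix(embeddings, metric="cosine"):
    """Return the square matrix of pairwise similarities (hypothetical helper)."""
    score = {"cosine": cosine_similarity, "euclidean": euclidean_similarity}[metric]
    return [[score(a, b) for b in embeddings] for a in embeddings]
```

With either metric, each row of the matrix ranks every other item by its similarity to that row's item, which is the basis for picking recommendations.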