Contributing to wikirec

Thank you for considering contributing to this project!

Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved.

Following these guidelines helps to communicate that you respect the time of the developers managing and developing this open source project. In return, and in accordance with this project’s code of conduct, other contributors will reciprocate that respect in addressing your issue or assessing patches and features.

Using the issue tracker

The issue tracker for wikirec is the preferred channel for bug reports, feature requests, and pull requests.

Bug reports

A bug is a demonstrable problem that is caused by the code in the repository. Good bug reports are extremely helpful - thank you!

Guidelines for bug reports:

  1. Use the GitHub issue search to check if the issue has already been reported.

  2. Check if the issue has been fixed by trying to reproduce it using the latest main or development branch in the repository.

  3. Isolate the problem to make sure that the code in the repository is definitely responsible for the issue.

Great bug reports tend to have:

  • A quick summary

  • Steps to reproduce

  • What you expected would happen

  • What actually happens

  • Notes (why this might be happening, things tried that didn’t work, etc.)

Again, thank you for your time in reporting issues!

Feature requests

Feature requests are more than welcome! Please take a moment to find out whether your idea fits with the scope and aims of the project. When making a suggestion, provide as much detail and context as possible, and make clear how much you would like to contribute to its development.

Pull requests

Good pull requests - patches, improvements and new features - are a fantastic help. They should remain focused in scope and avoid containing unrelated commits. Note that all contributions to this project will be made under the specified license and should follow the coding indentation and style standards (contact us if unsure).

Please ask first before embarking on any significant pull request (implementing features, refactoring code, etc.), otherwise you risk spending a lot of time working on something that the developers might not want to merge into the project. That said, major additions are greatly appreciated!

When making a contribution, adhering to the GitHub flow process is the best way to get your work merged:

  1. Fork the repo, clone your fork, and configure the remotes:

    # Clone your fork of the repo into the current directory
    git clone https://github.com/<your-username>/<repo-name>
    # Navigate to the newly cloned directory
    cd <repo-name>
    # Assign the original repo to a remote called "upstream"
    git remote add upstream https://github.com/<upstream-owner>/<repo-name>
    
  2. If you cloned a while ago, get the latest changes from upstream:

    git checkout <dev-branch>
    git pull upstream <dev-branch>
    
  3. Create a new topic branch (off the main project development branch) to contain your feature, change, or fix:

    git checkout -b <topic-branch-name>
    
  4. Commit your changes in logical chunks, and please try to adhere to Conventional Commits. Use Git’s interactive rebase feature to tidy up your commits before making them public.

  5. Locally merge (or rebase) the upstream development branch into your topic branch:

    git pull --rebase upstream <dev-branch>
    
  6. Push your topic branch up to your fork:

    git push origin <topic-branch-name>
    
  7. Open a Pull Request with a clear title and description.
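
Step 4 above asks for commit messages that follow Conventional Commits. As a rough illustration of the expected header shape, `<type>(<optional scope>): <description>`, the following Python sketch checks a commit header locally. It is not part of wikirec: the names `ALLOWED_TYPES`, `HEADER_PATTERN`, and `is_conventional` are hypothetical, and the list of types is an assumption (the specification itself defines `feat` and `fix` and permits others).

```python
import re

# Assumed set of accepted commit types (hypothetical, not taken from wikirec).
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat",
    "fix", "perf", "refactor", "style", "test",
)

# Header shape: type, optional "(scope)", optional "!", then ": description".
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message fits the shape above."""
    match = HEADER_PATTERN.match(header)
    return bool(match) and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for header in ("feat(model): add WikilinkNN", "Updated stuff"):
        print(header, "->", is_conventional(header))
```

A check like this could be run by hand before pushing; it only inspects the header line and says nothing about the body or footers of the message.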

Thank you in advance for your contributions!

License

BSD 3-Clause License

Copyright (c) 2020-2021, The wikirec developers.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Change log

wikirec tries to follow semantic versioning, using MAJOR.MINOR.PATCH version numbers in which we increment the:

  • MAJOR version when we make incompatible API changes

  • MINOR version when we add functionality in a backwards compatible manner

  • PATCH version when we make backwards compatible bug fixes
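
The rules above can be sketched as a small Python helper that computes the next version number for a given kind of change. This is only an illustration of the scheme, not code from wikirec; the function names `parse_version` and `bump` are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":
        # Incompatible API change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backwards compatible functionality: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backwards compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump("0.2.1", "patch"))
    print(bump("0.2.2", "major"))
```

The sketch accepts only three-part versions, so a four-part string such as 0.1.1.7 raises a `ValueError` rather than being silently reinterpreted.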

wikirec 1.0.0 (December 28th, 2021)

wikirec 0.2.2 (May 20th, 2021)

Changes include:

  • The WikilinkNN model has been added, allowing users to derive recommendations based on which articles are linked to the same other Wikipedia articles

  • Examples have been updated to reflect this new model

  • books_embedding_model.h5 is provided for quick experimentation

  • enwiki_books.ndjson has been updated with a more recent dump

  • Function docstring grammar fixes

  • Baseline testing for the new model has been added to the CI

wikirec 0.2.1 (April 29th, 2021)

Changes include:

  • Support has been added for gensim 3.8.x and 4.x

  • Wikipedia links are now an output of data_utils.parse_to_ndjson

  • Dependencies in requirement and environment files are now condensed

wikirec 0.2.0 (April 16th, 2021)

Changes include:

  • Users can now input ratings to weigh recommendations

  • Fixes for how recommendations for multiple inputs were being calculated

  • Switching over to an src structure

  • Code quality is now checked with Codacy

  • Extensive code formatting to improve quality and style

  • Bug fixes and a more explicit use of exceptions

  • More extensive contributing guidelines

wikirec 0.1.1.7 (March 14th, 2021)

Changes include:

  • Multiple Infobox topics can be subset at the same time

  • Users have greater control of the cleaning process

  • The cleaning process is verbose and uses multiprocessing

  • The workflow for all models has been improved and explained

  • Methods have been developed to combine modeling techniques for better results

wikirec 0.1.0 (March 8th, 2021)

First stable release of wikirec

  • Functions to subset Wikipedia in any language by infobox topics have been provided

  • A multilingual cleaning process that can clean texts of any language to varying degrees of efficacy is included

  • Similarity matrices can be generated from embeddings using the following models:

    • BERT

    • Doc2vec

    • LDA

    • TFIDF

  • Similarity matrices can be created using either cosine or Euclidean relations

  • Usage examples have been provided for multiple input types

  • Optimal LDA topic numbers can be inferred graphically

  • The package is fully documented

  • Virtual environment files are provided

  • Extensive testing of all modules with GH Actions and Codecov has been performed

  • A code of conduct and contribution guidelines are included