Unifying Knowledge Base Completion with PU Learning to Mitigate the Observation Bias

Source code related to the AAAI22 paper:

Unifying Knowledge Base Completion with PU Learning to Mitigate the Observation Bias. Jonas Schouterden, Jessa Bekker, Jesse Davis, Hendrik Blockeel.

Abstract

The following is the abstract of our paper:

Methods for Knowledge Base Completion (KBC) reason about a knowledge base (KB) in order to derive new facts that should be included in the KB. This is challenging for two reasons. First, KBs only contain positive examples. This complicates model evaluation, which needs both positive and negative examples. Second, the facts that were selected for inclusion in the knowledge base are most likely not an i.i.d. sample of the true facts, due to the way knowledge bases are constructed. In this paper, we focus on rule-based approaches, which traditionally address the first challenge by making assumptions that enable identifying negative examples, which in turn makes it possible to compute a rule's confidence or precision. However, they largely ignore the second challenge, which means that their estimates of a rule's confidence can be biased. This paper approaches rule-based KBC through the lens of PU learning, which can cope with both challenges. We make three contributions. (1) We provide a unifying view that formalizes the relationship between multiple existing confidence measures based on (i) what assumption they make about, and (ii) how their accuracy depends on, the selection mechanism. (2) We introduce two new confidence measures that can mitigate known biases by using propensity scores that quantify how likely a fact is to be included in the KB. (3) We show through theoretical and empirical analysis that taking the bias into account improves the confidence estimates, even when the propensity scores are not known exactly.
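To make this concrete, here is a toy sketch (not the estimators from the paper): under a PU-learning-style assumption that each true fact enters the KB independently with a constant propensity score e, a rule's observed confidence undercounts its true precision by a factor of e, and dividing by e corrects for that.

 # Toy sketch (not the estimators from the paper): correcting a rule's
 # observed confidence under the assumption that each true fact entered
 # the KB independently with a constant propensity score e.

 def cwa_confidence(n_observed_positives: int, n_predictions: int) -> float:
     # Standard confidence: every unobserved prediction counts as negative.
     return n_observed_positives / n_predictions

 def ipw_confidence(n_observed_positives: int, n_predictions: int, e: float) -> float:
     # Observed positives undercount true positives by a factor of e,
     # so divide by e (clipped at 1.0, since a precision cannot exceed 1).
     return min(1.0, n_observed_positives / (e * n_predictions))

 # A rule predicting 100 facts, 30 of which appear in the KB; suppose only
 # half of all true facts were included in the KB (e = 0.5).
 print(cwa_confidence(30, 100))       # 0.3: biased low
 print(ipw_confidence(30, 100, 0.5))  # 0.6: corrected estimate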

Contents of this repository

  • artificial_bias_experiments: Python source code root module for running the experiments and generating the images for those experiments.
  • dask_utils: Python code for using Dask when running the experiments.
  • data/yago3_10: The yago3-10 dataset. This data directory is also used as the root for everything generated when running the experiments.
  • external/AMIE3: External dependency: the AMIE JAR. See also the AMIE3 repository.
  • images: Root directory for all images.
  • kbc_pul: Python source code root module containing the core of this repository: everything related to rules, knowledge bases, confidence metrics and selection mechanisms.
  • notebooks: Jupyter notebooks illustrating how to use parts of this repository.
  • notes: Markdown files describing this repository.
  • paper: PDF of the AAAI paper and its appendices.
  • paper_latex_tables: LaTeX source of the tables used in the paper.
  • amie_dir.json: Settings file used by our AMIE Python wrapper, pointing to the AMIE JAR.
  • LICENSE
  • README

Installation

Requirements

Create a fresh Python 3 environment and install the following packages:

  • jupyter: for the notebooks.
  • pandas: for representing the KB.
  • problog: used for its parsing functionality, i.e. parsing Prolog clauses from their string representation.
  • pylo2: see below.
  • matplotlib: for plotting.
  • seaborn: for plotting.
  • tqdm: for pretty status bars.
  • unidecode: used when cleaning the data.
  • tabulate: for pretty table printouts.
  • dask and distributed: for running the experiments using dask.delayed and dask.distributed.
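For example, everything except pylo2 can be installed with pip (a sketch; package names as on PyPI, versions unpinned):

 pip install jupyter pandas problog matplotlib seaborn tqdm unidecode tabulate "dask[distributed]"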

Installing Pylo2:

We use data structures from Pylo2 to represent rules as Prolog clauses. More specifically, Pylo2 data structures from src/pylo/language/lp are often used. To install Pylo2 in your Python environment, first clone it:

 git clone git@github.com:sebdumancic/pylo2.git
 cd pylo2

Note that Pylo2 has a lot of functionality we don't need. As we don't use Pylo2's bindings to Prolog engines, we don't need to build those bindings. To install Pylo2 without them, modify its setup.py by adding, right before the line:

print(f"Building:\n\tGNU:{build_gnu}\n\tXSB:{build_xsb}\n\tSWIPL:{build_swi}")

the following lines:

build_gnu = None
build_xsb = None
build_swi = None

Then, install Pylo2 into the current environment using:

python setup.py install
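As a quick sanity check (assuming the installed import path mirrors src/pylo/language/lp), the module used by this repository should now be importable:

 python -c "import pylo.language.lp"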

Notebooks

Different notebooks, illustrating how to use parts of this repository, are provided in the notebooks directory.

Running the experiments

For a description of how to run the experiments, see the notes directory.

Generating the tables in the paper

For instructions on how to generate the tables in the paper from the results, see the notes directory.

Generating the images in the paper

Instructions on how to generate the images in the paper can also be found in the notes directory.

Preparation of the "ideal" Yago3_10 KB

In the paper, the experiments are run on a cleaned version of the yago3-10 dataset. The cleaning removes unicode characters that might be incompatible with older Prolog engines and was done using ./notebooks/yago3_10/data_exploration_and_preparation/yago3_10_data_cleaning.ipynb.
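The notebook is the authoritative version of that step; conceptually, it boils down to something like the following sketch (file names are illustrative, not the actual ones):

 import pandas as pd
 from unidecode import unidecode

 # Illustrative sketch of the cleaning step: transliterate every entity and
 # relation string to plain ASCII so that older Prolog engines can parse it.
 triples = pd.read_csv("train.txt", sep="\t", names=["subject", "relation", "object"])
 for column in triples.columns:
     triples[column] = triples[column].map(unidecode)
 triples.to_csv("train_cleaned.csv", index=False)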

The original data was obtained using AmpliGraph; it can also be found under ./data/yago3_10/original.

The cleaned version can be found under ./data/yago3_10/cleaned_csv.
