HANSEL: Graph Counterfactual Explanation under Dynamic Data Changes

Team Information:

Bardh Prenkaj (Sapienza University of Rome) [Principal Investigator]
Mario Villaizán-Vallelado (University of Valladolid) [Investigator]
Tobias Leemann (University of Tuebingen) [Investigator]
Gjergji Kasneci (TU Munich) [Supervisor]

General Information:

HANSEL is an open-source framework for Evaluating Graph Counterfactual Explanation Methods under data changes (i.e., data distribution drifts). Our main goal is to create a generic platform that allows the researchers to speed up the process of developing and testing new Timed-Graph Counterfactual Explanation Methods.

Citation Request:

Please cite our paper if you use HANSEL in your experiments:

Prenkaj, B.; Villaiz ́an-Vallelado, M.; Leemann, T.; and Kasneci, G. 2023. Unifying Evolution, Explanation, and Discernment: A Generative Approach for Dynamic Graph Counterfactuals. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), August 25–29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3637528.3671831

@inproceedings{prenkaj2024gracie,
  title={Unifying Evolution, Explanation, and Discernment: A Generative Approach for Dynamic Graph Counterfactuals},
  author={Bardh Prenkaj and Mario Villaizán-Vallelado and Tobias Leemann and Gjergji Kasneci},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24)},
  year={2024}
}

Prenkaj, B.; Villaiz ́an-Vallelado, M.; Leemann, T.; and Kasneci, G. 2023. Adapting to Change: Robust Counterfactual Explanations in Dynamic Data Landscapes. In Proceedings of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML PKDD 2023)

@inproceedings{prenkaj2023dygrace,
  title={Adapting to Change: Robust Counterfactual Explanations in Dynamic Data Landscapes},
  author={Bardh Prenkaj and Mario Villaizán-Vallelado and Tobias Leemann and Gjergji Kasneci},
  booktitle={Proceedings of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML/PKDD 2023)},
  year={2023}
}

Requirements:

scikit-learn
numpy
scipy
pandas
tensorflow (for GCN)
jsonpickle (for serialization)
torch
joblib
rdkit (Molecules)
exmol (maccs method)
networkx (Graphs)

Installation:

The easiest way to get HANSEL up and running with all the dependencies is to pull the development Docker image available in Docker Hub:

docker pull gretel/gretel:latest

The image is based on tensorflow/tensorflow:latest-gpu and it's GPU ready. To set up the container, we recommend you run:

docker-compose run gretel

For simplicity, we provide several makefile rules for easy interaction with the Docker interface:

make docker - builds the development image from scratch
make pull - pull the development image
make push - push the development image
make demo - run the demo in the development image.

Resources provided with the Framework:

Datasets:

DynTree-Cycles: Synthetic data set where each instance is a graph. The instance can be either a tree or a tree with several cycle patterns connected to the main graph by one edge
DBLP-Coauthors: It contains graphs where the vertices represent authors, and the connections are co-authorship relationships between two authors.
BTC-Alpha and BTC-OTC: It contains who-trust-whom networks of traderson the Bitcoin Alpha and Bitcoin OTC platforms
Bonanza: This is similar to eBay and Amazon Marketplace in that users create an account to buy or sell various goods. After a buyer purchases a product from a seller, both can provide a rating about the other along with a short comment.

Oracles:

KNN
SVM
GCN
CustomOracle

Explainers:

DyGRACE: Dynamic Graph Counterfactual Explainer, a semi-supervised GCE method that uses two representation learners and a logistic regressor to find valid counterfactuals
GRACIE: Graph Recalibration and Adaptive Counterfactual Inspection and Explanation

How to use:

Let's see a small example of how to use the framework.

Config file

First, we must create a config JSON file with the option we want to use in our experiment. In the file config/best_models/tree-cycles/gracie/0.json, it is possible to find all options for each component of the framework.

{
    "store_paths": [
        {"name": "dataset_store_path", "address": "./data/datasets/"},
        {"name": "embedder_store_path", "address": "./data/embedders/"},
        {"name": "oracle_store_path", "address": "./data/oracles/"},
        {"name": "explainer_store_path", "address": "./data/explainers/"},
        {"name": "output_store_path", "address": "./output/tree-cycles-dynamic"}
    ],
    "datasets": [
        {
            "name": "dynamic_tree_cycles", 
            "parameters": {
                "begin_time": 2000,
                "end_time": 2003,
                "num_instances_per_snapshot": 100,
                "n_nodes": 28,
                "nodes_in_cycle": 7
            } 
        }
    ],
    "oracles": [
        {
            "name": "dynamic_oracle",
            "parameters": {
                "base_oracle" : {
                    "name": "tree_cycles_custom_oracle", 
                    "parameters": {}
                },
                "first_train_timestamp": 2000
                
            }
            
        }
    ],
    "explainers": [
        {
            "name": "gracie",
            "parameters": { 
                "epochs_ae": 200,
                "fold_id": 0,
                "lr": 1e-3,
                "top_k_cf": 50,
                "batch_size": 64,
                "kl_weight": 0.5,
                "in_dim": 4,
                "decoder_dims": 2,
                "replace_rate": 0.1,
                "mask_rate": 0.3,
                "lambda": 0.5,
                "encoder": {
                    "name": "var_gcn_encoder",
                    "parameters": {
                        "input_dim": 4,
                        "out_dim": 2
                    }
                },
                "decoder": {
                    "name": "gat_decoder",
                    "parameters": {
                        "input_dim": 2,
                        "hidden_dim": 3,
                        "out_dim": 4,
                        "num_layers": 2,
                        "nhead": 16,
                        "nhead_out": 1,
                        "negative_slope": 0.2                    
                    }
                },
                "weight_schedulers": [
                    {
                        "name": "no_decay",
                        "parameters" : {
                            "init_weight": 1
                        }
                    },
                    {
                        "name": "no_decay",
                        "parameters": {
                            "init_weight": 0.5
                        }
                    }
                ]
            } 
        }
    ],
    "evaluation_metrics": [ 
        {"name": "graph_edit_distance", "parameters": {}},
        {"name": "oracle_calls", "parameters": {}},
        {"name": "correctness", "parameters": {}},
        {"name": "oracle_accuracy", "parameters": {}}
    ]
}

Then to execute the experiment from the main the code would be something like this:

from src.evaluation.dynamic_graphs.dynamic_evaluator_manager import DynamicEvaluatorManager

config_file_path = './config/best_models/tree-cycles/gracie/0.json'

print('Creating the evaluation manager.......................................................')
eval_manager = DynamicEvaluatorManager(config_file_path, K=10, run_number=0)

print('Creating the evaluators...................................................................')
eval_manager.create_evaluators()

print('Evaluating the explainers..................................................................')
eval_manager.evaluate()

Once the result JSON files are generated, it is possible to use the report_dynamic_stats.py module to generate a JSON file with the results of the dynamic experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 289 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
.idea		.idea
.vscode		.vscode
config		config
data		data
output		output
src		src
yearly_reports/tree_cycles		yearly_reports/tree_cycles
.dockerignore		.dockerignore
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
dataset_stats.py		dataset_stats.py
docker-compose.yml		docker-compose.yml
dockerfile		dockerfile
generate_datasets.py		generate_datasets.py
launch.sh		launch.sh
makefile		makefile
multi.sh		multi.sh
optimize.py		optimize.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
report_dynamic_results.py		report_dynamic_results.py
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HANSEL: Graph Counterfactual Explanation under Dynamic Data Changes

Table of Contents

Team Information:

General Information:

Citation Request:

Requirements:

Installation:

Resources provided with the Framework:

Datasets:

Oracles:

Explainers:

How to use:

Config file

About

Releases

Packages

Languages

License

bardhprenkaj/HANSEL

Folders and files

Latest commit

History

Repository files navigation

HANSEL: Graph Counterfactual Explanation under Dynamic Data Changes

Table of Contents

Team Information:

General Information:

Citation Request:

Requirements:

Installation:

Resources provided with the Framework:

Datasets:

Oracles:

Explainers:

How to use:

Config file

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages