Merge pull request #8 from svandenhoek/new
Removed obsolete code & updated README
joerivandervelde authored Feb 13, 2020
2 parents 5c26b44 + a1ef6c6 commit 70c3311
Showing 6 changed files with 92 additions and 448 deletions.
170 changes: 92 additions & 78 deletions README.md
@@ -1,76 +1,24 @@
# vibe-suppl
This repo contains supplemental files regarding the Java application found [here][vibe]. Note that these are in no way
needed to use the vibe tool, but were used to generate additional information (such as benchmarking). They were created
with the assumption that they are used exactly in the way they are meant to be used, so while certain checks/validations
might be present, using these scripts in the wrong way might result in weird behavior.

## Benchmarking
## Paper

Please refer to the `README.md` at https://zenodo.org/record/3662470 for the exact commits used for the benchmarking. All files required by [PaperPlots.R](benchmarking_results_processing/PaperPlots.R) can be found there as well.

### Scripts

There are several benchmarking scripts available, with some generic code used by multiple benchmarks placed in a separate file.
An explanation of how to run them is given below. In general, the `Runner` script runs the benchmark while the
`FileGenerator` script (if available) formats the `Runner` output into a more usable format. Some exceptions exist,
such as for vibe, where there is a `ParallelBashScriptsGenerator` instead. Please refer to
<a href="#running-the-benchmarks">this section</a> for more information regarding running the individual benchmarks.

* __`AmelieApiOutputGenerator.py`__
* __Info:__ Connects to `https://amelie.stanford.edu/api/` to retrieve the gene scores for each set of HPO terms
available in the benchmark data. As the genes of interest must be entered manually and there is a limit on the
number of genes per request, the [complete HGNC dataset][hgnc_complete]
is used and divided over multiple separate requests so that all genes get a score. As the scores are only sorted
per request, all genes are sorted together before the file is written.
* __`AmelieBenchmarkRunner.py`__
* __Info:__ Converts the output from `AmelieApiOutputGenerator.py` for usage in `BenchmarkResultsProcessor.R`.
* __`BenchmarkFileHpoConverter.py`__
* __Info:__ A script to convert a benchmark file containing HPO names in the fifth column to a benchmark file with
HPO codes in the fifth column. Should not be needed for running existing benchmarks, but is supplied as a
convenience script in case benchmarks are created that cannot use `BenchmarkGenerics.py` but do need HPO codes as
input.
* __`BenchmarkGenerics.py`__
* __Info:__ Contains methods used in multiple scripts.
* __Important:__ This script should not be run independently. If Python scripts are moved (for example to a server
to run the benchmarks there), be sure to include this file within the same directory.
* __`BenchmarkResultsProcessor.R`__
* __Info:__ Creates plots from the benchmark data.
* __`GeneNetworkBenchmarkFileGenerator.py`__
* __Info:__ Converts the output from `GeneNetworkBenchmarkRunner.py` for usage in `BenchmarkResultsProcessor.R`.
* __`GeneNetworkBenchmarkRunner.py`__
* __Info:__ Connects to the API of `https://www.genenetwork.nl/` to retrieve the prioritized genes based on input
phenotypes.
* __`PhenomizerBenchmarkFileGenerator.py`__
* __Info:__ Converts the output from `PhenomizerBenchmarkRunner.py` for usage in `BenchmarkResultsProcessor.R`.
* __`PhenomizerBenchmarkRunner.py`__
* __Info:__ Uses the [query_phenomizer][query_phenomizer] Python tool to process all benchmark data.
* __Important:__ [query_phenomizer][query_phenomizer] needs to be installed on the system. Additionally, an account
is needed for running [query_phenomizer][query_phenomizer].
* __`PhenotipsBenchmarkRunner.py`__
* __Info:__ Uses the API of Phenotips to upload the benchmark dataset and then download the results.
* __Important:__ A Phenotips instance that the script can connect to is required. Please refer to the
[Phenotips download page][phenotips_download] for more information.
* __`VibeBenchmarkFileGenerator.py`__
* __Info:__ Converts the output from `VibeBenchmarkParallelBashScriptsGenerator.py` for usage in `BenchmarkResultsProcessor.R`.
* __`VibeBenchmarkParallelBashScriptsGenerator.py`__
* __Info:__ Generates bash files used for benchmarking, with a limit on the number of runs per file. Note that each
created bash script needs a separate TDB. Please refer to the documentation in the script itself for more
information.
* __Important:__ As each VIBE instance needs a separate database, please refer to the information in the script
itself for how to prepare for the benchmarking correctly.
* __`VibeSimpleOutputFilesMerger.sh`__
* __Info:__ Merges the output generated by the scripts which were created using
`VibeBenchmarkParallelBashScriptsGenerator.py`.
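The batching described for `AmelieApiOutputGenerator.py` (split the complete HGNC gene list over several requests, then sort all genes together) can be sketched as below. This is a minimal illustration, not the script's actual code: `score_batch` is a hypothetical stand-in for the real AMELIE API call, `max_genes_per_request` is an assumed parameter because the actual limit is not stated here, and the scores in the demo are made up.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def score_all_genes(
    genes: Sequence[str],
    max_genes_per_request: int,
    score_batch: Callable[[List[str]], Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Score all genes batch by batch, then rank the merged scores once."""
    combined: Dict[str, float] = {}
    for batch in chunked(genes, max_genes_per_request):
        # Each (hypothetical) request only returns scores for its own batch.
        combined.update(score_batch(batch))
    # One global sort: highest score first, ties broken by gene symbol.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up scores standing in for real API responses.
    fake_scores = {"BRCA1": 0.2, "TP53": 0.9, "CFTR": 0.5, "DMD": 0.9}

    def fake_score_batch(batch: List[str]) -> Dict[str, float]:
        return {gene: fake_scores[gene] for gene in batch}

    print(score_all_genes(list(fake_scores), 2, fake_score_batch))
    # [('DMD', 0.9), ('TP53', 0.9), ('CFTR', 0.5), ('BRCA1', 0.2)]
```

Sorting once over the merged dictionary, rather than concatenating the per-request rankings, is what keeps the final order meaningful across batches.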
## Benchmarking

### Data

There are several files used among these scripts. These include:
* [benchmark_data.tsv](https://zenodo.org/record/3662470/files/benchmark_data-hgnc_symbol.tsv)
* A dataset with the first column being an ID and the fourth column containing one or more phenotypes separated
by commas (the phenotype names should exist within the [Human Phenotype Ontology][hpo_obo]).
* [hp.obo][hpo_obo]
* The Human Phenotype Ontology used for combining/converting phenotype names with their HPO ID. Note that the `benchmark_data.tsv` was made compatible with release 2018-03-08 specifically.
* [hgnc_complete_set.txt][hgnc_complete]
* The HUGO Gene Nomenclature Committee file containing information about genes (primarily used to generate a list
containing all genes).
* [benchmark_file_conversion_data.tsv](https://www.genenames.org/cgi-bin/download/custom?col=gd_hgnc_id&col=gd_app_sym&col=gd_prev_sym&col=md_eg_id&col=gd_pub_eg_id&status=Approved&status=Entry%20Withdrawn&hgnc_dbtag=on&order_by=gd_hgnc_id&format=text&submit=submit)
* A file generated through [genenames.org](https://www.genenames.org/) that contains HGNC gene symbols with their previous symbols and their NCBI gene IDs.
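Several runners take `hp.obo` together with `benchmark_data.tsv`, and `BenchmarkFileHpoConverter.py` is described as turning HPO names into HPO codes. A minimal sketch of such a name-to-ID lookup is given below. It assumes a simple OBO layout (an `id:` line followed by a `name:` line in each `[Term]` stanza), ignores synonyms and obsolete terms, and takes the phenotype column index as a parameter because this README mentions both the fourth and the fifth column. The function names and the two sample terms are illustrative only and are not taken from the pinned HPO release.

```python
from typing import Dict, Iterable, List, Optional

# Tiny illustrative OBO excerpt (not the pinned 2018-03-08 release).
SAMPLE_OBO = """format-version: 1.2

[Term]
id: HP:0001250
name: Seizure

[Term]
id: HP:0001251
name: Ataxia

[Typedef]
id: part_of
name: part of
"""


def parse_obo_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map each [Term] name to its HPO ID (synonyms are not handled)."""
    name_to_id: Dict[str, str] = {}
    in_term = False
    current_id: Optional[str] = None
    for raw_line in lines:
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe phenotypes.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("name: "):
            name_to_id[line[len("name: "):]] = current_id
    return name_to_id


def convert_phenotype_column(row: List[str], column: int, name_to_id: Dict[str, str]) -> List[str]:
    """Replace the comma-separated HPO names in row[column] with HPO IDs."""
    names = [name.strip() for name in row[column].split(",") if name.strip()]
    unknown = [name for name in names if name not in name_to_id]
    if unknown:
        raise KeyError(f"HPO names not found in the ontology: {unknown}")
    converted = list(row)
    converted[column] = ",".join(name_to_id[name] for name in names)
    return converted


if __name__ == "__main__":
    mapping = parse_obo_names(SAMPLE_OBO.splitlines())
    # Column index 3 is the fourth column (0-based indexing).
    print(convert_phenotype_column(["case1", "a", "b", "Seizure,Ataxia"], 3, mapping))
    # ['case1', 'a', 'b', 'HP:0001250,HP:0001251']
```

Failing loudly on unknown names mirrors why the dataset was pinned to one HPO release: a renamed term would otherwise be dropped silently.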

### Running the benchmarks

@@ -86,17 +34,63 @@ There are several files used among these scripts. These include:
python3 AmelieBenchmarkFileGenerator.py amelie_output/ amelie_results.tsv
```

#### Gene Network
3. Convert the HGNC gene symbols to NCBI gene IDs:

1. Run benchmark:
```
python3 GeneNetworkBenchmarkRunner.py hp.obo benchmark_data.tsv genenetwork_output/
python3 BenchmarkFileGeneSymbolToIdConverter.py amelie_results.tsv benchmark_file_conversion_data.tsv 1> amelie.log 2> amelie.err
```
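For the conversion step, `BenchmarkFileGeneSymbolToIdConverter.py` combines a results file with `benchmark_file_conversion_data.tsv`, which holds HGNC symbols together with their previous symbols and NCBI gene IDs. A minimal sketch of such a lookup follows. It assumes the columns appear in the order requested in the genenames.org download URL (`gd_hgnc_id`, `gd_app_sym`, `gd_prev_sym`, `md_eg_id`, `gd_pub_eg_id`) with one header row and comma-separated previous symbols; the real header text may differ. Preferring the first NCBI ID column, falling back to the second, and dropping previous symbols shared by several genes are choices made for this sketch, not necessarily what the actual script does. The sample rows are invented.

```python
import csv
from typing import Dict, Iterable, Optional, Set, Tuple

# Invented rows; column order follows the `col=` parameters of the download URL.
SAMPLE_TSV = (
    "gd_hgnc_id\tgd_app_sym\tgd_prev_sym\tmd_eg_id\tgd_pub_eg_id\n"
    "HGNC:1\tGENEA\tOLDA, OLDX\t101\t101\n"
    "HGNC:2\tGENEB\tOLDX\t\t202\n"
)


def build_symbol_lookup(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (approved symbol -> NCBI ID, previous symbol -> NCBI ID)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    approved: Dict[str, str] = {}
    previous: Dict[str, str] = {}
    ambiguous: Set[str] = set()
    for row in reader:
        if len(row) < 5:
            continue  # malformed or empty line
        _hgnc_id, symbol, prev_symbols, ncbi_id_primary, ncbi_id_fallback = row[:5]
        ncbi_id = ncbi_id_primary or ncbi_id_fallback
        if not ncbi_id:
            continue  # no gene ID to map to
        approved[symbol] = ncbi_id
        for prev in (p.strip() for p in prev_symbols.split(",")):
            if not prev:
                continue
            if prev in previous and previous[prev] != ncbi_id:
                ambiguous.add(prev)  # previous symbol reused by several genes
            previous[prev] = ncbi_id
    for prev in ambiguous:
        del previous[prev]
    return approved, previous


def to_ncbi_id(symbol: str, approved: Dict[str, str], previous: Dict[str, str]) -> Optional[str]:
    """Resolve via approved symbols first, then unambiguous previous symbols."""
    return approved.get(symbol) or previous.get(symbol)


if __name__ == "__main__":
    approved, previous = build_symbol_lookup(SAMPLE_TSV.splitlines())
    for symbol in ("GENEA", "GENEB", "OLDA", "OLDX"):
        print(symbol, to_ncbi_id(symbol, approved, previous))
    # GENEA 101
    # GENEB 202
    # OLDA 101
    # OLDX None
```

Checking approved symbols before previous ones avoids mapping a current symbol to a different gene that once used the same name.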

#### Exomiser

**IMPORTANT:** A custom `.jar` file supplied by the Exomiser team was used to run this benchmark without requiring a `.vcf` file. Exomiser has not yet made a public release of this. This custom `.jar` is, however, based on the exomiser-rest-prioritiser module of the Exomiser open-source code (release 12.1.0).

##### hiPHIVE

1. Run benchmark:

```
python3 ExomiserBenchmarkRunner.py hp.obo benchmark_data.tsv hiphive hiphive_output/
```

2. Process benchmark output:
```
python3 GeneNetworkBenchmarkFileGenerator.py genenetwork_output/ genenetwork_results.tsv
```

```
python3 ExomiserBenchmarkFileGenerator.py hiphive_output/ hiphive_results.tsv
```

3. Convert the HGNC gene symbols to NCBI gene IDs:

```
python3 BenchmarkFileGeneSymbolToIdConverter.py hiphive_results.tsv benchmark_file_conversion_data.tsv 1> hiphive.log 2> hiphive.err
```

##### PhenIX

1. Run benchmark:

```
python3 ExomiserBenchmarkRunner.py hp.obo benchmark_data.tsv phenix phenix_output/
```

2. Process benchmark output:

```
python3 ExomiserBenchmarkFileGenerator.py phenix_output/ phenix_results.tsv
```

3. Convert the HGNC gene symbols to NCBI gene IDs:

```
python3 BenchmarkFileGeneSymbolToIdConverter.py phenix_results.tsv benchmark_file_conversion_data.tsv 1> phenix.log 2> phenix.err
```

#### GADO

We used the stand-alone command-line version of GADO (v1.0.1), available at: https://github.com/molgenis/systemsgenetics/wiki/GADO-Command-line. We accepted all automatically suggested alternative HPO terms in cases where the supplied HPO term could not be used. We used the prediction matrix `hpo_predictions_sigOnly_spiked_01_02_2018`. The output was also converted to NCBI gene IDs as follows:

```
python3 BenchmarkFileGeneSymbolToIdConverter.py gado_results.tsv benchmark_file_conversion_data.tsv 1> gado.log 2> gado.err
```

#### Phenomizer

@@ -116,15 +110,36 @@ There are several files used among these scripts. These include:
```
python3 PhenomizerBenchmarkFileGenerator.py phenomizer_output/ phenomizer_results.tsv
```

4. Convert the HGNC gene symbols to NCBI gene IDs:

```
python3 BenchmarkFileGeneSymbolToIdConverter.py phenomizer_results.tsv benchmark_file_conversion_data.tsv 1> phenomizer.log 2> phenomizer.err
```

#### Phenotips

1. Install [phenotips][phenotips_download].
**IMPORTANT**: As of January 2020, Phenotips no longer offers a stand-alone downloadable solution and requires a paid cloud subscription ([source](https://phenotips.com/blog/new-year-new-website.html)). While the [GitHub repo](https://github.com/phenotips/phenotips) is still online, it is unclear whether it will continue to be updated, and the easy-to-use `.dmg` offered on the old website is no longer available. Therefore, this benchmark is considered obsolete.

2. Run benchmark:
```
python3 PhenotipsBenchmarkRunner.py http://localhost:8080/ username hp.obo benchmark_data.tsv phenotips_results.tsv
```
#### PubCaseFinder

1. Run benchmark:

```
python3 PubCaseFinderBenchmarkRunner.py hp.obo benchmark_data.tsv pubcasefinder_output/
```

2. Process benchmark output:

```
python3 PubCaseFinderBenchmarkFileGenerator.py pubcasefinder_output/ pubcasefinder_results.tsv
```

3. Convert the HGNC gene symbols to NCBI gene IDs:

```
python3 BenchmarkFileGeneSymbolToIdConverter.py pubcasefinder_results.tsv benchmark_file_conversion_data.tsv 1> pubcasefinder.log 2> pubcasefinder.err
```

#### Vibe

@@ -164,16 +179,15 @@ There are several files used among these scripts. These include:

7. Process benchmark output:
```
python3 VibeBenchmarkFileGenerator.py results/ vibe_results.tsv
python3 VibeBenchmarkFileGenerator.py results/ vibe_results.tsv none
```



[vibe]:https://github.com/molgenis/vibe
[vibe_preperations]:https://github.com/molgenis/vibe/#preparations
[vibe_preperations]:https://github.com/molgenis/vibe/#quickstart
[hgnc_complete]:http://ftp.ebi.ac.uk/pub/databases/genenames/new/tsv/hgnc_complete_set.txt
[query_phenomizer]:https://github.com/svandenhoek/query_phenomizer
[phenotips_download]:https://phenotips.org/Download

[hpo_obo_current]:http://purl.obolibrary.org/obo/hp.obo
[hpo_obo]:https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/2f6309173883d5d342849388c74bd986a2c0092c/hp.obo

1 change: 0 additions & 1 deletion benchmarking/GADOBenchmarkReadme

This file was deleted.

55 changes: 0 additions & 55 deletions benchmarking/GeneNetworkBenchmarkFileGenerator.py

This file was deleted.
