This directory contains the 'raw' files, gathered from various sources, that are needed for g2p harvesting.
-
- manual download
- raw interpretations from cornell linda at standardmolecular dot com
-
- pre-harmonization download of molecularmatch trials
- recreate via:
    python harvester.py --silos file --harvesters molecularmatch_trials --phases harvest
-
cgi
- cgi_biomarkers_per_variant.tsv
- catalog_of_validated_oncogenic_mutations.tsv
- manual download
- raw evidence
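
For a quick look at the raw evidence in these tab-separated files, a minimal sketch along the following lines may help. It is not part of the harvester; `read_tsv` and `load_tsv` are hypothetical helper names, and the column names in the usage comment are illustrative only, so check the header row of the real file before relying on any field.

```python
import csv


def read_tsv(lines):
    """Parse tab-separated text into a list of dicts keyed by the header row.

    `lines` may be an open file handle or any iterable of text lines.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def load_tsv(path):
    """Read a TSV file from disk (hypothetical convenience wrapper)."""
    with open(path, newline="") as handle:
        return read_tsv(handle)


# Usage (column names here are assumptions, not the real cgi header):
# rows = load_tsv("cgi_biomarkers_per_variant.tsv")
# print(rows[0].keys())
```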
-
allAnnotatedVariants.txt, allActionableVariants.txt
- download by clicking the link, or by going to oncokb.org/#/dataAccess
- used to run harvester on oncokb
- 'oncokb_' prefix added to files for clarity
-
data_mutations_extended_1.0.1.txt, data_clinical_1.0.1.txt
- manual registration and download
- cohort for GENIE analysis notebook
-
harvester/cosmic_lookup_table.tsv
- manual download
- preprocessing done by harvester/Makefile, including harvester/oncokb_all_actionable_variants.tsv
-
- manual download from 'Download Data' button at top of page
- cohort for GENIE analysis notebook
-
non_alt_loci_set.json
- wget ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/json/non_alt_loci_set.json
- gene symbol validation
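
As an illustration of gene symbol validation against this file, here is a minimal sketch. It assumes the JSON follows the usual HGNC layout, with records under `response` -> `docs` and each record carrying a `symbol` field. That layout is an assumption to verify against the downloaded file. The function names are hypothetical and are not taken from the harvester code.

```python
import json


def symbols_from_hgnc(data):
    """Collect approved symbols from a parsed HGNC JSON document.

    Assumed layout: {"response": {"docs": [{"symbol": "BRAF", ...}, ...]}}
    """
    return {doc["symbol"] for doc in data["response"]["docs"] if "symbol" in doc}


def load_symbols(path):
    """Read the HGNC JSON file from disk and return its approved symbols."""
    with open(path) as handle:
        return symbols_from_hgnc(json.load(handle))


def is_valid_symbol(symbol, approved):
    """Exact, case-sensitive membership test against the approved set."""
    return symbol in approved


# Usage:
# approved = load_symbols("non_alt_loci_set.json")
# is_valid_symbol("BRAF", approved)
```

This sketch checks approved symbols only. Alias or previous symbols, if the file provides them, would need separate handling.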