cBioPortal Seed Database

These files are MySQL database dumps used to seed a new cBioPortal database instance. They include the data a cBioPortal website needs to be fully operational, such as cancer types, genes, UniProt mappings, drug information, and network data.

The instructions for building and updating seedDBs can be found here.

Gene and gene alias tables in seedDB are updated every 6 months.

Release Notes

Latest seed database schema 2.13.1

This schema is required for cBioPortal release versions:

  • 5.3.14 or higher

For release versions > 5.3.14, there might be a need to migrate to a new database schema. The migration process is described here.
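The "or higher" requirements listed throughout these notes can be read as a lookup from a cBioPortal release to a seed schema. The sketch below is one way to express that, under the assumption that a release should use the newest schema whose minimum release it meets; the function and table names are hypothetical, and only the "or higher" entries from these notes are included. Versions are compared as integer tuples, because a plain string comparison would order 5.3.2 after 5.3.14.

```python
# (minimum cBioPortal release, seed schema) pairs copied from the
# "or higher" entries in these release notes. Older schemas that are tied
# to exact releases (2.7.3 and earlier) are deliberately left out.
MIN_RELEASE_TO_SCHEMA = [
    ("5.3.14", "2.13.1"),
    ("5.3.0", "2.13.0"),
    ("5.0.0", "2.12.14"),
    ("3.6.0", "2.12.12"),
]


def parse_version(text):
    """Turn '5.3.14' into (5, 3, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def seed_schema_for(release):
    """Return the newest listed schema whose minimum release is met.

    Assumption (not stated in the notes): the newest qualifying schema is
    the one to use. Returns None for releases older than every entry.
    """
    target = parse_version(release)
    newest_first = sorted(
        MIN_RELEASE_TO_SCHEMA,
        key=lambda pair: parse_version(pair[0]),
        reverse=True,
    )
    for min_release, schema in newest_first:
        if target >= parse_version(min_release):
            return schema
    return None
```

Even when a schema is found this way, a later release may still need the migration step described above.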

Schema 2.13.1: SQL file with create table statements
Seed database: seed-cbioportal_hg19_hg38_v2.13.1.sql.gz
md5sum d8e328d43089c817dc26e144b2524e8a
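Each seed file in these notes is published with an md5sum, so a download can be checked before it is loaded. A minimal sketch of that check, using only the Python standard library; the function names are illustrative, and the default checksum is the one listed above for seed-cbioportal_hg19_hg38_v2.13.1.sql.gz.

```python
import hashlib
from pathlib import Path

# Checksum published above for seed-cbioportal_hg19_hg38_v2.13.1.sql.gz.
EXPECTED_MD5 = "d8e328d43089c817dc26e144b2524e8a"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_seed(path, expected_md5=EXPECTED_MD5):
    """Return True when the file's MD5 matches the published value."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling `verify_seed("seed-cbioportal_hg19_hg38_v2.13.1.sql.gz")` on the downloaded file should return `True`; a `False` result suggests an incomplete or corrupted download. For another release, pass that release's md5sum as `expected_md5`.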

Updates to seed database:

  • Entrez Gene IDs, gene symbols, and gene aliases have been updated based on the HGNC Oct 1, 2023 release. The detailed changes are listed here.
  • Gene Sets have been updated from MSigDB v2023.2.Hs.

Previous seed databases

Seed database schema 2.13.0

From this release onwards, we offer a combined seed database for both hg19 and hg38. To access seed databases from previous versions, please refer to the respective archive folders.
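The switch to a combined database is visible in the file names: seed-cbioportal_hg19_hg38_v2.13.0.sql.gz names both builds, while earlier files name only hg19. The sketch below parses a seed file name into its genome builds and schema version. The pattern is inferred from the file names listed in these notes, not from a documented naming rule, and `describe_seed` is a hypothetical helper.

```python
import re

# Pattern inferred from file names in these notes, for example
#   seed-cbioportal_hg19_hg38_v2.13.1.sql.gz
#   seed-cbioportal_hg19_v2.3.1_only-pdb.sql.gz
SEED_NAME = re.compile(
    r"^seed-cbioportal_(?P<builds>(?:hg\d+_)+)v(?P<schema>\d+(?:\.\d+)*)"
    r"(?P<pdb>_only-pdb)?\.sql\.gz$"
)


def describe_seed(filename):
    """Split a seed file name into builds, schema version, and PDB flag.

    Returns None when the name does not follow the inferred pattern.
    """
    match = SEED_NAME.match(filename)
    if match is None:
        return None
    return {
        "builds": match.group("builds").rstrip("_").split("_"),
        "schema": match.group("schema"),
        "only_pdb": match.group("pdb") is not None,
    }
```

A result whose `builds` list contains both `hg19` and `hg38` indicates a combined seed database.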

This schema is required for cBioPortal release versions:

  • 5.3.0 or higher

For release versions > 5.3.0, there might be a need to migrate to a new database schema. The migration process is described here.

Schema 2.13.0: SQL file with create table statements
Seed database: seed-cbioportal_hg19_hg38_v2.13.0.sql.gz
md5sum b9e4035a9cc94dc01bbf6f5595842071

Updates to seed database:

  • Entrez Gene IDs, gene symbols, and gene aliases have been updated based on the HGNC April 1, 2023 release. You can find the detailed changes listed here.
  • Gene Sets have been updated from MSigDB v2023.1.Hs.

Seed database schema 2.12.14

This schema is required for cBioPortal release versions:

  • 5.0.0 or higher

For release versions > 5.0.0, there might be a need to migrate to a new database schema. The migration process is described here.

Schema 2.12.14: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.12.14.sql.gz
md5sum 05481d66334b65512aef0364ce282fe6

Updates to seed database:

  • Entrez Gene IDs, gene symbols, and gene aliases have been updated based on the HGNC Oct 1, 2022 release. The detailed changes are listed here.
  • Gene Sets have been updated from MSigDB 7.5.1.

Seed database schema 2.12.12

This schema is required for cBioPortal release versions:

  • 3.6.0 or higher

For release versions > 3.6.0, there might be a need to migrate to a new database schema. The migration process is described here.

Schema 2.12.12: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.12.12.sql.gz
md5sum 7d805d56aebcee85e2a8690e040310dd

Contents of seed database:

  • Entrez Gene IDs, gene symbols, and gene aliases have been updated based on the HGNC Jan 1, 2022 release.
  • Modifications (supplemental genes, miRNA, and phosphoprotein genes) are implemented using the script.
  • Gene Sets have been updated from MSigDB 7.5.1.
  • All data files in DATAHUB are updated to reflect the gene entry updates. The script/process is described here.

Seed database schema 2.12.8

This schema is required for cBioPortal release versions:

  • 3.6.0 or higher

For release versions > 3.6.0, there might be a need to migrate to a new database schema. The migration process is described here.

Schema 2.12.8: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.12.8.sql.gz
md5sum f8d2c65f8d9db795da47ed5cf6f592a9

Contents of seed database:

  • Entrez Gene IDs, gene symbols, and gene aliases have been updated based on the HGNC Feb 20, 2021 release with small modifications listed below.
    • To minimize data loss, we have preserved certain gene entries that are not available in the current HGNC in a supplemental file. Complete lists here.
    • Updated outdated gene entries. Complete list here.
    • Removed duplicate symbol <> entrez_ID mappings. Complete list here.
    • 7 genes dropped from gene panels. Complete list here.
    • All data files in DATAHUB are updated to reflect the gene entry updates. The script/process is described here.
  • Gene Sets have been updated from MSigDB 6.1.

Seed database schema 2.7.3

This schema is required for cBioPortal release versions:

  • 2.0.0

For release versions > 2.0.0, a migration step to a new database schema might be required. The migration process is described here.

Schema 2.7.3: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.7.3.sql.gz
md5sum 85444ce645104dbc00610fc1f15e8c7a

Contents of seed database:

Seed database schema 2.7.2

This schema is required for cBioPortal release versions:

  • 1.18.0

For release versions > 1.18.0, a migration step to a new database schema might be required. The migration process is described here.

Schema 2.7.2: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.7.2.sql.gz
md5sum b0a4e11b94d00a7291129c30ee4e0f70

Contents of seed database:

Seed database schema 2.6.0

This schema is required for cBioPortal release versions:

  • 1.12.x
  • 1.13.x
  • 1.14.0

For release versions > 1.14.0, a migration step to a new database schema might be required. The migration process is described here.

Schema 2.6.0: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.6.0.sql.gz
md5sum aafc9da7b72a29f3978ddca31004b8f5

Contents of seed database:

Seed database schema 2.4.0

This schema is required for cBioPortal release versions:

  • 1.9.0

For release versions > 1.9.0, a migration step to a new database schema might be required. The migration process is described here.

Schema 2.4.0: SQL file with create table statements
Seed database: seed-cbioportal_hg19_v2.4.0.sql.gz
md5sum 1014ed1f9d72103f2b46e5615aacbc2f

cBioPortal 1.9.0 with database schema 2.4.0 removed PDB annotations from the database.

Contents of seed database:

  • Entrez Gene IDs, gene symbols, and aliases updated in August 2017 from NCBI.
  • Gene lengths retrieved from Gencode Release 26 (mapped to GRCh37).
  • Pfam graphics fetched in August 2017.

Seed database schema 2.3.1

This schema is required for cBioPortal release versions:

  • 1.7.1
  • 1.7.2
  • 1.7.3
  • 1.8.0

For release versions > 1.8.0, a migration step to a new database schema might be required. The migration process is described here.

Schema 2.3.1: SQL file with create table statements
Seed database part1 (no PDB tables): seed-cbioportal_hg19_v2.3.1.sql.gz
md5sum 324be3d975d22019ee0c82ce0542bcc3
Seed database part2 (optional, only PDB tables): seed-cbioportal_hg19_v2.3.1_only-pdb.sql.gz
md5sum 5774a7947cdf5ef78fd737f1bea688cc

Contents of seed database:

  • Entrez Gene IDs, Hugo symbols, and aliases updated in August 2017 from NCBI.
  • Gene lengths retrieved from Gencode Release 26 (mapped to GRCh37).
  • Pfam graphics fetched in August 2017.

Seed database schema 2.1.0

This schema is required for older cBioPortal release versions:

  • 1.5.0
  • 1.5.1
  • 1.5.2

When using this older seed database with a release version > 1.5.2, a migration step to a new database schema is required. The migration process is described here.

Schema 2.1.0: SQL file with create table statements
Seed database part1 (no PDB tables): seed-cbioportal_hg19_v2.1.0.sql.gz
md5sum fe4e8502034f72f182733a72b50dbbc8
Seed database part2 (optional, only PDB tables): seed-cbioportal_hg19_v2.1.0_only-pdb.sql.gz
md5sum 5774a7947cdf5ef78fd737f1bea688cc

Contents of seed database:

  • Entrez Gene IDs, Hugo symbols, and aliases updated in September 2016 from NCBI.
  • Gene lengths retrieved from Gencode Release 25 (mapped to GRCh37).
  • Pfam graphics fetched in September 2016.

For Developers

  • The process of updating the seed database for Datahub is described here.
  • The process of updating the gene tables in the seed database is described here.
  • Local data files need to be updated as well; more information can be found here.