
MongoDB Variant Storage

Jacobo Coll Moragón edited this page Sep 23, 2016 · 1 revision

Overview

Variant Storage

MongoDB Schema
_id key

The _id key must allow query results to be sorted using the index. To achieve this, the key must sort lexicographically in genomic order (chromosome first, then position). This key is used in both the variants and the stage collections.

The key is a concatenation of the chromosome, position, reference allele and alternate allele, separated by colons.

CHR:POS:REF:ALT

Where:

  • CHR is prefixed with a space if it is a single-digit chromosome, so that it sorts before two-digit chromosomes (" 3" sorts before "22").
  • POS is left-padded with spaces to a width of 10 characters.
  • REF and ALT are replaced by the SHA1 of the original allele if the allele is longer than Variant#SV_THRESHOLD.

Example:

CHR  POS    REF   ALT  _id
22   156    A     T    22:.......156:A:T
3    56789  CACA  -    .3:.....56789:CACA:
X    68432  -     GCC  X:.....68432::GCC

* Spaces have been replaced with dots for readability; "-" denotes an empty allele.
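The key construction described above can be sketched in Python. This is a minimal illustration, not the OpenCGA implementation: `SV_THRESHOLD = 50` is a hypothetical stand-in for `Variant#SV_THRESHOLD`, the SHA1 digest is assumed to be written as lowercase hex, and empty alleles are passed as empty strings rather than "-".

```python
import hashlib

# Hypothetical stand-in for Variant#SV_THRESHOLD; the real value may differ.
SV_THRESHOLD = 50


def encode_chromosome(chrom: str) -> str:
    # Prefix single-digit numeric chromosomes with a space so " 3" sorts before "22".
    if len(chrom) == 1 and chrom.isdigit():
        return " " + chrom
    return chrom


def encode_allele(allele: str) -> str:
    # Replace alleles longer than the threshold by their SHA1 digest
    # (hex encoding is an assumption of this sketch).
    if len(allele) > SV_THRESHOLD:
        return hashlib.sha1(allele.encode("utf-8")).hexdigest()
    return allele


def build_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    # CHR:POS:REF:ALT, with POS left-padded with spaces to 10 characters.
    return ":".join([
        encode_chromosome(chrom),
        str(pos).rjust(10),
        encode_allele(ref),
        encode_allele(alt),
    ])


if __name__ == "__main__":
    ids = [
        build_variant_id("22", 156, "A", "T"),
        build_variant_id("3", 56789, "CACA", ""),
        build_variant_id("X", 68432, "", "GCC"),
    ]
    for variant_id in sorted(ids):
        # Show spaces as dots, as in the example table.
        print(variant_id.replace(" ", "."))
```

Plain string sorting places chromosome 3 before 22 and X, and because the space character sorts before any digit, the padded positions keep numeric order within a chromosome. That is what allows a range scan on the _id index to return variants in genomic order.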

Stage collection

Each document in this collection represents a variant. Until the variant has been moved to the Variants collection, the document also contains the compressed variant. The variant information is grouped by study, using the studyId as key. Within each study, each file is stored with its fileId as key. A single file may contain more than one record for the same variant; these duplicated variants are stored as an array of variants.

{
  "_id" : "22:    123456:A:T",
  "end" : 123456,
  "ref" : "A",
  "alt" : "T",
  "3" : {
    "4" : ["BinData"]
  }
}
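The nesting described above (study, then file, then an array of compressed variants) can be sketched as a plain dictionary, which is how a Python MongoDB driver such as PyMongo represents a document. The helper `add_to_stage` and the use of zlib-compressed JSON as the "compressed variant" are hypothetical choices for illustration; the actual serialization stored as BinData is not described on this page.

```python
import json
import zlib


def compress_variant(variant: dict) -> bytes:
    # Hypothetical serialization: zlib-compressed JSON.
    # PyMongo stores Python bytes values as BinData.
    return zlib.compress(json.dumps(variant, sort_keys=True).encode("utf-8"))


def add_to_stage(stage_doc: dict, study_id: int, file_id: int, variant: dict) -> dict:
    # Study and file ids are used as string keys; each file entry is an array,
    # so duplicated variants from the same file are kept side by side.
    study = stage_doc.setdefault(str(study_id), {})
    study.setdefault(str(file_id), []).append(compress_variant(variant))
    return stage_doc


if __name__ == "__main__":
    doc = {"_id": "22:    123456:A:T", "end": 123456, "ref": "A", "alt": "T"}
    record = {"chromosome": "22", "start": 123456, "reference": "A", "alternate": "T"}
    add_to_stage(doc, 3, 4, record)
    add_to_stage(doc, 3, 4, record)  # a duplicate record in the same file
    print(len(doc["3"]["4"]))  # → 2
```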

Once the file has been moved to the Variants collection, its content is removed (set to null) and the flag "new" : false is added to the study entry.

{
  "_id" : "22:    123456:A:T",
  "end" : 123456,
  "ref" : "A",
  "alt" : "T",
  "3" : {
    "4" : null,
    "new" : false
  }
}
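The transition between the two stage documents can be expressed as a single MongoDB `$set` update using dot notation on the study and file keys. The sketch below builds that update document and, so it can run without a database, applies it to an in-memory copy with a small hypothetical helper, `apply_set`, which only emulates `$set` on dotted paths. The function names are illustrative and not part of OpenCGA.

```python
import copy


def moved_update(study_id: int, file_id: int) -> dict:
    # Null out the file content and flag the study entry as no longer new.
    return {"$set": {f"{study_id}.{file_id}": None, f"{study_id}.new": False}}


def apply_set(doc: dict, update: dict) -> dict:
    # Minimal in-memory emulation of $set with dotted paths (illustration only).
    result = copy.deepcopy(doc)
    for path, value in update["$set"].items():
        *parents, leaf = path.split(".")
        target = result
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return result


if __name__ == "__main__":
    staged = {"_id": "22:    123456:A:T", "end": 123456, "ref": "A", "alt": "T",
              "3": {"4": [b"..."]}}
    print(apply_set(staged, moved_update(3, 4))["3"])  # → {'4': None, 'new': False}
```

With PyMongo, the same update document would be passed as the second argument of `collection.update_one({"_id": variant_id}, update)`.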
Variants collection
{
  "_id" : "22:    123456:A:T",
  "chromosome" : "22",
  "start" : 123456,
  "end" : 123456,
  "reference" : "A",
  "alternate" : "T",
  "length" : 1,
  "type" : "SNV",
  "_at" : {
    "chunkIds" : [
       "22_123_1k",
       "22_12_10k"
    ]
  },

  "studies" : [
    {
      "sid" : 3,
      "gt": {
        "0|1" : [54, 78, 254, 623],
        "1|1" : [84, 89, 156],
        "?/?" : [110,111,112,113,114,115,116,117,118,119,120]
      },
      "files" : [ 
        { 
           "fid" : 4,
           "attrs" : {}
        }, {
           "fid" : 5,
           "attrs" : {}
        }
      ]
    } 
  ], 
  "stats" : [ {
      "sid" : 3,
      "cid": 6,
      "maf": 0.00638977624475956,
      "mgf": 0,
      "mafAl": "T",
      "mgfGf": "1|1",
      "missAl": 0,
      "missGt": 0,
      "numGt": {
        "0/0" : 562,
        "1|1" : 3,
        "0|1" : 4
      }
  } ],

  "annotation" : [ {
     "id" : "?",
     "ct" : [
       {
         "so" : [ 1628 ]
       } , {
         "so" : [ 1566 ]
       }
     ],
     "cr_score" : [
       {
         "sc" : 0.8619999885559082,
         "src" : "gerp"
       } , {
         "sc" : 0.004999999888241291,
         "src" : "phastCons"
       } , {
         "sc" : 0.11299999803304672,
         "src" : "phylop"
       }
     ],
     "popFq" : [
       {
         "study" : "1000GENOMES_phase_3",
         "pop" : "ALL",
         "refFq" : 0.9986000061035156,
         "altFq" : 0.0006000000284984708
       } , {
         "study" : "1000GENOMES_phase_3",
         "pop" : "EAS",
         "refFq" : 0.9970200061798096,
         "altFq" : 0.0029800001066178083
       } , {
         "study" : "1000GENOMES_phase_3",
         "pop" : "EUR",
         "refFq" : 0.998009979724884,
         "altFq" : 0
       }
     ],
     ...
  } ],
  "customAnnotation" : {
  }
}
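Two fields in this document look derived from other data. The values in `_at.chunkIds` appear to follow CHR_INDEX_SIZE, where INDEX is the position divided by the chunk size (123456 // 1000 = 123 and 123456 // 10000 = 12). The `gt` map groups sample indices by genotype; in the example, the "0/0" genotype counted in `numGt` does not appear in `gt`, so this sketch assumes homozygous-reference samples are left out. Both functions (`chunk_ids`, `group_genotypes`), the chunk-size table, the 0-based sample indices and the set of omitted genotypes are assumptions inferred from the example, not a documented API.

```python
from collections import defaultdict

# Chunk sizes and labels inferred from the example ("1k" and "10k").
CHUNK_SIZES = [(1_000, "1k"), (10_000, "10k")]


def chunk_ids(chrom: str, start: int, end: int) -> list:
    # One id per chunk overlapped by [start, end], for each chunk size.
    ids = []
    for size, label in CHUNK_SIZES:
        for index in range(start // size, end // size + 1):
            ids.append(f"{chrom}_{index}_{label}")
    return ids


def group_genotypes(sample_gts: list, omit=("0/0", "0|0")) -> dict:
    # Map each genotype to the (0-based) indices of the samples carrying it,
    # skipping the assumed default homozygous-reference genotypes.
    groups = defaultdict(list)
    for sample_index, gt in enumerate(sample_gts):
        if gt not in omit:
            groups[gt].append(sample_index)
    return dict(groups)


if __name__ == "__main__":
    print(chunk_ids("22", 123456, 123456))  # → ['22_123_1k', '22_12_10k']
    print(group_genotypes(["0/0", "0|1", "1|1", "0|1", "?/?"]))
    # → {'0|1': [1, 3], '1|1': [2], '?/?': [4]}
```

Storing only the non-default genotypes keeps the `gt` map small for studies where most samples are homozygous reference, while the per-genotype counts in `stats.numGt` still cover every sample.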
Studies collection
Files collection

Alignment Storage

MongoDB Schema