Skip to content

gatk-workflows/seq-format-conversion

Repository files navigation

seq-format-conversion

Workflows for converting between sequence data formats

cram-to-bam :

This script should convert a CRAM to SAM to BAM and output a BAM, BAM Index, and validation report to a Google bucket. If you'd like to do this on multiple CRAMS, create a sample set in the Data tab.
The reason this approach was chosen instead of converting CRAM to BAM directly using Samtools is because Samtools 1.3 produces incorrect bins due to an old version of htslib included in the package. Samtools versions 1.4 & 1.5 have an NM issue that causes them to not validate with Picard.

Requirements/expectations

  • Cram file

Outputs

  • Bam file and index
  • Validation report

paired-fastq-to-unmapped-bam :

This WDL converts paired FASTQ to uBAM and adds read group information

NOTE: paired-fastq-to-unmapped-bam-fc.wdl is a slightly modified version of the original to support users interested running on FireCloud. As input this wdl takes a TSV with each row being a different readgroup and each column in the row being descriptors

Requirements/expectations

  • Pair-end sequencing data in FASTQ format (one file per orientation)
  • The following metada descriptors per sample:
readgroup   fastq_pair1_file_path   fastq_pair2_file_path   sample_name   library_name   platform_unit   run_date   platform_name   sequecing_center

Outputs

  • Set of unmapped BAMs, one per read group
  • File containing a list of the generated unmapped BAMs

bam-to-unmapped-bams :

This WDL converts BAM to unmapped BAMs

Requirements/expectations

  • BAM file

Outputs

  • Sorted Unmapped BAMs

Software version requirements :

  • GATK4 or later
  • Samtools 1.3.1
  • Picard 2.8.3
  • Cromwell version support
    • Successfully tested on v32
    • Does not work on versions < v23 due to output syntax

Important Note :

  • The provided JSON is meant to be a ready to use example JSON template of the workflow. It is the user’s responsibility to correctly set the reference and resource input variables using the GATK Tool and Tutorial Documentations.
  • Relevant reference and resources bundles can be accessed in Resource Bundle.
  • Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
  • For help running workflows on the Google Cloud Platform or locally please view the following tutorial (How to) Execute Workflows from the gatk-workflows Git Organization.
  • The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites : GATK , FireCloud or Terra , WDL/Cromwell.
  • Please visit the User Guide site for further documentation on our workflows and tools.

LICENSING :

Copyright Broad Institute, 2019 | BSD-3 This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note however that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.