Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wdlupdate #4

Merged
merged 19 commits into from
Jul 13, 2018
Merged

Wdlupdate #4

merged 19 commits into from
Jul 13, 2018

Conversation

bshifaw
Copy link

@bshifaw bshifaw commented Mar 7, 2018

pairedtoubam Wdl now uses a readgroup tsv file as input, also added task to compose a file containing a list of the generated ubams

bshifaw added 8 commits February 23, 2018 18:24
removed gatk software requiremnt because repo will contain more than one wdl which may use different versions.
…ile containing a list of the generated ubams
@bshifaw bshifaw requested a review from vdauwera March 7, 2018 20:56

"##_Comment3": "Disk Space",
"##_ConvertPairedFastQsToUnmappedBamWf.PairedFastQsToUnmappedBAM.disk_space_gb": "(optional) Int?",
"##_ConvertPairedFastQsToUnmappedBamWf.PairedFastQsToUnmappedBAM.machine_mem_gb": "(optional) Int?",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listed under disk space but this is a memory parameter

Copy link

@ruchim ruchim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just some suggestions to simplify the workflow/task definition.


"##_Comment3": "Memory",
"##_ConvertPairedFastQsToUnmappedBamWf.CreateUbamList.disk_space_gb": "(optional) Int?",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listed under memory but this is a disk space parameter


# Convert multiple pairs of input fastqs in parallel
scatter (readgroup in readgroup_list) {
scatter (i in range(length(readgroup_array))) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that the original scatter syntax is less verbose:

scatter (readgroup in readgroup_list) {
 
  call PairedFastQsToUnmappedBAM {
    input:
      fastq_1 = readgroup_array[1],
      fastq_2 = readgroup_array[2],
      ...
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We switched to using a readgroup list file instead of an array because there isn't an option for map variables in FC for the other input variables (fastq_pairs and metadata)

input:
unmapped_bams = PairedFastQsToUnmappedBAM.output_bam,
ubam_list_name = ubam_list_name,
docker = docker,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusting spacing

}
output {
File output_bam = "${readgroup_name}.unmapped.bam"
}
}

task CreateUbamList {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this task be simplified by using a write_lines()

command {
 mv ${write_lines(unmapped_bams)}  ${ubam_list_name}.unmapped_bams.list
}

I also don't think it needs to explicitly specify disk/mem as the default Cromwell values should be sufficient. You could probably get away with just docker & preemptible count, or even hard-code preemptible to 3 as such a short task will likely not need more than one attempt.

runtime {
   docker: docker
   preemptible: select_first([preemptible_attempts, 3])
}

README.md Outdated
@@ -23,9 +23,14 @@ This WDL converts paired FASTQ to uBAM and adds read group information
#### Requirements/expectations
- Pair-end sequencing data in FASTQ format (one file per orientation)
- One or more read groups, one per pair of FASTQ files
- The wdl will accept a file containing a list of the readgroups along with their metadata as the input. The wdl expects a specifi format for the colomuns, please follow the format as provided :
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not important but maybe reword this sentence a bit?

This WDL requires an input TSV file such that each row corresponds to a readgroup and the columns contain metadata for each readgroup. The WDL depends on the input TSV to follow a specific format for the columns. Please use the template provided below as the header for the input TSV file:

@bshifaw bshifaw removed the request for review from vdauwera June 1, 2018 19:01
Copy link

@ruchim ruchim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most suggestions are around syntax. For the FC version of these WDLs, it would be good to have the default docker be from GCR and not DockerHub. Secondarily, I believe there's a place where the extensions need adjustment. Other than that, it looks good to me!

## - Sorted Unmapped BAMs
##
## Cromwell version support
## - Successfully tested on v31
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v32?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

call SortSam {
input:
input_bam = unmapped_bam,
sorted_bam_name = output_basename + ".unmapped.bam",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't need the + ".unmapped.bam" anymore (as output_basename includes that extension)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think basename(unmapped_bam, ".coord.sorted.unmapped.bam") removes the unmapped.bam part so this line adds back without the .coord.sorted

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fudge--YES

# Convert pair of FASTQs to uBAM
call PairedFastQsToUnmappedBAM {
input:
fastq_1 = readgroup_array[i][1],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry this is a small thing, but it would be nice to indent this list of inputs :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

#Create a file with a list of the generated ubams
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust spacing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#Create a file with a list of the generated ubams
call CreateFoFN {
input:
array_of_files = PairedFastQsToUnmappedBAM.output_bam,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"ConvertPairedFastQsToUnmappedBamWf.PairedFastQsToUnmappedBAM.disk_size": 200
}
"##_Comment4": "Docker",
"#ConvertPairedFastQsToUnmappedBamWf.gatk_docker_override": "",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the line below, possibly replace empty "" with (optional) String?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

String ubam_list_name = basename(readgroup_list,".list") + "unmapped.bam.list"

String? gatk_docker_override
String gatk_docker = select_first([gatk_docker_override, "broadinstitute/gatk:latest"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the default docker be the ones in gcr instead of dockerhub? Especially as this is the FireCloud version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've modified the ones in FC to use gcr. The idea was to have the ones in git use docker to avoid any authentication issues for users running on different platforms.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops no i didn't , I forgot this was the FC version. thanks for catching this.

}
}

#Create a file with a list of the generated ubams
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#Create a file with a list of the generated ubams
call CreateFoFN {
input:
array_of_files = PairedFastQsToUnmappedBAM.output_bam,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
runtime {
docker: docker
memory: mem_size
memory: select_first([machine_mem_gb,10]) + " GB"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing after comma

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@bshifaw bshifaw merged commit 8f36d69 into master Jul 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants