Sync from bench #29

Merged
merged 30 commits into from
Apr 7, 2021
81683be
baraaorabi Feb 12, 2021
a8f9898
minor changes
baraaorabi Feb 12, 2021
1f3efa1
including LTR-sim as submod
baraaorabi Feb 17, 2021
667abed
upto mapping is automated
baraaorabi Feb 17, 2021
10de8a8
Fixed issue with where there are too many contigs
baraaorabi Feb 17, 2021
43cffc6
Snakefile is good; need to fix freddie_isoforms.py
baraaorabi Feb 18, 2021
854bfbe
isoform collapsing is done; reduced cluster noise
baraaorabi Feb 18, 2021
3661688
Generating transcripts and beds
baraaorabi Feb 26, 2021
04d0132
Generating all baseline fastas and beds
baraaorabi Mar 3, 2021
fa9fe18
bugfix
baraaorabi Mar 3, 2021
14d58f7
We have overlaps too
baraaorabi Mar 5, 2021
1b3cdd1
Done with running tools
baraaorabi Mar 5, 2021
a50549a
rm files; edges are ready
baraaorabi Mar 8, 2021
b70ee93
Bugfix
baraaorabi Mar 10, 2021
772ed12
Accuracy ploting
baraaorabi Mar 10, 2021
d8b5907
accuracy integrated into snakemake
baraaorabi Mar 22, 2021
8ff243e
fully specified env add
baraaorabi Mar 22, 2021
6239af6
Merge branch 'benchmarking' of https://github.com/vpc-ccg/freddie int…
baraaorabi Mar 22, 2021
6203ab0
removed prefix from env
baraaorabi Mar 22, 2021
cc46660
Split conda envs, all working till gtime now
baraaorabi Mar 24, 2021
0a4a820
Made a switch to parallel for mini tasks
baraaorabi Mar 26, 2021
2b7b922
fix in fa gen; acc py gens dirs; acc sm in gpara
baraaorabi Mar 30, 2021
d00ca5e
bugfix for baseline acc
baraaorabi Apr 7, 2021
c95d3ba
Delete .gitmodules
baraaorabi Apr 7, 2021
36abb00
Delete Snakefile-accuracy
baraaorabi Apr 7, 2021
021a0aa
Delete accuracy.yml
baraaorabi Apr 7, 2021
d1f6449
Delete flair.yml
baraaorabi Apr 7, 2021
cacafa0
Delete run_tools.yml
baraaorabi Apr 7, 2021
e057b87
restore Snakefile
baraaorabi Apr 7, 2021
353cf9a
Merge branch 'master' into sync_from_bench
baraaorabi Apr 7, 2021
1 change: 1 addition & 0 deletions .gitignore
@@ -53,3 +53,4 @@ freddie_dbg
*.mat
*.data
.vscode/
gurobi.lic
Empty file removed .gitmodules
Empty file.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2018 Hach Lab for Computational Cancer Genomics
Copyright (c) 2021 Hach Lab for Computational Cancer Genomics

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
39 changes: 0 additions & 39 deletions cluster.json

This file was deleted.

23 changes: 13 additions & 10 deletions config.yaml
@@ -1,24 +1,27 @@
outpath:
test/dev-out/
test/benchmark/

gurobi:
license: gurobi.lic
timeout: 15

samples:
N_sim:
reads:
- /groups/hachgrp/projects/dev-ltr-simulator/analysis/ltr-sim-dev/reads/P000S000.L-non.fastq
seq_type: ont1d
data_type: sim
gtf: whole_genome

exec:
align : py/freddie_align.py
split : py/freddie_split.py
segment : py/freddie_segment.py
cluster : py/freddie_cluster.py
isoforms : py/freddie_isoforms.py

samples:
seq_type : ont1d
ref : homo_sapiens
reads :
- extern/LTR-sim/output/reads/22Rv1.L-non.fastq
reads_info :
- extern/LTR-sim/output/reads/22Rv1.L-non.tsv
references:
dna_desalt : /groups/hachgrp/annotations/DNA/97/deSALT.index
homo_sapiens:
annot : extern/LTR-sim/refs/homo_sapiens/homo_sapiens.annot.gtf
genome : extern/LTR-sim/refs/homo_sapiens/homo_sapiens.dna.fa
genome_fai : extern/LTR-sim/refs/homo_sapiens/homo_sapiens.dna.fa.fai
desalt_idx : test/mapping/homo_sapiens.dna.desalt_idx
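Review note: in the new config.yaml, the sample names its reference through `ref : homo_sapiens`, and the per-reference paths sit under `references`. A minimal sketch of how such a lookup could work is given below. The nesting is an assumption (indentation was lost in this diff view), and `reference_for` is a hypothetical helper, not code from this PR.

```python
# Assumed shape of the new config.yaml after this PR; the nesting is inferred
# because the diff view above dropped the original indentation.
config = {
    "samples": {
        "seq_type": "ont1d",
        "ref": "homo_sapiens",
        "reads": ["extern/LTR-sim/output/reads/22Rv1.L-non.fastq"],
        "reads_info": ["extern/LTR-sim/output/reads/22Rv1.L-non.tsv"],
    },
    "references": {
        "homo_sapiens": {
            "annot": "extern/LTR-sim/refs/homo_sapiens/homo_sapiens.annot.gtf",
            "genome": "extern/LTR-sim/refs/homo_sapiens/homo_sapiens.dna.fa",
            "genome_fai": "extern/LTR-sim/refs/homo_sapiens/homo_sapiens.dna.fa.fai",
            "desalt_idx": "test/mapping/homo_sapiens.dna.desalt_idx",
        },
    },
}


def reference_for(config, key):
    """Hypothetical helper: resolve a reference path via the sample's `ref` name."""
    ref_name = config["samples"]["ref"]
    return config["references"][ref_name][key]


if __name__ == "__main__":
    print(reference_for(config, "genome"))
```

Keeping the reference name in one place would let a second species be added under `references` without touching the sample entry, if that is indeed how the Snakefile consumes these keys.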
18 changes: 8 additions & 10 deletions environment.yml → envs/freddie.yml
@@ -1,19 +1,17 @@
name: freddie
name: freddie_bench_freddie
channels:
- conda-forge
- bioconda
- defaults
- anaconda
- gurobi
dependencies:
- python>=3.6
- snakemake>=5
- desalt=1.5.4
- pysam>=0.15
- desalt==1.5.4
- gurobi>=9.0
- minimap2>=2.16
- networkx>=2
- numpy>=1.16
- pysam>=0.15
- python>=3.6
- scikit-learn>=0.20
- scipy>=1.2.1
- networkx>=2
- gurobi>=9.0
- matplotlib>=3
- pypdf2>=1.26
- scipy>=1.2.1
32 changes: 16 additions & 16 deletions py/freddie_cluster.py
@@ -244,8 +244,8 @@ def partition_reads(tint):


def preprocess_ilp(tint, ilp_settings):
print('Preproessing ILP with {} read reps and the following settings:\n{}'.format(
len(tint['read_reps']), ilp_settings))
# print('Preproessing ILP with {} read reps and the following settings:\n{}'.format(
# len(tint['read_reps']), ilp_settings))
read_reps = tint['read_reps']
N = len(read_reps)
M = len(tint['segs'])
@@ -588,7 +588,7 @@ def run_ilp(tint, remaining_rids, incomp_rids, ilp_settings, log_prefix):

isoforms = {k: dict()
for k in range(ISOFORM_INDEX_START, ilp_settings['K'])}
print('STATUS: {}'.format(ILP_ISOFORMS_STATUS))
# print('STATUS: {}'.format(ILP_ISOFORMS_STATUS))
# if ILP_ISOFORMS_STATUS == GRB.Status.TIME_LIMIT:
# status = 'TIME_LIMIT'
if ILP_ISOFORMS_STATUS != GRB.Status.OPTIMAL:
@@ -709,27 +709,27 @@ def cluster_tint(cluster_args):
assert len(tints) == 1
tint = list(tints.values())[0]

print('# Clustering tint {}'.format(tint['id']))
# print('# Clustering tint {}'.format(tint['id']))
if logs_dir != None:
os.makedirs('{}/{}'.format(logs_dir, tint['id']), exist_ok=True)
timeout_log = open(
'{}/{}/timeout.log'.format(logs_dir, tint['id']), 'w+')
preprocess_ilp(tint, ilp_settings)
partition_reads(tint)
print('# Paritions ({}) sizes: {}\n'.format(
len(tint['partitions']), [len(p) for p in tint['partitions']]))
# print('# Paritions ({}) sizes: {}\n'.format(
# len(tint['partitions']), [len(p) for p in tint['partitions']]))
tint['isoforms'] = list()
tint['garbage_rids'] = list()
for partition, (remaining_rids, incomp_rids) in enumerate(tint['partitions']):
for rid in remaining_rids:
for ridx in tint['read_reps'][rid]:
tint['reads'][ridx]['partition'] = partition
print(
'==========\ntint {}: Running {}-th partition...'.format(tint['id'], partition))
# print(
# '==========\ntint {}: Running {}-th partition...'.format(tint['id'], partition))
for round_num in range(ilp_settings['max_rounds']):
actual_remaining_rids_len = sum(len(tint['read_reps'][i]) for i in remaining_rids)
print('==========\ntint {}: Running {}-th round with {} read reps and {} actual reads...'.format(
tint['id'], round_num, len(remaining_rids), actual_remaining_rids_len))
# print('==========\ntint {}: Running {}-th round with {} read reps and {} actual reads...'.format(
# tint['id'], round_num, len(remaining_rids), actual_remaining_rids_len))
if actual_remaining_rids_len < min_isoform_size:
break
ILP_ISOFORMS_STATUS, status, round_isoforms = run_ilp(
@@ -748,12 +748,12 @@ def cluster_tint(cluster_args):
number_of_clustered_reads = 0
for i in round_isoforms.values():
number_of_clustered_reads += sum([len(tint['read_reps'][rid]) for i in round_isoforms.values() for rid in i['rid_to_corrections'].keys()])
print('Number of clustered reads:', number_of_clustered_reads)
# print('Number of clustered reads:', number_of_clustered_reads)
if number_of_clustered_reads < min_isoform_size:
break
for k, isoform in round_isoforms.items():
print('Isoform {} size: {}'.format(
k, len(isoform['rid_to_corrections'])))
# print('Isoform {} size: {}'.format(
# k, len(isoform['rid_to_corrections'])))
if sum(len(tint['read_reps'][rid]) for rid in isoform['rid_to_corrections'].keys()) < min_isoform_size:
continue
tint['isoforms'].append(isoform)
@@ -763,9 +763,9 @@ def cluster_tint(cluster_args):
for ridx in tint['read_reps'][rid]:
tint['reads'][ridx]['corrections'] = corrections
tint['reads'][ridx]['isoform'] = len(tint['isoforms'])-1
print('------->')
print('Remaining reads: {}\n'.format(len(remaining_rids)))
print('<-------')
# print('------->')
# print('Remaining reads: {}\n'.format(len(remaining_rids)))
# print('<-------')
tint['garbage_rids'].extend(sorted(remaining_rids))
if logs_dir != None:
timeout_log.close()
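
Review note: the hunks above silence progress output by commenting out `print` calls. One possible alternative, sketched here under assumptions, is Python's standard `logging` module, which keeps the messages in the code and lets the caller decide whether they are shown. The logger name and the `report_partitions` helper are hypothetical and not part of this PR.

```python
import logging

# Hypothetical module-level logger; the name is an assumption, not from the PR.
logger = logging.getLogger("freddie_cluster")


def report_partitions(tint):
    """Log partition sizes at DEBUG level instead of printing them."""
    sizes = [len(p) for p in tint["partitions"]]
    logger.debug(
        "tint %s: %d partitions with sizes %s", tint["id"], len(sizes), sizes
    )
    return sizes


if __name__ == "__main__":
    # DEBUG shows the message; the default WARNING level would suppress it.
    logging.basicConfig(level=logging.DEBUG)
    report_partitions({"id": 7, "partitions": [[1, 2, 3], [4]]})
```

With this approach the benchmark runs could stay quiet by default, while a debugging session could re-enable the output by changing only the logging level.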