I'm struggling a bit to understand which final output file I should use for downstream analyses following a Numbat run. I ran Numbat with three iterations and I now want to extract all the cell barcodes associated with a particular genotype. If I run this:
cut -f3 ../clone_post_3.tsv | sort -u
I can see there are six genotypes:
""
13a,4c_bamp,18a
13a,4c_bamp,18a,15b,2a
13a,4c_bamp,18a,15b,2a,10a,4a,5a,3a
13a,4c_bamp,18a,15b,2a,10a,4a,5a,3a,6a,7a
13a,4c_bamp,18a,15b,2a,1c,4c_loh
I'm particularly interested in the genotype containing 4c_loh (the last one listed), so I extract all the barcodes associated with it:
cut -f1,3 ../clone_post_3.tsv | grep -P '13a,4c_bamp,18a,15b,2a,1c,4c_loh$' | wc -l
I get 1555 barcodes, which is consistent with the numbers on the clone_post_final.png figure.
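As a cross-check on the grep count, the same extraction can be done with an exact string comparison on the genotype column instead of a regular expression. This is only a sketch: `barcodes_for_genotype` is a hypothetical helper name, and it assumes the file is tab-separated with the barcode in column 1 and the genotype in column 3, as the `cut` commands above imply.

```shell
# Hypothetical helper: print the barcodes (column 1) of every row whose
# genotype (column 3) is exactly equal to the requested string.
# Assumes a tab-separated file laid out like clone_post_3.tsv above.
barcodes_for_genotype() {
    tsv="$1"
    genotype="$2"
    # Exact comparison, so a genotype that merely ends with the same
    # events is not counted, and no regex escaping is needed.
    awk -F'\t' -v gt="$genotype" '$3 == gt { print $1 }' "$tsv"
}
```

For example, `barcodes_for_genotype ../clone_post_3.tsv '13a,4c_bamp,18a,15b,2a,1c,4c_loh' | wc -l` should report the same count as the grep command if the pattern matched only that genotype.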
If I try to do the same thing from within R:
nb = Numbat$new(glue('/g/data/pq84/single_cell/10X/2046t2/numbat/trial_1/'))
nb$clone_post %>% distinct(GT_opt)
GT_opt
1:
2: 18a,13a,4d_bamp,2a
3: 18a,13a,4d_bamp
4: 18a,13a,4d_bamp,2a,10a,3a,5a
5: 18a,13a,4d_bamp,1c,4d_loh
6: 18a,13a,4d_bamp,2a,10a,3a,5a,6a
7: 18a
nb$clone_post %>% filter(GT_opt == "18a,13a,4d_bamp,1c,4d_loh") %>% select(cell, GT_opt)
I get 1266 barcodes as opposed to the 1555 from the clone_post_3.tsv file. Also, the 4c_bamp from clone_post_3.tsv is called 4d_bamp here and there are seven genotypes, not six.
So my question is, which one is correct? Should I be extracting barcodes using the Numbat object in R or should I be extracting them from the clone_post_3.tsv file? Why are there differences between these two?
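To see where the two sets of barcodes actually differ, the two lists can be compared directly. The sketch below assumes each list has been saved as a plain text file with one barcode per line (for the R side, for instance, by writing the `cell` column out with `writeLines`); `compare_barcodes` and the file names are hypothetical, and this only quantifies the overlap, it does not say which source is the right one.

```shell
# Hypothetical helper: report how many barcodes two lists share and how
# many are unique to each. Each input file holds one barcode per line.
compare_barcodes() {
    first_list="$1"
    second_list="$2"
    workdir=$(mktemp -d)
    # comm requires sorted input; sort -u also drops duplicate barcodes.
    sort -u "$first_list" > "$workdir/a"
    sort -u "$second_list" > "$workdir/b"
    echo "shared: $(comm -12 "$workdir/a" "$workdir/b" | wc -l | tr -d ' ')"
    echo "only_first: $(comm -23 "$workdir/a" "$workdir/b" | wc -l | tr -d ' ')"
    echo "only_second: $(comm -13 "$workdir/a" "$workdir/b" | wc -l | tr -d ' ')"
    rm -rf "$workdir"
}
```

Running something like `compare_barcodes tsv_barcodes.txt r_barcodes.txt` would show whether the 1266 barcodes from R are a subset of the 1555 from the TSV file or a partly different set of cells.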