cnv_state Column Description is Vague #152

Closed
DarioS opened this issue Dec 27, 2023 · 4 comments

DarioS commented Dec 27, 2023

The definition of the cnv_state column should be improved. I have whole-genome sequencing for each sample and have used PURPLE (Purity Ploidy Estimator) to infer purity-adjusted copy number. If the overall genome ploidy is three and a chromosome arm has a copy number of two, what should that be coded as? What if the chromosome arm has a copy number of five? In the journal article, I also noticed that you use HMFtools for your whole-genome sequencing analysis. Is there a conversion script that turns PURPLE output into a file suitable for Numbat input?

teng-gao (Collaborator) commented Jan 21, 2024

Thanks for the issue.

> If the overall genome ploidy is three and a chromosome arm has a copy number of two, what should that be coded as?

2N regions should be designated neutral (NEU) and used as the baseline, and 3N/5N regions should be designated heterozygous gain (AMP), even if the average ploidy is close to 3.

In the paper's benchmark, we used HMFtools only to produce CNV calls from WGS, which we compared with Numbat calls from scRNA-seq. We didn't use it as input to the Numbat analysis.

See more at #144.
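
To make the baseline rule above concrete, here is a minimal R sketch. It assumes rounded total copy numbers per region and a fixed 2N baseline. `codeRelativeToBaseline` is a hypothetical helper, not part of Numbat; coding regions below the baseline as "del" is an assumption, and allele-specific states such as LOH are ignored.

```r
# Hypothetical helper (not part of Numbat): code regions relative to a 2N baseline.
# Assumption: copyNumber holds rounded total copy numbers, one per region.
codeRelativeToBaseline <- function(copyNumber, baseline = 2)
{
  cnv_state <- rep("neu", length(copyNumber))   # Baseline regions stay neutral.
  cnv_state[copyNumber > baseline] <- "amp"     # 3N and 5N are both gains.
  cnv_state[copyNumber < baseline] <- "del"     # Assumption: losses coded as "del".
  cnv_state
}

codeRelativeToBaseline(c(2, 3, 5))
# [1] "neu" "amp" "amp"
```

The average genome ploidy never enters the calculation, which matches the point that 2N remains the baseline even when the average ploidy is close to 3.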

DarioS (Author) commented Jan 21, 2024

Ah, I understand. That will be easy for me to code. Finally, for seg, what do a and b in 1a, 1b, 2a, 2b signify? Must I follow that convention?

teng-gao (Collaborator) commented

seg can use any naming convention you want, as long as each value is a unique identifier of a genomic segment. 1a, 1b, etc. are just the convention I use: the chromosome name followed by an alphabetically enumerated postfix.
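
For anyone who wants IDs in that chromosome-plus-letter style, a minimal R sketch follows. `makeSegIDs` is a hypothetical helper, not part of Numbat, and it assumes that no chromosome has more than 26 segments.

```r
# Hypothetical helper (not part of Numbat): build IDs like 1a, 1b, 2a
# from a vector of chromosome names, one entry per segment.
# Assumption: no chromosome has more than 26 segments.
makeSegIDs <- function(chrom)
{
  chrom <- as.character(chrom)
  # Position of each segment within its own chromosome: 1, 2, 3, ...
  indexInChrom <- ave(seq_along(chrom), chrom, FUN = seq_along)
  stopifnot(all(indexInChrom <= length(letters)))
  ids <- paste0(chrom, letters[indexInChrom])
  stopifnot(!anyDuplicated(ids))   # IDs must be unique.
  ids
}

makeSegIDs(c("1", "1", "2", "2", "2"))
# [1] "1a" "1b" "2a" "2b" "2c"
```

Any other scheme that yields unique values would serve equally well.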

DarioS (Author) commented Jan 21, 2024

Ah, great, so the segment ID syntax is not used by the software. For the future benefit of other PURPLE users:

makeCNVinput <- function(directory)
{
  # All sampleID.purple.segment.tsv files, with full paths so they can be read from any working directory.
  segmentFiles <- list.files(directory, "purple\\.segment\\.tsv$", full.names = TRUE)
  invisible(lapply(segmentFiles, function(segmentFile)
  {
    segmentTable <- read.delim(segmentFile)
    # Round the purity-adjusted copy numbers to integers.
    segmentTable$minorAlleleCopyNumber <- round(segmentTable$minorAlleleCopyNumber)
    segmentTable$majorAlleleCopyNumber <- round(segmentTable$majorAlleleCopyNumber)
    segmentTable$tumorCopyNumber <- round(segmentTable$tumorCopyNumber)

    # One state per segment, neutral by default. Later rules override earlier ones.
    cnv_state <- rep("neu", nrow(segmentTable))
    isBalancedDel <- segmentTable$tumorCopyNumber == 0
    cnv_state[isBalancedDel] <- "bdel"
    isDel <- segmentTable$tumorCopyNumber == 1
    cnv_state[isDel] <- "del"
    # Even copy number with the minor allele lost.
    isLOH <- segmentTable$tumorCopyNumber %in% seq(2, 100, 2) & segmentTable$minorAlleleCopyNumber == 0
    cnv_state[isLOH] <- "loh"
    # Odd copy number of three or more.
    isAmp <- segmentTable$tumorCopyNumber %in% seq(3, 99, 2)
    cnv_state[isAmp] <- "amp"
    # Even copy number of four or more with equal allele counts.
    isBalancedAmp <- segmentTable$tumorCopyNumber %in% seq(4, 100, 2) & segmentTable$minorAlleleCopyNumber == segmentTable$majorAlleleCopyNumber
    cnv_state[isBalancedAmp] <- "bamp"
    # Note: unbalanced even copy numbers (e.g. 3 + 1) remain "neu" under these rules.

    requiredTable <- data.frame(CHROM = segmentTable[, "chromosome"],
                                seg = paste0("seg", seq_len(nrow(segmentTable))),
                                seg_start = segmentTable[, "start"],
                                seg_end = segmentTable[, "end"],
                                cnv_state = cnv_state)
    # Rename only the file name, not any "purple" in the directory path.
    outName <- sub("purple\\.segment\\.tsv$", "forNumbat.segment.tsv", basename(segmentFile))
    outFile <- file.path(dirname(segmentFile), outName)
    write.table(requiredTable, file = outFile, sep = '\t', quote = FALSE, row.names = FALSE)
  }))
}
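
Before handing a converted table to Numbat, a quick sanity check may be useful. The sketch below is a hypothetical helper, not part of Numbat or PURPLE. It assumes the table should have the columns CHROM, seg, seg_start, seg_end and cnv_state (with seg named as discussed earlier in this thread), and it takes the allowed states to be simply those assigned by the script above.

```r
# Hypothetical check (not part of Numbat): validate a converted segment table.
# Assumptions: the required column names below, and that the allowed states
# are exactly the ones assigned by the conversion script.
checkSegmentTable <- function(segTable)
{
  requiredColumns <- c("CHROM", "seg", "seg_start", "seg_end", "cnv_state")
  allowedStates <- c("neu", "del", "bdel", "loh", "amp", "bamp")
  missingColumns <- setdiff(requiredColumns, colnames(segTable))
  if(length(missingColumns) > 0)
    stop("Missing columns: ", paste(missingColumns, collapse = ", "))
  stopifnot(!anyDuplicated(segTable$seg),                   # Segment IDs are unique.
            all(segTable$cnv_state %in% allowedStates),     # Only known states.
            all(segTable$seg_start <= segTable$seg_end))    # Coordinates are ordered.
  invisible(TRUE)
}

exampleTable <- data.frame(CHROM = c("1", "1"),
                           seg = c("seg1", "seg2"),
                           seg_start = c(1, 5001),
                           seg_end = c(5000, 9000),
                           cnv_state = c("neu", "amp"))
checkSegmentTable(exampleTable)   # Passes silently.
```

The function stops with an error when a column is missing, a segment ID is repeated, or a state falls outside the assumed set.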

DarioS closed this as completed Jan 21, 2024