Error caused by retest_cnv() function #26

Closed
WalterMuskovic opened this issue Mar 28, 2022 · 6 comments

@WalterMuskovic

Hi,

Thank you for developing this great R package. I have been encountering this error message:

ERROR [2022-03-28 09:22:03] Error in mutate(., cnv_state_post = c("loh", "amp", "del", "bamp", "bdel")[which.max(c(p_loh,  : 
  Problem while computing `cnv_state_post = ...[]`.
✖ `cnv_state_post` must be size 1, not 0.

which seems to be caused by this code within the retest_cnv() function:

                                p_loh = (G['20'] * L_x_n * L_y_d)/Z,
                                p_amp = ((G['31'] + G['21']) * L_x_a * L_y_a)/Z,
                                p_del = (G['10'] * L_x_d * L_y_d)/Z,
                                p_bamp = (G['22'] * L_x_a * L_y_n)/Z,
                                p_bdel = (G['00'] * L_x_d * L_y_n)/Z

When Z happens to be zero, p_loh, p_amp, p_del, p_bamp and p_bdel all become NaN, which subsequently causes an error in this line:

cnv_state_post = c('loh', 'amp', 'del', 'bamp', 'bdel')[which.max(c(p_loh, p_amp, p_del, p_bamp, p_bdel))]

because which.max() returns integer(0) when it receives only NaNs:

c("a","b","c")[which.max(c(NaN,NaN,NaN))]
character(0)
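
Putting the chain together in one place (a minimal standalone illustration; the zeros are made up, not real likelihood values):

# All likelihood terms underflow to zero, so Z == 0
L_terms <- c(0, 0, 0, 0, 0)
Z <- sum(L_terms)
p <- L_terms / Z                # 0/0 gives NaN for every posterior
which.max(p)                    # integer(0)
c('loh', 'amp', 'del', 'bamp', 'bdel')[which.max(p)]
# character(0), which is the size-0 result that mutate() rejects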

Thank you for your help!

@teng-gao
Collaborator

Thanks for trying out Numbat! I think this is likely something we fixed in the devel branch - could you try it out and let us know if the same error appears?

https://github.com/kharchenkolab/numbat/tree/devel
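
For reference, one way to install that branch directly (assuming the remotes package is available):

# install the devel branch of numbat from GitHub
remotes::install_github("kharchenkolab/numbat", ref = "devel")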

@WalterMuskovic
Author

Hi @teng-gao,

Thank you for the speedy reply. Yep, I was running the latest version of the development branch when I got the error. I put together the example below, which reproduces the error:

# Load a subset of the data. I've removed the "snp_id","POS", "REF", "ALT", "GT"
# & "snp_index" columns and restricted to just chr7. This is sufficient to
# reproduce the original error.
bulk_subset <- read.csv("~/Downloads/bulk_subset.csv")

# Load required functions (the pipeline below uses dplyr verbs)
library(dplyr)
devtools::load_all("~/Documents/numbat/")

# Define some required params
G = c('20' = 1/5, '10' = 1/5, '21' = 1/10, '31' = 1/10, '22' = 1/5, '00' = 1/5)
delta_phi_min = 0.15
exp_model = "lnpois"
gamma = 20
theta_min = 0.065

# Produce error
segs_post = bulk_subset %>% 
    filter(cnv_state != 'neu') %>%
    group_by(CHROM, seg, seg_start, seg_end, cnv_state) %>%
    summarise(
        n_genes = length(na.omit(unique(gene))),
        n_snps = sum(!is.na(pAD)),
        theta_hat = theta_hat_seg(major_count[!is.na(major_count)], minor_count[!is.na(minor_count)]),
        approx_theta_post(pAD[!is.na(pAD)], DP[!is.na(pAD)], p_s[!is.na(pAD)], gamma = unique(gamma), start = theta_hat),
        L_y_n = pnorm.range(0, theta_min, theta_mle, theta_sigma),
        L_y_d = pnorm.range(theta_min, 0.499, theta_mle, theta_sigma),
        L_y_a = pnorm.range(theta_min, 0.375, theta_mle, theta_sigma),
        approx_phi_post(
            Y_obs[!is.na(Y_obs)], lambda_ref[!is.na(Y_obs)], unique(na.omit(d_obs)),
            alpha = alpha[!is.na(Y_obs)],
            beta = beta[!is.na(Y_obs)],
            mu = mu[!is.na(Y_obs)],
            sig = sig[!is.na(Y_obs)],
            model = exp_model
        ),
        L_x_n = pnorm.range(1 - delta_phi_min, 1 + delta_phi_min, phi_mle, phi_sigma),
        L_x_d = pnorm.range(0.1, 1 - delta_phi_min, phi_mle, phi_sigma),
        L_x_a = pnorm.range(1 + delta_phi_min, 3, phi_mle, phi_sigma),
        Z = sum(G['20'] * L_x_n * L_y_d,
                G['10'] * L_x_d * L_y_d,
                G['21'] * L_x_a * L_y_a,
                G['31'] * L_x_a * L_y_a,
                G['22'] * L_x_a * L_y_n, 
                G['00'] * L_x_d * L_y_n),
        p_loh = (G['20'] * L_x_n * L_y_d)/Z,
        p_amp = ((G['31'] + G['21']) * L_x_a * L_y_a)/Z,
        p_del = (G['10'] * L_x_d * L_y_d)/Z,
        p_bamp = (G['22'] * L_x_a * L_y_n)/Z,
        p_bdel = (G['00'] * L_x_d * L_y_n)/Z,
        LLR_x = calc_exp_LLR(
            Y_obs[!is.na(Y_obs)],
            lambda_ref[!is.na(Y_obs)], 
            unique(na.omit(d_obs)),
            phi_mle,
            alpha = alpha[!is.na(Y_obs)],
            beta = beta[!is.na(Y_obs)],
            mu = mu[!is.na(Y_obs)],
            sig = sig[!is.na(Y_obs)],
            model = exp_model
        ),
        LLR_y = calc_allele_LLR(pAD[!is.na(pAD)], DP[!is.na(pAD)], p_s[!is.na(pAD)], theta_mle, gamma = unique(gamma)),
        LLR = LLR_x + LLR_y,
        .groups = 'drop'
    ) %>%
    rowwise() %>%
    mutate(cnv_state_post = c('loh', 'amp', 'del', 'bamp', 'bdel')[
        which.max(c(p_loh, p_amp, p_del, p_bamp, p_bdel))
    ]) %>%
    ungroup()

Attachment: bulk_subset.csv
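
For what it's worth, guarding the final step like this (just a sketch of a possible workaround, not the package's actual fix) avoids the size-0 error by leaving the call as NA when Z is zero:

    rowwise() %>%
    mutate(cnv_state_post = if (is.nan(Z) || Z == 0) {
            NA_character_  # no state can be resolved when all posteriors are NaN
        } else {
            c('loh', 'amp', 'del', 'bamp', 'bdel')[which.max(c(p_loh, p_amp, p_del, p_bamp, p_bdel))]
        }) %>%
    ungroup()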

@teng-gao
Collaborator

teng-gao commented Mar 28, 2022

Hi Walter,

Thanks for the detailed bug report. You're exactly right about what the issue is: the aberrant-state probabilities are so small that the total probability across the different CNV states comes out to 0. This shouldn't happen normally. I noticed that the segment is called as CNLOH (cnv_state column), but the alleles are pretty balanced, which means something weird is happening upstream. Would you mind sharing your input parameters to analyze_bulk and the whole pseudobulk file? You can email me if you'd like: tgaoteng@gmail.com
[Screenshot: the segment called as CNLOH shows a largely balanced allele frequency profile]
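
A rough illustration of the underflow (made-up numbers, and assuming pnorm.range() is essentially a difference of two pnorm() calls): when theta_mle sits far outside a state's allowed range with a small sigma, that state's likelihood term is numerically zero, and if this happens to every term in Z, the posteriors all become 0/0 = NaN:

theta_mle <- 0.01     # essentially balanced alleles
theta_sigma <- 0.001
# mass assigned to the imbalanced-allele range [theta_min, 0.499]
pnorm(0.499, theta_mle, theta_sigma) - pnorm(0.065, theta_mle, theta_sigma)
#> [1] 0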

@WalterMuskovic
Author

Hi @teng-gao, just sent you an email with the requested data and the input params to analyze_bulk. Thanks again for your help!

teng-gao pushed a commit that referenced this issue Mar 31, 2022
@teng-gao
Collaborator

Hi @WalterMuskovic I just pushed a commit to devel that should fix this. Please let me know if this works!

@WalterMuskovic
Author

Great, thanks @teng-gao, the latest changes in the devel branch have fixed the error.
