How to visualize pathways enriched in different cell subpopulations? #9

Open
xing9133 opened this issue Sep 9, 2023 · 1 comment


xing9133 commented Sep 9, 2023

I can run all the processes smoothly now, but I still have some questions.
First, I obtained all the results for "Pathway_list" and "scPathways_rankPvalue". However, I could not find a function in the scPagwas package to visualize these enriched pathways (output similar to Figure 6D in the manuscript). Is there a built-in function that can visualize these pathways?
Second, the final result contains the gene heritability correlation with the gPAS score, but I would also like to look further at the significant SNP sites associated with the trait, which do not seem to be provided in the results. How should I extract results that contain significant SNP sites?
In addition, when visualizing the cell subclasses associated with the trait, should the results of "Merged_celltype_pvalue" or "Random_Correct_BG_adjp" be used? I found that the p-values of some cell subsets were inconsistent between the two methods.

@dengchunyu
Collaborator

The first issue: scPagwas provides a plotting function (./scPagwas/inst/extdata/plot_scpathway_contri_dot.R), but it depends on so many packages that loading it with the package would make scPagwas unstable and difficult to install. Instead, you can source the script and plot as follows.

# Packages required by the plotting script
library(scPagwas)
library(Seurat)      # provides Idents()
library(tidyverse)
library(rhdf5)
library(ggplot2)
library(grDevices)
library(stats)
library(FactoMineR)
library(scales)
library(reshape2)
library(ggdendro)
library(grImport2)
library(gridExtra)
library(grid)
library(sisal)

# Load plot_scpathway_dot() from the script shipped with scPagwas
source(system.file("extdata", "plot_scpathway_contri_dot.R", package = "scPagwas"))

plot_scpathway_dot(
  Pagwas = Pagwas_data,                          # Pagwas_data is the result of scPagwas_main in the readme
  celltypes = unique(Idents(Pagwas_data))[1:5],  # here we select 5 cell types to plot; choose the ones you want to show
  topn_path_celltype = 20,                       # number of top specific pathways for each cell type
  filter_p = 0.05,
  max_logp = 10,                                 # threshold for max logp
  display_max_sizes = FALSE,
  size_var = "CellqValue",
  col_var = "proportion",
  shape.scale = 8,
  cols.use = c("lightgrey", "#E45826"),
  dend_x_var = "CellqValue",
  dist_method = "euclidean",
  hclust_method = "ward.D",
  do_plot = TRUE,                                # whether to plot
  figurenames = NULL,                            # save plot
  width = 7,
  height = 7
)

[image: example plot output]

The second question: our approach relies on polygenic SNP results and does not support obtaining results for individual SNPs. To investigate which SNPs play an important role, you might consider two approaches. First, you can identify the SNPs associated with genes that have a high PCC (Pearson correlation coefficient). Second, you can pick out the specific high-contribution pathways from the graph above and collect the relevant SNPs from all SNPs mapped to the genes in those pathways. The correspondence between SNPs and genes can be computed using the following code:

library(scPagwas)

# gwas_data_file: path to your GWAS summary statistics file
gwas_data <- bigreadr::fread2(gwas_data_file)
snp_gene_df <- SnpToGene(
  gwas_data = gwas_data,               # a data frame
  block_annotation = block_annotation,
  marg = marg                          # e.g. 10000
)
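
As a rough sketch of how either approach could continue from here, the SNP-to-gene table can be filtered down to a chosen gene set. The toy data frame below stands in for the `SnpToGene` result, and the column names `rsid` and `label`, the gene names, and `genes_of_interest` are all assumptions made for illustration; check `colnames(snp_gene_df)` on your own result before adapting it.

```r
# Toy stand-in for the SnpToGene() result; column names are assumed, not confirmed.
snp_gene_df <- data.frame(
  rsid  = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  label = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)

# Hypothetical gene set: e.g. high-PCC genes, or the genes of a top pathway.
genes_of_interest <- c("GENE_A", "GENE_C")

# Keep only the SNPs mapped to the selected genes.
snps_of_interest <- snp_gene_df[snp_gene_df$label %in% genes_of_interest, ]

# Count how many SNPs map to each selected gene.
snps_per_gene <- table(snps_of_interest$label)

print(snps_of_interest)
print(snps_per_gene)
```

The resulting SNP IDs could then be looked up in the original GWAS table to inspect their summary statistics.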

The third question:
"Random_Correct_BG_adjp" is the p-value for individual cells, while "Merged_celltype_pvalue" is the p-value for cell types, obtained by merging the "Random_Correct_BG_adjp" p-values across all single cells in each subpopulation. In single-cell analysis, these two results are the primary focus.

However, for cell line or tissue data, or when computing "Random_Correct_BG_adjp" is infeasible because of a large number of single cells (especially when the "iters_singlecell" parameter is set, which can be very time- and memory-consuming), it is recommended to use the cell type p-values from "bootstrap_results".

Note that the "bootstrap_results" values are derived from pseudo-bulk data generated by combining the single-cell expression data for each cell type, so they may differ somewhat from the single-cell results.
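
To illustrate why per-cell and merged cell type p-values need not agree, here is a minimal sketch that combines toy per-cell p-values within each cell type. It uses Fisher's method purely as a stand-in; this is an assumption for illustration and not necessarily the merging procedure scPagwas applies, and the values are made up.

```r
# Toy per-cell p-values with cell type labels (made-up values).
cell_p <- data.frame(
  celltype = c("T", "T", "T", "B", "B"),
  p        = c(0.01, 0.20, 0.03, 0.50, 0.60),
  stringsAsFactors = FALSE
)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null (assumes independent p-values).
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# One merged p-value per cell type.
merged_p <- tapply(cell_p$p, cell_p$celltype, fisher_combine)
print(merged_p)
```

In this toy case one of the three "T" cells is not individually significant, yet the merged value for the "T" group is, which is the kind of difference between the two result types that can show up in practice.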
