How to visualize pathways enriched in different cell subpopulations? #9

xing9133 · 2023-09-09T14:56:50Z

I can run the whole pipeline smoothly now, but I still have some questions.
First, I obtained all the results in "Pathway_list" and "scPathways_rankPvalue". However, I could not find a function in the scPagwas package to visualize these enriched pathways (with output similar to Figure 6D in the manuscript). Is there a built-in function that can visualize these pathways?
Second, the final results contain the gene heritability correlation with the gPAS score, but I would also like to look at the significant SNPs associated with the trait, which do not seem to be provided in the results. How should I extract results that contain the significant SNPs?
In addition, when visualizing the cell subpopulations associated with the trait, should the results of "Merged_celltype_pvalue" or "Random_Correct_BG_adjp" be used? I found that the p-values of some cell subsets were inconsistent between the two.

dengchunyu · 2023-09-11T09:24:31Z

First question: scPagwas provides a plotting function (./scPagwas/inst/extdata/plot_scpathway_contri_dot.R). Because it depends on many packages, which can make the package unstable and difficult to install, it is kept as a separate script. You can plot as follows:

library(scPagwas)
library(Seurat)  # provides Idents()
library(tidyverse)
library(rhdf5)
library(ggplot2)
library(grDevices)
library(stats)
library(FactoMineR)
library(scales)
library(reshape2)
library(ggdendro)
library(grImport2)
library(gridExtra)
library(grid)
library(sisal)

# Load the plotting function shipped with scPagwas.
source(system.file("extdata", "plot_scpathway_contri_dot.R", package = "scPagwas"))

plot_scpathway_dot(
  Pagwas = Pagwas_data,  # Pagwas_data is the result of scPagwas_main (see the README)
  celltypes = unique(Idents(Pagwas_data))[1:5],  # here 5 cell types are plotted; choose the ones you want to show
  topn_path_celltype = 20,  # number of top specific pathways for each cell type
  filter_p = 0.05,
  max_logp = 10,  # threshold for max logp
  display_max_sizes = FALSE,
  size_var = "CellqValue",
  col_var = "proportion",
  shape.scale = 8,
  cols.use = c("lightgrey", "#E45826"),
  dend_x_var = "CellqValue",
  dist_method = "euclidean",
  hclust_method = "ward.D",
  do_plot = TRUE,  # whether to plot
  figurenames = NULL,  # save plot
  width = 7,
  height = 7
)

Second question: our approach relies on polygenic SNP results and does not support obtaining results for individual SNPs. To investigate which SNPs play an important role, you might consider two approaches. First, you can identify the SNPs associated with genes that have a high PCC (Pearson correlation coefficient). Second, you can take the specific high-contribution pathways from the plot above and collect all SNPs that map to the genes in these pathways. The correspondence between SNPs and genes can be computed with the following code:

gwas_data <- bigreadr::fread2(gwas_data_file)  # gwas_data_file: path to your GWAS file
snp_gene_df <- SnpToGene(
  gwas_data = gwas_data,  # a data frame
  block_annotation = block_annotation,
  marg = 10000
)
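Once the SNP-to-gene table exists, both approaches above come down to filtering it by a gene set and joining it back to the GWAS table. The following is only a minimal sketch on toy data: the column names `rsid`, `label` (gene symbol) and `pvalue`, the gene names, and the 5e-8 threshold are assumptions for illustration, so check `colnames(snp_gene_df)` and `colnames(gwas_data)` on your own objects before adapting it.

```r
# Toy stand-ins for gwas_data and snp_gene_df.
# All column names and values here are assumptions for illustration.
gwas_toy <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  pvalue = c(1e-9, 0.2, 3e-8, 0.04),
  stringsAsFactors = FALSE
)
snp_gene_toy <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  label = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical gene set: high-PCC genes, or the genes of a high-contribution pathway.
genes_of_interest <- c("GENE_A", "GENE_C")

# Keep only the SNPs mapped to the genes of interest.
snps_in_genes <- snp_gene_toy[snp_gene_toy$label %in% genes_of_interest, ]

# Attach the GWAS statistics and keep the SNPs below the chosen threshold.
merged <- merge(snps_in_genes, gwas_toy, by = "rsid")
sig_snps <- merged[merged$pvalue < 5e-8, ]
print(sig_snps)
```

With real data, `genes_of_interest` would be replaced by your own gene list, and the threshold by whatever significance cutoff suits the GWAS.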

Third question:
"Random_Correct_BG_adjp" is the p-value for each individual cell, while "Merged_celltype_pvalue" is the p-value for each cell type, obtained by merging the "Random_Correct_BG_adjp" p-values across the single cells of each subpopulation. In single-cell analysis, these two results are the primary focus.
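To make the idea of merging per-cell p-values into one p-value per cell type concrete, here is a small illustration on toy data. This thread does not state which combination method scPagwas uses, so Fisher's method is used here purely as a stand-in; the function name `fisher_combine` and the toy values are hypothetical. Fisher's method also assumes independent p-values, which single cells generally are not.

```r
# Fisher's method: combine p-values into one p-value.
# Illustrative stand-in only, not necessarily what scPagwas does internally.
fisher_combine <- function(p) {
  p <- p[!is.na(p)]
  stat <- -2 * sum(log(p))
  stats::pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Toy per-cell p-values with their cell type labels (made-up values).
cell_p <- data.frame(
  celltype = c("T", "T", "T", "B", "B"),
  p = c(0.01, 0.03, 0.20, 0.60, 0.45),
  stringsAsFactors = FALSE
)

# One merged p-value per cell type.
merged_p <- tapply(cell_p$p, cell_p$celltype, fisher_combine)
print(merged_p)
```

Because a merged value summarizes many cells at once, it can disagree with the impression given by individual per-cell p-values.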

However, for cell line or tissue data, or when computing "Random_Correct_BG_adjp" is infeasible because of a large number of single cells (especially when the "iters_singlecell" parameter is set, which can be very time- and memory-consuming), we recommend using the cell type p-values from "bootstrap_results".

Note that the results in "bootstrap_results" are derived from pseudo-bulk data generated by combining the single-cell expression data of each cell type, so they may differ somewhat from the single-cell results.
