Skip to content

A toolbox for sparse contrastive principal component analysis

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

PhilBoileau/scPCA

Repository files navigation

R/scPCA

Travis CI Build Status AppVeyor Build Status Codecov test coverage Project Status: Active – The project has reached a stable, usable state and is being actively developed. BioC status Bioc Time status MIT license

Sparse Contrastive Principal Component Analysis for Computational Biology

Authors: Philippe Boileau, Nima Hejazi, Sandrine Dudoit


What’s scPCA?

The exploration and analysis of modern high-dimensional biological data regularly involves the use of dimension reduction techniques in order to tease out meaningful and interpretable information from complex experimental data, often subject to batch effects and other noise. In tandem with the development of sequencing technology (e.g., RNA-seq, scRNA-seq), many variants of PCA have been developed in attempts to remedy deficiencies in interpretability and stability that plague vanilla PCA.

Such developments have included both various forms of sparse PCA (SPCA) (Zou, Hastie, and Tibshirani 2006; Erichson et al. 2018), which increase the stability and interpretability of principal component loadings in high dimensions, and, more recently, contrastive PCA (cPCA) (Abid et al. 2018), which captures relevant information in the target (experimental) data set by eliminating technical noise through comparison to a so-called background data set. While SPCA and cPCA have both individually proven useful in resolving distinct shortcomings of PCA, neither is capable of simultaneously tackling the issues of interpretability, stability and relevance simultaneously. The scPCA package implements sparse contrastive PCA (Boileau, Hejazi, and Dudoit 2020) to accomplish these tasks in the context of high-dimensional biological data. In addition to implementing this newly developed technique, the scPCA package implements cPCA and generalizations thereof.


Installation

For standard use, install from Bioconductor using BiocManager:

if (!requireNamespace("BiocManager", quietly=TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("scPCA")

To contribute, install the bleeding-edge development version from GitHub via remotes:

remotes::install_github("PhilBoileau/scPCA")

Current and prior Bioconductor releases are available under branches with numbers prefixed by “RELEASE_”. For example, to install the version of this package available via Bioconductor 3.10, use

remotes::install_github("PhilBoileau/scPCA@RELEASE_3_10")

Example

For details on how to best use the scPCA R package, please consult the most recent package vignette available through the Bioconductor project.


Issues

If you encounter any bugs or have any specific feature requests, please file an issue.


Contributions

Contributions are welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.


Citation

Please cite the first paper below after using the scPCA R software package. Please also make sure to cite the article describing the statistical methodology when using scPCA or cross-validated cPCA as part of an analysis.

@article{boileau2020scPCAjoss,
  doi = {10.21105/joss.02079},
  url = {https://doi.org/10.21105/joss.02079},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {46},
  pages = {2079},
  author = {Philippe Boileau and Nima Hejazi and Sandrine Dudoit},
  title = {scPCA: A toolbox for sparse contrastive principal component analysis in R},
  journal = {Journal of Open Source Software}
}

@article{boileau2020scPCA,
    author = {Boileau, Philippe and Hejazi, Nima S and Dudoit, Sandrine},
    title = "{Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis}",
    journal = {Bioinformatics},
    year = {2020},
    month = {03},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa176},
    url = {https://doi.org/10.1093/bioinformatics/btaa176},
    note = {btaa176},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/doi/10.1093/bioinformatics/btaa176/32914142/btaa176.pdf},
}

License

© 2019-2023 Philippe Boileau

The contents of this repository are distributed under the MIT license. See file LICENSE for details.


References

Abid, Abubakar, Martin J Zhang, Vivek K Bagaria, and James Zou. 2018. “Exploring Patterns Enriched in a Dataset with Contrastive Principal Component Analysis.” Nature Communications 9 (1): 2134.

Boileau, Philippe, Nima S Hejazi, and Sandrine Dudoit. 2020. “Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis.” Bioinformatics, March. https://doi.org/10.1093/bioinformatics/btaa176.

Erichson, N. Benjamin, Peng Zeng, Krithika Manohar, Steven L. Brunton, J. Nathan Kutz, and Aleksandr Y. Aravkin. 2018. “Sparse Principal Component Analysis via Variable Projection.” ArXiv abs/1804.00341.

Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2006. “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics 15 (2): 265–86.