How to know the variability explained by individual PCA (widely_svd) #36

Open
aalsharef opened this issue Apr 2, 2021 · 1 comment

@aalsharef

Hello,

Thanks for the great package! It is not clear to me how to choose the number of principal components in `widely_svd()`. Is there a way to see the variability explained by each component, so that I can select an appropriate `nv`? That would justify the number of components I keep. For now, I'm setting `nv = 100`, following the suggestion in "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge.

Thank you very much!

@juliasilge
Owner

Thanks for your patience with this issue! 🙌

In its current implementation, `widely_svd()` only returns the u matrix from the SVD (scaled by the singular values d when `weight_d = TRUE`):

widyr/R/widely_svd.R, lines 77 to 89 in a6696d6:

```r
    perform_svd <- function(m) {
      s <- irlba::irlba(m, nv = nv, ...)
      if (weight_d) {
        ret <- t(s$d * t(s$u))
      } else {
        ret <- s$u
      }
      rownames(ret) <- rownames(m)
      ret
    }
    sparse <- TRUE
  }
```
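To make the excerpt concrete: with `weight_d = TRUE`, the expression `t(s$d * t(s$u))` scales column k of `u` by the k-th singular value. That is the same as `u %*% diag(d)` and, because the SVD factors `m` as `u %*% diag(d) %*% t(v)`, also the same as projecting the rows of `m` onto the right singular vectors. A small check with base R's `svd()` on a made-up toy matrix (not data from this issue):

```r
set.seed(1)
m <- matrix(rnorm(12 * 5), nrow = 12, ncol = 5)
s <- svd(m)  # u: 12 x 5, d: length 5, v: 5 x 5

# The expression used when weight_d = TRUE: d (length 5) is recycled down each
# column of t(u), so row k of t(u), i.e. column k of u, is multiplied by d[k].
weighted_u <- t(s$d * t(s$u))

all.equal(weighted_u, s$u %*% diag(s$d))  # TRUE: same as u %*% diag(d)
all.equal(weighted_u, m %*% s$v)          # TRUE: same as projecting m onto v
```

So the weighted output already carries the magnitude of each component, but the singular values themselves are not returned separately.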

Let's think through how we might return more complete information from the SVD (such as the singular values in d) in the tidy format we use in this package. In the meantime, I would recommend using a lower-level SVD interface such as irlba, so you can get all the information you want.
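A minimal sketch of that workaround, assuming you build the item-by-feature matrix yourself instead of going through `widely_svd()`. The helper `variance_explained()` is hypothetical (it is not part of widyr or irlba): it divides each squared singular value by the total sum of squares of the matrix. For a column-centered matrix this is the usual PCA proportion of variance explained; on an uncentered matrix it is the share of the squared Frobenius norm instead. With a truncated SVD from `irlba::irlba()`, the total has to come from the matrix itself, because the returned `d` only covers the first `nv` components.

```r
# Hypothetical helper: share of the matrix's total sum of squares captured by
# each singular component, d_k^2 / ||m||_F^2. For a column-centered matrix
# this equals the PCA "proportion of variance explained".
variance_explained <- function(d, m) {
  total <- sum(m^2)  # squared Frobenius norm of the *full* matrix
  prop <- d^2 / total
  data.frame(
    component = seq_along(d),
    prop_explained = prop,
    cumulative = cumsum(prop)
  )
}

# Toy data standing in for an item x feature matrix (assumption: small, dense).
set.seed(2021)
m <- matrix(rnorm(100 * 40), nrow = 100, ncol = 40)
m <- scale(m, center = TRUE, scale = FALSE)  # center columns, as in PCA

# Full SVD from base R: all 40 components, so the proportions sum to 1.
s_full <- svd(m)
ve_full <- variance_explained(s_full$d, m)
head(ve_full)

# Truncated SVD with irlba (if installed): only the first nv components are
# returned, so the cumulative proportion stays below 1.
if (requireNamespace("irlba", quietly = TRUE)) {
  s_trunc <- irlba::irlba(m, nv = 5)
  ve_trunc <- variance_explained(s_trunc$d, m)
  print(ve_trunc)
}
```

Looking at where `cumulative` levels off, or keeping enough components to reach a chosen share, are common heuristics for picking `nv`; they are judgment calls rather than something the package decides for you.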
