Commit

version 2.10.1
alexpkeil1gov authored and cran-robot committed Dec 7, 2022
1 parent ec0db10 commit 7f017c8
Showing 11 changed files with 457 additions and 725 deletions.
10 changes: 5 additions & 5 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: qgcomp
Title: Quantile G-Computation
Version: 2.9.0
Date: 2022-10-12
Version: 2.10.1
Date: 2022-12-02
Authors@R:
person("Alexander", "Keil", , "alex.keil@nih.gov", role = c("aut", "cre"))
Description: G-computation for a set of time-fixed exposures with
@@ -24,10 +24,10 @@ Suggests: broom, devtools, knitr, markdown, MASS, mice
VignetteBuilder: knitr
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.2.1
RoxygenNote: 7.2.2
NeedsCompilation: no
Packaged: 2022-10-12 18:56:48 UTC; keilap
Packaged: 2022-12-02 17:23:45 UTC; keilap
Author: Alexander Keil [aut, cre]
Maintainer: Alexander Keil <alex.keil@nih.gov>
Repository: CRAN
Date/Publication: 2022-10-13 08:30:07 UTC
Date/Publication: 2022-12-07 21:20:02 UTC
20 changes: 10 additions & 10 deletions MD5
@@ -1,25 +1,25 @@
68bdcaaa259e181c184389aa0f6a14f2 *DESCRIPTION
e1f87a32949b7c8985b37ecc1344bd00 *DESCRIPTION
46baed60dfe33d9c19f1aae13c12aa0d *NAMESPACE
f722f03ed85df97e2b6900613b85d3bb *NEWS.md
0edc35524c6c817311ba01e2d0f2dcee *NEWS.md
f7fc15ccf78c67c72ed9d2e5cbd46606 *R/base.R
7bb09da9bf7f3d5610f6f4d8854ffc15 *R/base_bounds.R
2caa04f826ee1e2f00b16e5a135b81ac *R/base_experimental.R
0864472a76fb326a0384aebcf5debb6e *R/base_extensions.R
a2ec4b87648a50dbcf983a7418c01d6d *R/base_generics.R
1e7d87f01647c3a449e00a4b49a03d83 *R/base_hurdle.R
48dcec918baba27637817e8e9c157e7e *R/base_plots.R
1556cb06da0a0ccb6c2dfad319b24417 *R/base_samplesplits.R
d7d887473726dd844e6ee9fdffa48df8 *R/base_samplesplits.R
3d883410bf8ee1ce5493c3470a37792a *R/base_simhelpers.R
6a235c5c462a40cfa61ccee151c907b4 *R/base_surv.R
c140b5c60f253c3651016aa3f5af5ca0 *R/base_utility.R
2438f922c307df5aa315664442190186 *R/base_zi.R
daf60f4a2a0ea425235870c7988de990 *R/data.R
78102b3565f2cf7c57d3e4c7858ed03e *README.md
d7a6bb0f16119cad7d9eea5498ef6155 *README.md
672f26d9e07ff88039545a4e1bea2070 *build/vignette.rds
c52baf72a0ded2860f1fc7bcc0a44587 *data/metals.RData
47a3b15af74818db2f9af61f00733505 *inst/doc/qgcomp-vignette.R
82927f9afaab8f18e9f1a07d0bd938a0 *inst/doc/qgcomp-vignette.Rmd
dfde92323deaea07cc333c953749812f *inst/doc/qgcomp-vignette.html
038343a31aa4033e2fa61116485e0f69 *inst/doc/qgcomp-vignette.R
e936bb20d22aa110474e052840da4530 *inst/doc/qgcomp-vignette.Rmd
ea7e452f1ed549f791994f5d20554aae *inst/doc/qgcomp-vignette.html
dcba3c6855f754b7494b841d285e3197 *inst/fig/fighex.png
d4f062a438e514faa03a0ae2c9f38a12 *inst/fig/fighex_social.png
a728173c2b39a9a95c900810feca0c26 *inst/fig/res1.png
@@ -49,7 +49,7 @@ fc8955636f8d783a2a535db417402d13 *man/qgcomp.Rd
5e68bc7f8be2aa7c643f2ca4f12441d1 *man/qgcomp.hurdle.boot.Rd
1418f7d295a1d046cd8e932b6b82f006 *man/qgcomp.hurdle.noboot.Rd
83d33c2554c16afe6960ed07bc16b255 *man/qgcomp.noboot.Rd
7fef25611696b79fb00a35bffc99a773 *man/qgcomp.partials.Rd
8ff81ff23030d782ae018c3a55ba70b9 *man/qgcomp.partials.Rd
754ef31f92fca5d8ff88e47d29e73cf8 *man/qgcomp.survcurve.boot.Rd
7299dba0c2dac6a56b6faa9d1d08f78d *man/qgcomp.zi.boot.Rd
272ff788437b05b20ee98ba277f0e37f *man/qgcomp.zi.noboot.Rd
@@ -71,6 +71,6 @@ c7c524d4e7a45d818706709a57a6529d *tests/test_basics.R
5e757e8b3ebaa84f6f0dc3b3f78ebb38 *tests/test_mice.R
20abdeb552c2efed1352526d2baf8c88 *tests/test_numeric.R
1132ffeaf05f6fe7b409e9049f826bf2 *tests/test_poisson.R
34309c92ce57d4598adf623550be8cb5 *tests/test_splits.R
c1660dd8df223d50148c870587d30b93 *tests/test_splits.R
055d4f8d4b83a240f614fb268ffb5dbe *tests/test_weights.R
82927f9afaab8f18e9f1a07d0bd938a0 *vignettes/qgcomp-vignette.Rmd
e936bb20d22aa110474e052840da4530 *vignettes/qgcomp-vignette.Rmd
10 changes: 10 additions & 0 deletions NEWS.md
@@ -1,3 +1,13 @@
# qgcomp v2.10.0
## Major changes
- None

## Minor changes
- qgcomp.partials now allows quantile definitions based on the training and validation data, which treats the quantiles as fixed values across both datasets and leads to more stable results in small samples (set via .globalbreaks = TRUE in qgcomp.partials)

## Bug fixes
- Fixed major bug in qgcomp.partials: https://github.com/alexpkeil1/qgcomp/issues/28

# qgcomp v2.9.0
## Major changes
- None
54 changes: 45 additions & 9 deletions R/base_samplesplits.R
@@ -6,6 +6,7 @@ qgcomp.partials <- function(
validdata=NULL,
expnms=NULL,
.fixbreaks=TRUE,
.globalbreaks=FALSE,
...
){
#' @title Partial effect sizes, confidence intervals, hypothesis tests
@@ -55,7 +56,8 @@ qgcomp.partials <- function(
#' @param traindata Data frame with training data
#' @param validdata Data frame with validation data
#' @param expnms Exposure mixture of interest
#' @param .fixbreaks (logical) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior versions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
#' @param .fixbreaks (logical, overridden by .globalbreaks) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior versions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
#' @param .globalbreaks (logical, if TRUE, overrides .fixbreaks) Use the same quantile cutpoints in the training and validation data (selected in combined training and validation data). Added in version 2.10.0; the default is FALSE. Setting to TRUE treats the quantiles as fixed values across both datasets and leads to more stable results in small samples.
#' @param ... Arguments to \code{\link[qgcomp]{qgcomp.noboot}},
#' \code{\link[qgcomp]{qgcomp.cox.noboot}}, or
#' \code{\link[qgcomp]{qgcomp.zi.noboot}}
@@ -140,34 +142,56 @@ qgcomp.partials <- function(
#' splitres5
#'
#' }
# currently broken
if(is.null(traindata) | is.null(validdata))
stop("traindata and validdata must both be specified")
#
traincall <- validcall <- match.call(expand.dots = TRUE)
droppers <- match(c("traindata", "validdata", ".fixbreaks", "fun"), names(traincall), 0L) #index (will need to add names here if more arguments are added)
droppers <- match(c("traindata", "validdata", ".fixbreaks", ".globalbreaks", "fun"), names(traincall), 0L) #index (will need to add names here if more arguments are added)
traincall[["data"]] <- eval(traincall[["traindata"]], parent.frame())
validcall[["data"]] <- eval(validcall[["validdata"]], parent.frame())
traincall <- traincall[-c(droppers)]
validcall <- validcall[-c(droppers)]
hasbreaks = ifelse("breaks" %in% names(traincall), TRUE, FALSE)
hasq = ifelse("q" %in% names(traincall), TRUE, FALSE)
qnull = ifelse(is.null(traincall$q), TRUE, FALSE)
if(hasbreaks && .fixbreaks)
.fixbreaks=FALSE
#
# if q is set to null, and no breaks are provided ensure that no breaks are created
if(hasq){
if(!hasbreaks && qnull && .fixbreaks)
.fixbreaks=FALSE
if(!hasbreaks && qnull && .globalbreaks)
.globalbreaks=FALSE
}

if(is.function(fun)){
traincall[[1L]] <- validcall[[1L]] <- fun
}else{
traincall[[1L]] <- validcall[[1L]] <- as.name(fun[1])
}
if(.globalbreaks){
#
globalcall <- traincall
globalcall[["data"]] <- eval(rbind(traincall[["data"]], validcall[["data"]]), parent.frame())
global.fit = eval(globalcall, parent.frame())
#print(dim(global.fit$fit$data))
#
if(!is.null(global.fit$breaks)){
validcall[["breaks"]] = global.fit$breaks
validcall[["q"]] = NULL
traincall[["breaks"]] = global.fit$breaks
traincall[["q"]] = NULL
}
}
train.fit = eval(traincall, parent.frame())
#####
if(.fixbreaks){
if(.fixbreaks && !.globalbreaks){
validcall$breaks = train.fit$breaks
validcall$q = NULL
}
######
posnms = names(train.fit$pos.weights)
negnms = names(train.fit$neg.weights)
posnms = expnms[is.element(expnms, names(train.fit$pos.weights))]
negnms = expnms[is.element(expnms, names(train.fit$neg.weights))]
if(length(posnms)==1 && all(posnms==c("count", "zero"))){
posnms = names(train.fit$pos.weights$count)
negnms = names(train.fit$neg.weights$count)
@@ -176,13 +200,24 @@ qgcomp.partials <- function(
res$negmix <- res$posmix <- "none"
if(length(posnms)>0){
res$posmix = posnms
vc = as.list(validcall)
poscall <- validcall
if(!is.null(poscall$breaks) && (.fixbreaks || .globalbreaks)){
posidx <- which(expnms %in% posnms)
poscall$breaks <- poscall$breaks[posidx]
}
vc = as.list(poscall)
vc$expnms = c(posnms)
res$pos.fit <- eval(as.call(vc), parent.frame())
}
if(length(negnms)>0){
res$negmix = negnms
vc = as.list(validcall)
negcall <- validcall
if(!is.null(negcall$breaks) && (.fixbreaks || .globalbreaks)){
#if(.fixbreaks || .globalbreaks){
negidx <- which(expnms %in% negnms)
negcall$breaks <- negcall$breaks[negidx]
}
vc = as.list(negcall)
vc$expnms = c(negnms)
res$neg.fit <- eval(as.call(vc), parent.frame())

@@ -194,6 +229,7 @@ qgcomp.partials <- function(




print.qgcompmultifit <- function(x, ...){
#' @export
cat(paste0("\nVariables with positive effect sizes in training data: ", paste(x$posmix, collapse = ", ")))
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
`qgcomp` v2.9.0
`qgcomp` v2.10.1



13 changes: 12 additions & 1 deletion inst/doc/qgcomp-vignette.R
@@ -369,7 +369,7 @@ splitres <- qgcomp.partials(
fun="qgcomp.noboot", f=y~., q=4,
traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
bayes=FALSE,
.fixbreaks = TRUE
.fixbreaks = TRUE, .globalbreaks=FALSE
)
splitres

@@ -378,6 +378,17 @@ splitres
plot(splitres$pos.fit)


## ----pe3c, fig.height=5, fig.width=7.5----------------------------------------


splitres_alt <- qgcomp.partials(
fun="qgcomp.noboot", f=y~., q=4,
traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
bayes=FALSE,
.fixbreaks = TRUE, .globalbreaks=TRUE
)
splitres_alt

## ----pe4a---------------------------------------------------------------------


15 changes: 14 additions & 1 deletion inst/doc/qgcomp-vignette.Rmd
@@ -694,7 +694,7 @@ splitres <- qgcomp.partials(
fun="qgcomp.noboot", f=y~., q=4,
traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
bayes=FALSE,
.fixbreaks = TRUE
.fixbreaks = TRUE, .globalbreaks=FALSE
)
splitres
```
@@ -707,6 +707,19 @@ plot(splitres$pos.fit)

Consistent with our overall results, the overall effect of metals on the simulated outcome `y` is predominantly positive, which is driven mainly by calcium. The partial positive effect of psi=0.42 is slightly attenuated from the partial positive effect given in the original fit (0.44), but is slightly larger than the overall effect from the original fit (psi1=0.34). We note that the effect direction of cadmium is negative, even though it was selected based on positive associations in the training data. This suggests that this variable has an effect close to the null, whose direction depends on which subset of the data is used. This feature allows valid testing of hypotheses: a `global null` in which no exposures have effects will be characterized by variables that randomly switch effect directions between training and validation datasets, which will yield partial effect estimates close to the null with hypothesis tests that have appropriate type-1 error rates in large datasets. Performance in smaller samples is unexplored at this point (and may make an interesting methodologic research question).

By default (subject to change), quantile cut points ("breaks") are defined within the training data and applied to the validation data. You may change this behavior (via `.globalbreaks = TRUE`) so that the breaks are defined using quantiles from the entire dataset, which treats the quantiles as fixed. This is expected to improve stability in small samples and may eventually replace the default behavior, as the quantiles themselves are not generally treated as random variables within quantile g-computation. For this particular dataset (and seed value), there is little impact of this setting on the results.
```{r pe3c, fig.height=5, fig.width=7.5}
splitres_alt <- qgcomp.partials(
fun="qgcomp.noboot", f=y~., q=4,
traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
bayes=FALSE,
.fixbreaks = TRUE, .globalbreaks=TRUE
)
splitres_alt
```
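
The difference between the two ways of choosing breaks can be sketched in base R. This is only an illustration under stated assumptions, not the internal code of `qgcomp`: the simulated variable `x`, the helper `score()`, and the handling of values outside the range of the cut points are all hypothetical choices made for this example.

```r
# Illustrative sketch only: base R, not the qgcomp internals.
set.seed(1)
train <- data.frame(x = rexp(50))   # hypothetical training exposure
valid <- data.frame(x = rexp(50))   # hypothetical validation exposure
q <- 4
probs <- seq(0, 1, length.out = q + 1)

# Cut points taken from the training data only (analogous to .fixbreaks = TRUE)
breaks_train <- quantile(train$x, probs = probs)
# Cut points taken from the pooled data (analogous to .globalbreaks = TRUE)
breaks_global <- quantile(c(train$x, valid$x), probs = probs)

# Hypothetical helper: map values to scores 0, ..., q-1 using fixed cut points.
# Only the interior cut points are used, so values outside the range of the
# data that defined the breaks are still assigned the lowest or highest score.
score <- function(x, breaks) {
  inner <- breaks[-c(1, length(breaks))]
  findInterval(x, inner)
}

valid_q_train <- score(valid$x, breaks_train)
valid_q_global <- score(valid$x, breaks_global)

# Cross-tabulate to see which validation observations change category
table(valid_q_train, valid_q_global)
```

Off-diagonal cells in the table are validation observations whose quantile category depends on which set of breaks is used; in small samples these can be a noticeable share of the data.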

One cautionary note: when there are multiple exposures with small positive or negative effects, the partial effects may be biased towards the null in studies with moderate or small sample sizes. This occurs because, in the training set, some exposures with small effects are likely to be mis-classified with regard to their effect direction. This means that small negative effects can be added to large positive effects to yield smaller positive effects, and vice versa. In some instances, both the positive and negative partial effects can be in the same direction. This occurs if individual effects are predominantly in one direction, but some are small and subject to having mis-classified directions. As one example: if there is a null overall effect, but there is a positive partial effect driven strongly by one exposure and a balancing negative partial effect driven by numerous weaker associations, partial effect estimates will not sum to the overall effect because the negative partial effect will experience more downward bias in typical sample sizes. Thus, when the overall effect does not equal the sum of the partial effects, there is likely some bias in at least one of the partial effect estimates. This is not necessarily a unique feature of quantile-based g-computation, but may also be a concern for methods that focus on estimation of partial effects, such as weighted quantile sum regression.
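
The mis-classification of effect directions described above can be seen in a small simulation. This is a minimal sketch using ordinary linear regression in base R rather than `qgcomp` itself; the sample size, the effect sizes in `beta`, and the 40/60 split are arbitrary assumptions chosen for illustration.

```r
# Illustrative simulation in base R; all settings below are assumptions.
set.seed(2)
n <- 200
beta <- c(0.5, 0.05, 0.05, -0.05)  # one strong positive effect, three weak effects
X <- matrix(rnorm(n * length(beta)), n, length(beta),
            dimnames = list(NULL, paste0("x", seq_along(beta))))
dat <- data.frame(y = drop(X %*% beta) + rnorm(n), X)

# Random 40/60 split into training and validation data
in_train <- seq_len(n) %in% sample(n, size = 0.4 * n)
fit_train <- lm(y ~ ., data = dat[in_train, ])

# Direction assigned in the training data versus the true direction
est_sign <- sign(coef(fit_train)[-1])   # drop the intercept
true_sign <- sign(beta)
data.frame(exposure = names(est_sign), true_sign, est_sign,
           misclassified = est_sign != true_sign)
```

Rerunning this with different seeds shows how often the weak exposures land in the wrong group; any exposure that does so is carried into the validation step as part of the wrong partial effect.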

The larger question about interpretation (and its worth) of partial effects is left to the analyst. For large datasets with well-characterized exposures that have plausible subsets of exposures that would be positively/negatively linearly associated with the outcome, the variables that partition into negative/positive partial effects may make some substantive sense. In more realistic settings that typify exposure mixtures, the partitioning will result in groups that don't entirely make sense. The "partial effect" yields the effect of increasing all exposures in the subset defined by positive coefficients in the training data, while holding all other exposures and confounders constant. In the setting where this corresponds to real world correlation patterns (e.g. all exposures in the positive partial effect share a source), then this may be interpretable roughly as the effect of an action to intervene on the source of these exposures. In most settings, however, interpretation will not be this clear and should not be expected to map onto potential real-world interventions. We note that this is not a function of the quantile g-computation method, but just part of the general messiness of working with exposure mixture data.