version 2.10.1

cran · Dec 7, 2022 · 7f017c8
1 parent ec0db10
commit 7f017c8

Showing 11 changed files with 457 additions and 725 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: qgcomp
 Title: Quantile G-Computation
-Version: 2.9.0
-Date: 2022-10-12
+Version: 2.10.1
+Date: 2022-12-02
 Authors@R: 
     person("Alexander", "Keil", , "alex.keil@nih.gov", role = c("aut", "cre"))
 Description: G-computation for a set of time-fixed exposures with
@@ -24,10 +24,10 @@ Suggests: broom, devtools, knitr, markdown, MASS, mice
 VignetteBuilder: knitr
 Encoding: UTF-8
 Language: en-US
-RoxygenNote: 7.2.1
+RoxygenNote: 7.2.2
 NeedsCompilation: no
-Packaged: 2022-10-12 18:56:48 UTC; keilap
+Packaged: 2022-12-02 17:23:45 UTC; keilap
 Author: Alexander Keil [aut, cre]
 Maintainer: Alexander Keil <alex.keil@nih.gov>
 Repository: CRAN
-Date/Publication: 2022-10-13 08:30:07 UTC
+Date/Publication: 2022-12-07 21:20:02 UTC
diff --git a/MD5 b/MD5
@@ -1,25 +1,25 @@
-68bdcaaa259e181c184389aa0f6a14f2 *DESCRIPTION
+e1f87a32949b7c8985b37ecc1344bd00 *DESCRIPTION
 46baed60dfe33d9c19f1aae13c12aa0d *NAMESPACE
-f722f03ed85df97e2b6900613b85d3bb *NEWS.md
+0edc35524c6c817311ba01e2d0f2dcee *NEWS.md
 f7fc15ccf78c67c72ed9d2e5cbd46606 *R/base.R
 7bb09da9bf7f3d5610f6f4d8854ffc15 *R/base_bounds.R
 2caa04f826ee1e2f00b16e5a135b81ac *R/base_experimental.R
 0864472a76fb326a0384aebcf5debb6e *R/base_extensions.R
 a2ec4b87648a50dbcf983a7418c01d6d *R/base_generics.R
 1e7d87f01647c3a449e00a4b49a03d83 *R/base_hurdle.R
 48dcec918baba27637817e8e9c157e7e *R/base_plots.R
-1556cb06da0a0ccb6c2dfad319b24417 *R/base_samplesplits.R
+d7d887473726dd844e6ee9fdffa48df8 *R/base_samplesplits.R
 3d883410bf8ee1ce5493c3470a37792a *R/base_simhelpers.R
 6a235c5c462a40cfa61ccee151c907b4 *R/base_surv.R
 c140b5c60f253c3651016aa3f5af5ca0 *R/base_utility.R
 2438f922c307df5aa315664442190186 *R/base_zi.R
 daf60f4a2a0ea425235870c7988de990 *R/data.R
-78102b3565f2cf7c57d3e4c7858ed03e *README.md
+d7a6bb0f16119cad7d9eea5498ef6155 *README.md
 672f26d9e07ff88039545a4e1bea2070 *build/vignette.rds
 c52baf72a0ded2860f1fc7bcc0a44587 *data/metals.RData
-47a3b15af74818db2f9af61f00733505 *inst/doc/qgcomp-vignette.R
-82927f9afaab8f18e9f1a07d0bd938a0 *inst/doc/qgcomp-vignette.Rmd
-dfde92323deaea07cc333c953749812f *inst/doc/qgcomp-vignette.html
+038343a31aa4033e2fa61116485e0f69 *inst/doc/qgcomp-vignette.R
+e936bb20d22aa110474e052840da4530 *inst/doc/qgcomp-vignette.Rmd
+ea7e452f1ed549f791994f5d20554aae *inst/doc/qgcomp-vignette.html
 dcba3c6855f754b7494b841d285e3197 *inst/fig/fighex.png
 d4f062a438e514faa03a0ae2c9f38a12 *inst/fig/fighex_social.png
 a728173c2b39a9a95c900810feca0c26 *inst/fig/res1.png
@@ -49,7 +49,7 @@ fc8955636f8d783a2a535db417402d13 *man/qgcomp.Rd
 5e68bc7f8be2aa7c643f2ca4f12441d1 *man/qgcomp.hurdle.boot.Rd
 1418f7d295a1d046cd8e932b6b82f006 *man/qgcomp.hurdle.noboot.Rd
 83d33c2554c16afe6960ed07bc16b255 *man/qgcomp.noboot.Rd
-7fef25611696b79fb00a35bffc99a773 *man/qgcomp.partials.Rd
+8ff81ff23030d782ae018c3a55ba70b9 *man/qgcomp.partials.Rd
 754ef31f92fca5d8ff88e47d29e73cf8 *man/qgcomp.survcurve.boot.Rd
 7299dba0c2dac6a56b6faa9d1d08f78d *man/qgcomp.zi.boot.Rd
 272ff788437b05b20ee98ba277f0e37f *man/qgcomp.zi.noboot.Rd
@@ -71,6 +71,6 @@ c7c524d4e7a45d818706709a57a6529d *tests/test_basics.R
 5e757e8b3ebaa84f6f0dc3b3f78ebb38 *tests/test_mice.R
 20abdeb552c2efed1352526d2baf8c88 *tests/test_numeric.R
 1132ffeaf05f6fe7b409e9049f826bf2 *tests/test_poisson.R
-34309c92ce57d4598adf623550be8cb5 *tests/test_splits.R
+c1660dd8df223d50148c870587d30b93 *tests/test_splits.R
 055d4f8d4b83a240f614fb268ffb5dbe *tests/test_weights.R
-82927f9afaab8f18e9f1a07d0bd938a0 *vignettes/qgcomp-vignette.Rmd
+e936bb20d22aa110474e052840da4530 *vignettes/qgcomp-vignette.Rmd
diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,13 @@
+# qgcomp v2.10.0
+## Major changes
+- None
+
+## Minor changes
+- qgcomp.partials now allows quantile definitions based on the training and validation data, which treats the quantiles as fixed values across both datasets and leads to more stable results in small samples (set via .globalbreaks = TRUE in qgcomp.partials)
+
+## Bug fixes
+- Fixed major bug in qgcomp.partials: https://github.com/alexpkeil1/qgcomp/issues/28
+
 # qgcomp v2.9.0
 ## Major changes
 - None
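
The difference between training-based and global quantile cut points described in the v2.10.0 notes can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the helper `score()` and the variables `x_train` and `x_valid` are hypothetical, and qgcomp's internal quantization routine may differ in detail.

```r
set.seed(1)
x_train <- rnorm(40)              # hypothetical exposure, training half
x_valid <- rnorm(40, mean = 0.3)  # hypothetical exposure, validation half
q <- 4
probs <- seq(0, 1, length.out = q + 1)

# Cut points chosen in the training data only (the idea behind .fixbreaks = TRUE)
breaks_train <- quantile(x_train, probs = probs)
# Cut points chosen in the pooled data (the idea behind .globalbreaks = TRUE)
breaks_global <- quantile(c(x_train, x_valid), probs = probs)

# Map values to quantile scores 0..(q-1); the outer cut points are widened
# so that values outside the range used to pick the breaks still get a score
score <- function(x, breaks) {
  b <- breaks
  b[1] <- -Inf
  b[length(b)] <- Inf
  as.integer(cut(x, breaks = b, include.lowest = TRUE)) - 1L
}

# Validation-set scores under each choice of cut points
table(score(x_valid, breaks_train))
table(score(x_valid, breaks_global))
```

With global breaks, both halves are scored against one fixed set of cut points, so the pooled data split evenly across the `q` categories, whereas training-based breaks can leave the validation categories unbalanced when the two halves differ.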

diff --git a/R/base_samplesplits.R b/R/base_samplesplits.R
@@ -6,6 +6,7 @@ qgcomp.partials <- function(
     validdata=NULL,
     expnms=NULL,
     .fixbreaks=TRUE,
+    .globalbreaks=FALSE,
     ...
 ){
   #' @title Partial effect sizes, confidence intervals, hypothesis tests
@@ -55,7 +56,8 @@ qgcomp.partials <- function(
   #' @param traindata Data frame with training data
   #' @param validdata Data frame with validation data
   #' @param expnms Exposure mixture of interest
-  #' @param .fixbreaks (logical) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior verions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
+  #' @param .fixbreaks (logical, overridden by .globalbreaks) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior versions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
+  #' @param .globalbreaks (logical, if TRUE, overrides .fixbreaks) Use the same quantile cutpoints in the training and validation data (selected in combined training and validation data). The default is FALSE. Setting to TRUE treats the quantiles as fixed values across both datasets and leads to more stable results in small samples.
   #' @param ... Arguments to \code{\link[qgcomp]{qgcomp.noboot}}, 
   #'    \code{\link[qgcomp]{qgcomp.cox.noboot}}, or 
   #'    \code{\link[qgcomp]{qgcomp.zi.noboot}}
@@ -140,34 +142,56 @@ qgcomp.partials <- function(
   #' splitres5
   #'                 
   #' }
-  # currently broken
   if(is.null(traindata) | is.null(validdata))
     stop("traindata and validdata must both be specified")
   #
   traincall <- validcall <- match.call(expand.dots = TRUE)
-  droppers <- match(c("traindata", "validdata", ".fixbreaks", "fun"), names(traincall), 0L) #index (will need to add names here if more arguments are added)
+  droppers <- match(c("traindata", "validdata", ".fixbreaks", ".globalbreaks", "fun"), names(traincall), 0L) #index (will need to add names here if more arguments are added)
   traincall[["data"]] <- eval(traincall[["traindata"]], parent.frame())
   validcall[["data"]] <- eval(validcall[["validdata"]], parent.frame())
   traincall <- traincall[-c(droppers)]
   validcall <- validcall[-c(droppers)]
   hasbreaks = ifelse("breaks" %in% names(traincall), TRUE, FALSE)
+  hasq = ifelse("q" %in% names(traincall), TRUE, FALSE)
+  qnull = ifelse(is.null(traincall$q), TRUE, FALSE)
   if(hasbreaks && .fixbreaks)
     .fixbreaks=FALSE
-  #
+  # if q is set to null, and no breaks are provided ensure that no breaks are created
+  if(hasq){
+    if(!hasbreaks && qnull && .fixbreaks)
+      .fixbreaks=FALSE
+    if(!hasbreaks && qnull && .globalbreaks)
+      .globalbreaks=FALSE
+  }
+
   if(is.function(fun)){
     traincall[[1L]] <- validcall[[1L]] <- fun
   }else{
     traincall[[1L]] <- validcall[[1L]] <- as.name(fun[1])
   }
+  if(.globalbreaks){
+    #
+    globalcall <- traincall
+    globalcall[["data"]] <- eval(rbind(traincall[["data"]], validcall[["data"]]), parent.frame())
+    global.fit = eval(globalcall, parent.frame())
+    #print(dim(global.fit$fit$data))
+    #
+    if(!is.null(global.fit$breaks)){
+      validcall[["breaks"]] = global.fit$breaks
+      validcall[["q"]] = NULL
+      traincall[["breaks"]] = global.fit$breaks
+      traincall[["q"]] = NULL
+    }
+  }
   train.fit = eval(traincall, parent.frame())
   #####
-  if(.fixbreaks){
+  if(.fixbreaks && !.globalbreaks){
     validcall$breaks = train.fit$breaks
     validcall$q = NULL
   }
   ######
-  posnms = names(train.fit$pos.weights)
-  negnms = names(train.fit$neg.weights)
+  posnms = expnms[is.element(expnms, names(train.fit$pos.weights))]
+  negnms = expnms[is.element(expnms, names(train.fit$neg.weights))]
   if(length(posnms)==1 && all(posnms==c("count", "zero"))){
     posnms = names(train.fit$pos.weights$count)
     negnms = names(train.fit$neg.weights$count)
@@ -176,13 +200,24 @@ qgcomp.partials <- function(
   res$negmix <- res$posmix <- "none"
   if(length(posnms)>0){
     res$posmix = posnms
-    vc = as.list(validcall)
+    poscall <- validcall
+    if(!is.null(poscall$breaks)  && (.fixbreaks || .globalbreaks)){
+      posidx <- which(expnms %in% posnms)
+      poscall$breaks <- poscall$breaks[posidx]
+    }
+    vc = as.list(poscall)
     vc$expnms = c(posnms)
     res$pos.fit <- eval(as.call(vc), parent.frame())
   }
   if(length(negnms)>0){
     res$negmix = negnms
-    vc = as.list(validcall)
+    negcall <- validcall
+    if(!is.null(negcall$breaks)  && (.fixbreaks || .globalbreaks)){
+      #if(.fixbreaks || .globalbreaks){
+      negidx <- which(expnms %in% negnms)
+      negcall$breaks <- negcall$breaks[negidx]
+    }
+    vc = as.list(negcall)
     vc$expnms = c(negnms)
     res$neg.fit <- eval(as.call(vc), parent.frame())
 
@@ -194,6 +229,7 @@ qgcomp.partials <- function(
 
 
 
+
 print.qgcompmultifit <- function(x, ...){
   #' @export
   cat(paste0("\nVariables with positive effect sizes in training data: ", paste(x$posmix, collapse = ", ")))
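
The break-subsetting step added in this hunk (`posidx <- which(expnms %in% posnms)` followed by `breaks[posidx]`) can be sketched in isolation. This is a hedged illustration that assumes the breaks are stored as a list with one element per exposure, in the same order as `expnms`; the exposure names and cut points below are made up.

```r
# Hypothetical exposure names and per-exposure cut points (one list element each)
expnms <- c("arsenic", "cadmium", "calcium", "lead")
breaks <- list(
  c(0, 1, 2, 3, 4),
  c(0, 10, 20, 30, 40),
  c(0, 5, 10, 15, 20),
  c(0, 2, 4, 6, 8)
)

# Names as they might come back from the training fit, in an arbitrary order
train_pos <- c("calcium", "arsenic")

# Re-express the positive set in expnms order, as the updated code does
posnms <- expnms[is.element(expnms, train_pos)]

# Positions of the positive exposures within expnms, and the matching breaks
posidx <- which(expnms %in% posnms)
pos_breaks <- breaks[posidx]

posnms
posidx
```

Keeping `posnms` in `expnms` order means the i-th name lines up with the i-th retained set of cut points when the validation model is refit on the subset.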

diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-`qgcomp` v2.9.0
+`qgcomp` v2.10.1
 
 
 

diff --git a/inst/doc/qgcomp-vignette.R b/inst/doc/qgcomp-vignette.R
@@ -369,7 +369,7 @@ splitres <- qgcomp.partials(
   fun="qgcomp.noboot", f=y~., q=4, 
   traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
   bayes=FALSE, 
-  .fixbreaks = TRUE
+  .fixbreaks = TRUE, .globalbreaks=FALSE
   )
 splitres
 
@@ -378,6 +378,17 @@ splitres
 plot(splitres$pos.fit)
 
 
+## ----pe3c, fig.height=5, fig.width=7.5----------------------------------------
+
+
+splitres_alt <- qgcomp.partials(
+  fun="qgcomp.noboot", f=y~., q=4, 
+  traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
+  bayes=FALSE, 
+  .fixbreaks = TRUE, .globalbreaks=TRUE
+  )
+splitres_alt
+
 ## ----pe4a---------------------------------------------------------------------
 
 

diff --git a/inst/doc/qgcomp-vignette.Rmd b/inst/doc/qgcomp-vignette.Rmd
@@ -694,7 +694,7 @@ splitres <- qgcomp.partials(
   fun="qgcomp.noboot", f=y~., q=4, 
   traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
   bayes=FALSE, 
-  .fixbreaks = TRUE
+  .fixbreaks = TRUE, .globalbreaks=FALSE
   )
 splitres
 ```
@@ -707,6 +707,19 @@ plot(splitres$pos.fit)
 
 Consistent with our overall results, the overall effect of metals on the simulated outcome `y` is predominantly positive, which is driven mainly by calcium. The partial positive effect of psi=0.42 is slightly attenuated from the partial positive effect given in the original fit (0.44), but is slightly larger than the overall effect from the original fit (psi1=0.34). We note that the effect direction of cadmium is negative, even though it was selected based on positive associations in the training data. This suggests that this variable has effects that are close to the null and that their direction depends on which subset of the data is used. This feature allows valid testing of hypotheses: a `global null` in which no exposures have effects will be characterized by variables that randomly switch effect directions between training and validation datasets, which will yield partial effect estimates close to the null with hypothesis tests that have appropriate type-1 error rates in large datasets. Performance in smaller samples is unexplored at this point (and may make an interesting methodologic research question).
 
+By default (subject to change), quantile cut points ("breaks") are defined within the training data and applied to the validation data. You may change this behavior so that the breaks are defined using quantiles from the entire dataset (set `.globalbreaks = TRUE`), which treats the quantiles as fixed. This is expected to improve stability in small samples and may eventually replace the default behavior, as the quantiles themselves are not generally treated as random variables within quantile g-computation. For this particular dataset (and seed value), this setting has little impact on the results.
+```{r pe3c, fig.height=5, fig.width=7.5}
+    
+
+splitres_alt <- qgcomp.partials(
+  fun="qgcomp.noboot", f=y~., q=4, 
+  traindata=traindata[,c(Xnm, covars, "y")],validdata=validdata[,c(Xnm, covars, "y")], expnms=Xnm,
+  bayes=FALSE, 
+  .fixbreaks = TRUE, .globalbreaks=TRUE
+  )
+splitres_alt
+```
+
 One careful note: when there are multiple exposures with small positive or negative effects, the partial effects may be biased towards the null in studies with moderate or small sample sizes. This occurs because, in the training set, some exposures with small effects are likely to be mis-classified with regard to their effect direction. This means that small negative effects can be added to large positive effects to yield smaller positive effects, and vice versa. In some instances, both the positive and negative partial effects can be in the same direction. This occurs if individual effects are predominantly in one direction, but some are small and subject to having mis-classified directions. As one example: if there is a null overall effect, but there is a positive partial effect driven strongly by one exposure and a balancing negative partial effect driven by numerous weaker associations, partial effect estimates will not sum to the overall effect because the negative partial effect will experience more downward bias in typical sample sizes. Thus, when the overall effect does not equal the sum of the partial effects, there is likely some bias in at least one of the partial effect estimates. This is not necessarily a unique feature of quantile-based g-computation, but may also be a concern for methods that focus on estimation of partial effects, such as weighted quantile sum regression.
 
 The larger question about the interpretation (and worth) of partial effects is left to the analyst. For large datasets with well characterized exposures that have plausible subsets of exposures that would be positively/negatively linearly associated with the outcome, the variables that partition into negative/positive partial effects may make some substantive sense. In more realistic settings that typify exposure mixtures, the partitioning will result in groups that don't entirely make sense. The "partial effect" yields the effect of increasing all exposures in the subset defined by positive coefficients in the training data, while holding all other exposures and confounders constant. In the setting where this corresponds to real world correlation patterns (e.g. all exposures in the positive partial effect share a source), then this may be interpretable roughly as the effect of an action to intervene on the source of these exposures. In most settings, however, interpretation will not be this clear and should not be expected to map onto potential real-world interventions. We note that this is not a function of the quantile g-computation method, but just part of the general messiness of working with exposure mixture data.