diff --git a/vignettes/CAPR_tutorial.Rmd b/vignettes/CAPR_tutorial.Rmd index 631bbfc4..f2f4cf9a 100644 --- a/vignettes/CAPR_tutorial.Rmd +++ b/vignettes/CAPR_tutorial.Rmd @@ -15,77 +15,85 @@ knitr::opts_chunk$set( ) ``` -The purpose of `Capr` is to provide a programatic avenue (within R) for cohort creation. What separates `Capr` cohort building from what is already possible in ATLAS is: 1) working directly in R and 2) simplfying cohort development through the optional use of component parts. Component parts are what we classify as reproducible aspects of the cohort definition. For example, say we want to conceptualize a medical diagnosis that we want to deploy multiple times within the cohort; maybe once as an additional criteria and again as a nested criteria within an inclusion rule. Another example would be if we wanted to use the same primary criteria across 20 cohorts. The primary criteria component can be saved and reused in cohort development. The goal of `Capr` is to assist and simplify cohort development while maintaining downstream compatibility with the OHDSI Methods Library. By translating the foundation of cohort development established by circe-be into R, we hope that users can leverage the R environment in assisting them in efficiently developing cohorts and expand possibilities of testing phenotype sensitivity. +The purpose of `Capr` is to provide a programmatic way to create cohorts using R. What separates `Capr` cohort building from what is already possible in ATLAS is: -The objective of this vignette for `Capr` is to: 1) demonstrate creation of a cohort definition within R, 2) deploy the cohort definition using `CirceR` 3) introduce resuable component parts and show how to save and load these constructs, 4) illustrate how to import and modify existing ATLAS cohorts and 5) using cohort building calls as data in R. +1. a code based cohort creation API in R +2. 
ability to reuse component parts of a cohort definition + +Component parts capture elements of a cohort definition that can be reused in multiple cohort definitions. For example, suppose we want to conceptualize a medical diagnosis that we want to deploy multiple times within the cohort definition; once as an 'additional criteria' and again as a 'nested criteria' within an inclusion rule. `Capr` allows the user to create the diagnosis once and reuse it in both parts of the cohort definition. Another example is using the same primary criteria across 20 cohorts. The primary criteria component can be saved and reused in cohort development. The goal of `Capr` is to assist and simplify cohort development while maintaining downstream compatibility with the OHDSI Methods Library. By translating the foundation of cohort development established by circe-be into R, we hope that users can leverage the R environment to develop cohorts efficiently and to expand the possibilities for testing phenotype sensitivity. + +The objective of this vignette for `Capr` is to: + +1. demonstrate creation of a cohort definition within R, +2. deploy (i.e. serialize and generate) the cohort definition using `CirceR`, +3. introduce reusable component parts and show how to save and load these constructs, +4. illustrate how to import and modify existing ATLAS cohorts, and +5. use cohort building calls as data in R. ## Creating a Cohort Definition in R -### Set Up +### Set Up -Before building our cohort, we need to load some R packages to assit us and connect to an OMOP CDM. CAPR relies on accessibility to an OMOP Vocabulary schema to find concepts and leverage relationships. We hope in the future we can set a public connection to ATHENA or an OMOP Vocabulary database to expand useability. We should note to ensure that you are using the latest OMOP vocabulary when using CAPR. +Before building a cohort with Capr we need to load some R packages and connect to an OMOP CDM. 
Capr relies on accessibility to an OMOP Vocabulary schema to find concepts and leverage relationships. We hope in the future we can set a public connection to ATHENA or an OMOP Vocabulary database to expand usability. We should note to ensure that you are using the latest OMOP vocabulary when using CAPR. ```{r setup, eval=FALSE} -library(Capr) #the CAPR package to build cohorts in R -library(DatabaseConnector) #to connect to our OMOP CDM -library(CirceR) #access the circe engine to generate ohdisql -library(magrittr) #leverage piping +library(Capr) +library(DatabaseConnector) -connectionDetails <- createConnectionDetails(dbms="postgresql", - server="example.com/datasource", - user="me", - password="secret", - schema="cdm", - port="5432") -conn <- connect(connectionDetails) -``` +connectionDetails <- createConnectionDetails(dbms = "postgresql", + server = Sys.getenv("capr_server"), + user = Sys.getenv("capr_user"), + password = Sys.getenv("capr_password")) +connection <- connect(connectionDetails) + +vocabularyDatabaseSchema <- Sys.getenv("capr_schema") +``` ```{r invisible setup, echo=FALSE, eval=TRUE, message=FALSE} -library(Capr) #the CAPR package to build cohorts in R -library(DatabaseConnector) #to connect to our OMOP CDM -library(CirceR) #access the circe engine to generate ohdisql -library(magrittr) #leverage piping - -PcCovidDiag <- loadComponent(system.file("extdata","PcCovidDiag.json", package = "Capr")) -AcCovidDiag <- loadComponent(system.file("extdata","AcCovidDiag.json", package = "Capr")) -IrsCovidDiag <- loadComponent(system.file("extdata","IrsCovidDiag.json", package = "Capr")) -EsCovidDiag <- loadComponent(system.file("extdata","EsCovidDiag.json", package = "Capr")) -cen <- loadComponent(system.file("extdata","exampleCen.json", package = "Capr")) -CovidCountCriteria1 <- loadComponent(system.file("extdata","CovidCountCriteria1.json", package = "Capr")) -CovidDiagGroup <- loadComponent(system.file("extdata","CovidDiagGroup.json", package = "Capr")) 
-IpVisitQuery <- loadComponent(system.file("extdata","IpVisitQuery.json", package = "Capr")) -IpVisitCSE <- loadComponent(system.file("extdata","IpVisitCSE.json", package = "Capr")) -exampleConceptLookup <- readRDS(system.file("extdata","exampleConceptLookup.rds", package = "Capr")) +# library(Capr) #the CAPR package to build cohorts in R +# library(DatabaseConnector) #to connect to our OMOP CDM +# library(CirceR) #access the circe engine to generate ohdisql +# library(magrittr) #leverage piping +# +# PcCovidDiag <- loadComponent(system.file("extdata","PcCovidDiag.json", package = "Capr")) +# AcCovidDiag <- loadComponent(system.file("extdata","AcCovidDiag.json", package = "Capr")) +# IrsCovidDiag <- loadComponent(system.file("extdata","IrsCovidDiag.json", package = "Capr")) +# EsCovidDiag <- loadComponent(system.file("extdata","EsCovidDiag.json", package = "Capr")) +# cen <- loadComponent(system.file("extdata","exampleCen.json", package = "Capr")) +# CovidCountCriteria1 <- loadComponent(system.file("extdata","CovidCountCriteria1.json", package = "Capr")) +# CovidDiagGroup <- loadComponent(system.file("extdata","CovidDiagGroup.json", package = "Capr")) +# IpVisitQuery <- loadComponent(system.file("extdata","IpVisitQuery.json", package = "Capr")) +# IpVisitCSE <- loadComponent(system.file("extdata","IpVisitCSE.json", package = "Capr")) +# exampleConceptLookup <- readRDS(system.file("extdata","exampleConceptLookup.rds", package = "Capr")) ``` In this example we are looking to re-build a cohort developed for the COVID-19 study-a-thon (ID:1775485) This cohort was identifying all inpatient visits for COVID-19 patients. This is a description of the cohort: ------------------------------ +------------------------------------------------------------------------ **Cohort Entry Events** People may enter the cohort when observing any of the following: -1. visit occurrences of 'InpatientVisit', starting after December 1, 2019. +1. 
visit occurrences of 'InpatientVisit', starting after December 1, 2019. Restrict entry events to with any of the following criteria: - 1. having at least 1 condition occurrence of 'COVID-19 (including asymptomatic)', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. - 2. having at least 1 condition occurrence of any condition (including 'COVID-19 source codes' source concepts), starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. - 3. having at least 1 measurement of 'COVID-19 specific test', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. - 4. having at least 1 measurement of 'Covid-19 Specific Measurement', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". - 5. having at least 1 observation of 'Covid-19 Specific Measurement', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". - 6. having at least 1 observation of any observation (including 'COVID-19 source codes' source concepts), starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. - +1. having at least 1 condition occurrence of 'COVID-19 (including asymptomatic)', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. +2. having at least 1 condition occurrence of any condition (including 'COVID-19 source codes' source concepts), starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. +3. 
having at least 1 measurement of 'COVID-19 specific test', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. +4. having at least 1 measurement of 'Covid-19 Specific Measurement', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". +5. having at least 1 observation of 'Covid-19 Specific Measurement', starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". +6. having at least 1 observation of any observation (including 'COVID-19 source codes' source concepts), starting 21 days before cohort entry start date and starting anytime on or before cohort entry end date. **Inclusion Criteria** -*1. >=18 years old* +*1. \>=18 years old* -Entry events with the following event criteria: who are >= 18 years old. +Entry events with the following event criteria: who are \>= 18 years old. -*2. Has >= 365 of observation* +*2. Has \>= 365 of observation* Entry events having at least 1 observation period, starting anytime up to 365 days before cohort entry start date and ending between 0 days before and all days after cohort entry end date. @@ -93,13 +101,12 @@ Entry events having at least 1 observation period, starting anytime up to 365 da Entry events having no visit occurrences of 'InpatientVisit', starting in the 180 days prior to cohort entry start date; with any of the following criteria: - 1. having at least 1 condition occurrence of 'COVID-19 (including asymptomatic)', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. - 2. 
having at least 1 condition occurrence of any condition (including 'COVID-19 source codes' source concepts), starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. - 3. having at least 1 measurement of 'COVID-19 specific test', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. - 4. having at least 1 measurement of 'Covid-19 Specific Measurement', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". - 5. having at least 1 observation of 'Covid-19 Specific Measurement', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". - 6. having at least 1 observation of any observation (including 'COVID-19 source codes' source concepts), starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. - +1. having at least 1 condition occurrence of 'COVID-19 (including asymptomatic)', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. +2. having at least 1 condition occurrence of any condition (including 'COVID-19 source codes' source concepts), starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. +3. having at least 1 measurement of 'COVID-19 specific test', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. +4. 
having at least 1 measurement of 'Covid-19 Specific Measurement', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". +5. having at least 1 observation of 'Covid-19 Specific Measurement', starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date; with value as concept: "positive", "detected", "present", "detected", "present" or "positive". +6. having at least 1 observation of any observation (including 'COVID-19 source codes' source concepts), starting 21 days before 'InpatientVisit' start date and starting anytime on or before 'InpatientVisit' end date. Limit qualifying entry events to the all events per person. @@ -111,90 +118,79 @@ The cohort end date will be offset from index event's end date plus 0 days. Entry events will be combined into cohort eras if they are within 0 days of each other. ---------------------------- +------------------------------------------------------------------------ ### Looking up Concepts -The first thing to do in building a cohort is look up clinical concepts we want to use in the cohort. In the example below we want to lookup Inpatient Visits an Emergency Room and Inpatient Visit and an Inpatient Visit. With an existing database connection we can lookup some standard concept Ids in the vocabulary table. +The first thing to do when building a cohort is to look up the clinical concepts we want to use. In the example below we look up two visit concepts: 'Emergency Room and Inpatient Visit' and 'Inpatient Visit'. With an existing database connection we can look up standard concept IDs in the vocabulary table. 
```{r lookup ids, eval=FALSE} -getConceptIdDetails(conceptIds = c(262, #ERand IP Visit - 9201), #IP Visit - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, - mapToStandard = TRUE) +# Concept Id 262 = Emergency Room and Inpatient Visit +# Concept Id 9201 = Inpatient Visit +getConceptIdDetails(conceptIds = c(262, 9201), + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema, + mapToStandard = TRUE) ``` ```{r echo = FALSE, eval=TRUE} -exampleConceptLookup[[1]] +# exampleConceptLookup[[1]] ``` - Say we do not have a standard OMOP concept id that we want to query but a code from a medical vocabulary like ICD10CM. We can look up this concept code and find the non-standard concept ID. ```{r lookup codes1, eval=FALSE} getConceptCodeDetails(conceptCode = "E11", - vocabulary = "ICD10CM", - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, - mapToStandard = FALSE) + vocabulary = "ICD10CM", + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema, + mapToStandard = FALSE) ``` ```{r echo = FALSE, eval=TRUE} -exampleConceptLookup[[2]] +# exampleConceptLookup[[2]] ``` -If we only knew the concept code for ICD10CM but wanted it mapped to a standard concept, we can tell our function to `mapToStandard`. In this case we are returned the standard SNOMED concept. +If we only know the ICD10CM code but want the associated standard concept we can tell our function to `mapToStandard`. In this case the standard SNOMED concept is returned. 
```{r lookup codes2} getConceptCodeDetails(conceptCode = "E11", - vocabulary = "ICD10CM", - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, - mapToStandard = TRUE) + vocabulary = "ICD10CM", + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema, + mapToStandard = TRUE) ``` ```{r echo = FALSE, eval=TRUE} -exampleConceptLookup[[3]] +# exampleConceptLookup[[3]] ``` -Finally if we only know the medical concept but not a specific code, we can do a keyword search to find any concepts that contain a character string of "Diabetes". In the code chunk below we only return the first 5 rows for demonstration purposes. +Finally if we only know the medical concept but not a specific code we can do a keyword search to find any concepts that contain a character string of "Diabetes". ```{r lookup keywords, warning=FALSE} Diabetes <- lookupKeyword(keyword = "Diabetes", searchType = "any", - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL) -Diabetes$.[1:5,] #need dot to call data frame from object -``` + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema) +head(Diabetes) +``` ```{r echo = FALSE, eval=TRUE} -diabetes <- exampleConceptLookup[[4]] -diabetes$CONCEPT_NAME <- substr(diabetes$CONCEPT_NAME, 1, 35) -diabetes +# diabetes <- exampleConceptLookup[[4]] +# diabetes$CONCEPT_NAME <- substr(diabetes$CONCEPT_NAME, 1, 35) +# diabetes ``` ### Working with Concept Set Expressions -Next we want to take this concept set and turn it into a concept set expression. The concept set expression contains the concept set item which sets how we want to use the concept. We can find descendant concept, exclude it from the list or include its mapped concepts to the expression. 
In `Capr` when a concept set expression is created the default is to include descendants. When we create the concept set expression we generate a global unique identifier for this component. In the OMOP cohort concept sets are referred to internally using an integer starting from 0. This identifier deploys the concept set expression within a component. With `Capr` we want these pieces to exist in isolation outside the cohort definition, so we dont have to rely on this arbitrary number within the cohort definition. This allows us to repurpose any concept set expression and parent component for a new cohort. In the structural print we see the guid in the concept set expression section of the component part. +Next we want to take this concept set and turn it into a concept set expression. The concept set expression contains the concept set item which sets how we want to use the concept. We can find descendant concept, exclude it from the list or include its mapped concepts to the expression. In `Capr` when a concept set expression is created the default is to include descendants. When we create the concept set expression we generate a global unique identifier for this component. In the OMOP cohort concept sets are referred to internally using an integer starting from 0. This identifier deploys the concept set expression within a component. With `Capr` we want these pieces to exist in isolation outside the cohort definition so we don't have to rely on this arbitrary number within the cohort definition. This allows us to re-purpose any concept set expression and parent component for a new cohort. In the structural print we see the guid in the concept set expression section of the component part. 
```{r cse} -IpVisitCSE <- getConceptIdDetails(conceptIds = c(262, #ERand IP Visit - 9201), #IP Visit - connectionDetails = NULL, - connection = connection, #use connection since it is already open +IpVisitCSE <- getConceptIdDetails(conceptIds = c(262, 9201), + connection = connection, vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, mapToStandard = TRUE) %>% createConceptSetExpression(Name = "InpatientVisit", includeDescendants = TRUE) @@ -204,24 +200,26 @@ IpVisitCSE <- getConceptIdDetails(conceptIds = c(262, #ERand IP Visit str(IpVisitCSE) ``` +Returning to the idea of concept set items, we can customize the mapping of the concept set item. If we had three concepts in the set but only want to exclude one of them we can specify the position in the `createConceptMapping` function. -Returning to the idea of concept set items, we can customize the mapping of the concept set item. If we had three concepts in the set but only want to exclude one of them we can specify the position in the `createConceptMapping` function. +QUESTION: Could this use case be handled by createConceptSetExpression. 
```{r custom concept set items} + cid <- c(37310282L, 37310281L, 756055L) nm <- "COVID-19 specific testing (pre-coordinated Measurements excluded)" n <- length(cid) -#lookup up covid19 meausres + +# look up covid19 measures MeasureOfCovid19 <- getConceptIdDetails(conceptIds = cid, - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, - mapToStandard = FALSE) -#create a custom mapping list + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema, + mapToStandard = FALSE) + +# create a custom mapping list conceptMapping <- createConceptMapping(n = n, - includeDescendants = rep(TRUE,n), - isExcluded = c(TRUE,TRUE,FALSE)) + includeDescendants = rep(TRUE, n), + isExcluded = c(TRUE, TRUE, FALSE)) MeasureOfCovid19CSE <- MeasureOfCovid19 %>% createConceptSetExpressionCustom(Name = nm, conceptMapping = conceptMapping) @@ -229,24 +227,38 @@ MeasureOfCovid19CSE <- MeasureOfCovid19 %>% ### Building a Query -Queries lookup a concept set in a domain table in the cdm a returns the set of persons in the table that satisfy this condition. A query contains a concept set expression and an attribute, which can be NULL. In this example we create an occurrence start date attribute. The create attribute signatures start with create and end with attribute. Sandwhiched in the middle is the name of the attribute we wish to create. We can look up attributes using the command `listAttributeOptions`. In this example we start by building an occurrence start date attribute. We want the occurrence of the visit to take place any time after December 1st 2019. +Queries look up a concept set in a domain table in the CDM and return the set of persons in the table that satisfy this condition. A query contains a concept set expression and a list of one or more attributes, which can be NULL. In this example we create an occurrence start date attribute. 
The create attribute function names are of the form `create_____Attribute()` where the name of the attribute we wish to create is in the blank space. We can look up attributes using the command `listAttributeOptions`. In this example we start by building an occurrence start date attribute. We want the occurrence of the visit to take place any time after December 1st 2019. ```{r create an attribute} DateAtt <- createOccurrenceStartDateAttribute(Op = "gt", Value = "2019-12-01") ``` -Next we create a visit occurrence query for the inpatient concept set expression and date attribute. Notice at every point we are building up our component container. +```{r, include=FALSE, eval=FALSE} +listAttributeOptions() + +``` + +Next we create a visit occurrence query for the inpatient concept set expression and date attribute. Notice at every point we are building up our component container. ```{r create a query} IpVisitQuery <- createVisitOccurrence(conceptSetExpression = IpVisitCSE, attributeList = list(DateAtt)) + +``` + +```{r, include=FALSE, eval=FALSE} +# An idea for the API +# queryVisit(IpVisit, +# First(), +# OccurenceStartDate("gt", "2019-12-01"), +# VisitType()) + ``` ```{r echo=FALSE, eval=TRUE} str(IpVisitQuery) ``` - This query is what we use for the primary criteria of this cohort. All that remains, is adding an observation window and a limit (number of observations per person) to set the cohort entry. Notice again in the structural print the component inherits the query and evolves into a primary criteria component class. If one ever wants to check the component class they can use `componentType(pc)`. ```{r create a primary criteria} @@ -256,54 +268,64 @@ PcCovidDiag <- createPrimaryCriteria(Name = "Inpatient Visit Primary Criteria", Limit = "All") ``` - ```{r echo=FALSE, eval=TRUE} str(PcCovidDiag) ``` - ### Creating a Count -Count classes inherit queries and provide boolean logic and temporality. 
This class counts the number of occurrences of a query that fit within an observation period relative to the initial event. Counts are used in cohort restriction like in the additional criteria and inclusion rules. In this example we are going to build a count for COVID-19. We first look up the concept and include descendants in the mapping. Next we make the concept set expression into a query without any attributes. A count requires a timeline so we can create a time window relative to the initial evemt. +Count classes inherit from queries and add Boolean logic and temporal constraints. This class counts the number of occurrences of a query that fit within an observation period relative to the initial event. Counts are used in cohort restriction like in the additional criteria and inclusion rules. In this example we are going to build a count for COVID-19. We first look up the concept and include descendants in the mapping. Next we make the concept set expression into a query without any attributes. A count requires a timeline so we can create a time window relative to the initial event. 
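The window arithmetic behind a count can be illustrated without a database connection. The following is a minimal base R sketch and not `Capr` code: `inWindow` is a hypothetical helper and the dates are made up for illustration. It checks which event dates fall between 21 days before and all days after an index start date, then applies the "at least 1" logic.

```r
# Hypothetical helper (not part of Capr): TRUE when an event date falls inside
# a window anchored on an index date. Inf stands in for "All" days.
inWindow <- function(eventDate, indexDate, daysBefore, daysAfter) {
  # signed distance in days from the index date (negative = before)
  offset <- as.numeric(as.Date(eventDate) - as.Date(indexDate))
  offset >= -daysBefore & offset <= daysAfter
}

# Illustrative dates only
indexStart <- as.Date("2020-03-15")
events <- as.Date(c("2020-02-01", "2020-03-01", "2020-04-10"))

# 21 days before index start to all days after index start
hits <- inWindow(events, indexStart, daysBefore = 21, daysAfter = Inf)

# "at least 1" occurrence inside the window
meetsCount <- sum(hits) >= 1
```

In this sketch the first event is more than 21 days before the index date and falls outside the window, while the other two fall inside it.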
-```{r create a count} +```{r} #lookup covid diagnosis in vocabulary CovidDiag <- getConceptIdDetails(conceptIds = 37311061, - connectionDetails = NULL, - connection = connection, #use connection since it is already open + connection = connection, vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, mapToStandard = TRUE) + #create concept set expression including descendant concepts CovidDiagCSE <- CovidDiag %>% - createConceptSetExpression(Name="COVID-19 (including asymptomatic)", + createConceptSetExpression(Name = "COVID-19 (including asymptomatic)", includeDescendants = TRUE) -#create a conditionOccurrence query + +# create a conditionOccurrence query CovidDiagQuery <- createConditionOccurrence(conceptSetExpression = CovidDiagCSE) #create start window for timeline #start window to count occurrences from inpatient event (cohort entry) -StartWindow1 <- createWindow(StartDays = 21, StartCoeff = "Before", # 21 days before IpVisit index start - EndDays = "All",EndCoeff = "After") #to all days after IpVisit index start +# 21 days before IpVisit index start to all days after IpVisit index start +StartWindow1 <- createWindow(StartDays = 21, StartCoeff = "Before", + EndDays = "All", EndCoeff = "After") + #create end window for timeline #end window to count occurrences from inpatient event (cohort entry) -EndWindow1 <- createWindow(StartDays = "All", StartCoeff = "Before", #end all days before IpVisit index start - EndDays = 0, EndCoeff = "After", #0 days after IpVisit index start +#end all days before IpVisit index start +#0 days after IpVisit index start +EndWindow1 <- createWindow(StartDays = "All", StartCoeff = "Before", + EndDays = 0, EndCoeff = "After", IndexStart = FALSE) #toggle index end + #create timeline Timeline1 <- createTimeline(StartWindow = StartWindow1, - EndWindow = EndWindow1) + EndWindow = EndWindow1) + #create a count criteria #at least 1 occurrence of a covid diagnosis occurring #between a) 21 days before and all days 
after initial inpatient visit index start date #and b) all days before and 0 days after initial inpatient visit index end date CovidCountCriteria1 <- createCount(Query = CovidDiagQuery, - Logic = "at_least", - Count = 1, - Timeline = Timeline1) - + Logic = "at_least", + Count = 1, + Timeline = Timeline1) + ``` +```{r, eval=FALSE, include=FALSE} +# api idea +Count(Visit(CovidDiag), + atLeast(1), + between(before(21), after("all"))) +``` ```{r echo=FALSE, eval=TRUE} str(CovidCountCriteria1) @@ -311,25 +333,23 @@ str(CovidCountCriteria1) ### Creating a Group -A group is a set of counts, queries, attributes and sub-groups that are deployed using boolean logic. We can combine multiple components and then describe if the patient needs to satisfy all these criteria to remain in the cohort or at least 1. A group is used for additional criteria and inclusion rules. It is also used for a nested criteria attribute. In this example we will construct multiple counts that comprise of a group. Within this section we also show sub-examples of nuiances of `Capr` as we build up to the group. - +A group is a set of counts, queries, attributes and sub-groups that are deployed using Boolean logic. We can combine multiple components and then specify whether the patient must satisfy all of these criteria, or at least one of them, to remain in the cohort. A group is used for additional criteria and inclusion rules. It is also used for a nested criteria attribute. In this example we will construct multiple counts that make up a group. Within this section we also show sub-examples of nuances of `Capr` as we build up to the group. #### Creating a Source Concept Attribute -In the code chunk below we want to use source concepts as an attribute for one of the criteria in the group. The patient must have at least 1 instance of these source concepts, excluding for 45542411, 586414, 45600471, 586415, 710157. 
We first creae a concept mapping for the length of the concept set and then toggle the mapping so that these five concepts have the item of isExcluded set to TRUE. Once the conceptMapping is set we can create the concept set Expression. The concept set expression known as COVID SourceConcept can be used as a source concept attribute. Notice when we make the query in the `createConditionOccurrence` function we do not include the concept set expression like before. This is because we are making a special attribute for the source concepts. We should note that a query does not always contain a concept set expression. This is also the case when making an observation period. One should be careful on how they wish to apply the concept set expressions within the query, it may be an attribute. +In the code chunk below we want to use source concepts as an attribute for one of the criteria in the group. The patient must have at least 1 instance of these source concepts, excluding 45542411, 586414, 45600471, 586415 and 710157. We first create a concept mapping for the length of the concept set and then toggle the mapping so that these five concepts have isExcluded set to TRUE. Once the conceptMapping is set we can create the concept set expression. The concept set expression known as COVID SourceConcept can be used as a source concept attribute. Notice that when we make the query in the `createConditionOccurrence` function we do not include the concept set expression like before. This is because we are making a special attribute for the source concepts. We should note that a query does not always contain a concept set expression. This is also the case when making an observation period. One should be careful about how a concept set expression is applied within a query, since it may belong in an attribute instead. 
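To make the positions used in the toggle step concrete, here is a minimal base R sketch. It assumes a plain data.frame as a stand-in for the mapping (this is an illustration, not `Capr`'s internal representation) and shows that positions 5 and 9 to 12 of the concept vector correspond to the five excluded source concepts listed above.

```r
# Source concept ids in the same order as in the lookup call
ids <- c(710158L, 710155L, 710156L, 710159L, 45542411L, 710160L,
         45756093L, 42501115L, 586414L, 45600471L, 586415L, 710157L)

# Stand-in for a default mapping: every flag starts as FALSE
mapping <- data.frame(conceptId = ids,
                      includeDescendants = FALSE,
                      isExcluded = FALSE,
                      includeMapped = FALSE)

# Toggle isExcluded at the positions we want to alter
pos <- c(5, 9:12)
mapping$isExcluded[pos] <- TRUE

# Concepts that end up excluded
excluded <- mapping$conceptId[mapping$isExcluded]
```

Because the toggle is positional, the order of the ids in the lookup matters; checking the excluded ids this way is a cheap sanity check before building the concept set expression.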
```{r create source concept criteria} CovidSourceConcepts <- getConceptIdDetails(conceptIds = c(710158L, 710155L, 710156L, 710159L, 45542411L, 710160L, 45756093L, 42501115L, 586414L, 45600471L, 586415L, 710157L), - connectionDetails = NULL, - connection = connection, #use connection since it is already open + connection = connection, vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, mapToStandard = FALSE) #create a default mapping list includeDescendants, isExcluded, includeMapped all FALSE ConceptMappingSourceConcepts <- createConceptMapping(n = nrow(CovidSourceConcepts)) + #toggle TRUE the positions needed to alter ConceptMappingSourceConcepts <- toggleConceptMapping(conceptMapping = ConceptMappingSourceConcepts, pos = c(5,9:12), mapping = "isExcluded") @@ -356,7 +376,6 @@ CovidCountCriteria2 <- createCount(Query = CovidSourceConceptQuery, Something you probably have noticed is most of `Capr` code contains forward pipes from the `magrittr` package in R. This usually looks a little nicer to avoid overuse of the assignment operator `<-`. However we use this deliberately to convey the mutability of the component container in `Capr`. Looking back at the structural prints from before we can see that component class changes as the object moves from a concept set expression, to a query, to a count and so forth. However the information from the previous state stays the same. The S4 component allows us to modify the structure while maintaining some rigid consistency across inheritance. In the code below, our look up returns a data.frame. The input for the `createConceptSetExpression` is a data.frame and the output is component object with a component class of conceptSetExpression. Next we mutate this component to build a query. 
One can always check the current component class of a component by using the function `componentType` - ```{r using piping} CovidCountCriteria3 <- getConceptIdDetails(conceptIds = 37310282, connectionDetails = NULL, @@ -374,23 +393,20 @@ componentType(CovidCountCriteria3) #### Creating a Concept Attribute -Sometimes we want to apply concepts directly into a cohort as an attribute, without building them as a concept set expression in the cohort. In this case we are using different values as concepts for measurement of a COVID-19 test. Below we show a lookup as before for concept values. +Sometimes we want to apply concepts directly into a cohort as an attribute, without building them as a concept set expression in the cohort. In this case we are using different values as concepts for measurement of a COVID-19 test. Below we show a lookup as before for concept values. ```{r} getConceptIdDetails(conceptIds = c(4126681L,45877985L, 9191L, 4181412L, 45879438L, 45884084L), - connectionDetails = NULL, - connection = connection, #use connection since it is already open - vocabularyDatabaseSchema = vocabularyDatabaseSchema, - oracleTempSchema = NULL, - mapToStandard = TRUE) + connection = connection, + vocabularyDatabaseSchema = vocabularyDatabaseSchema, + mapToStandard = TRUE) ``` - ```{r echo=FALSE, eval=TRUE} -exampleConceptLookup[[5]] +# exampleConceptLookup[[5]] ``` -If we wanted to use these concepts directly into an attribute we can search them from the function call. Underlying this function is the same lookup command as before in addition to some formatting changes for concepts that are incorporated within the cohort and not in the concept set list. +If we wanted to use these concepts directly into an attribute we can search them from the function call. Underlying this function is the same lookup command as before in addition to some formatting changes for concepts that are incorporated within the cohort and not in the concept set list.
```{r create a concept attribute} ValueAsConceptAtt <- createValueAsConceptAttribute(conceptIds = c(4126681L,45877985L, 9191L, @@ -408,9 +424,9 @@ CovidCountCriteria4 <- createCount(Query = Covid19MeasuresQuery, Count=1, Timeline = Timeline1) ``` -#### Resuing Component Parts +#### Reusing Component Parts -`Capr` levarages the global environment in R to maintain objects that have been previously created in other parts of the cohort. In `Capr`, components and other objects can always exist in isolation so we can simply add them into a different function call. Notice that our object `Timeline1` which represents the timeline of the restricted events has stayed the same for each new count. We are recycling the object from the global environment and do not need to recreate this every time. We will expand upon this idea in the save and load commands in `Capr`. Beyond reuse we can copy and modify aspects to generate new component objects. In the code chunk below we can build this count through our typical means: lookup the concept, create the mapping, make the concept set expression, set the query with attributes and then build the count. +`Capr` leverages the global environment in R to maintain objects that have been previously created in other parts of the cohort. In `Capr`, components and other objects can always exist in isolation so we can simply add them into a different function call. Notice that our object `Timeline1` which represents the timeline of the restricted events has stayed the same for each new count. We are recycling the object from the global environment and do not need to recreate this every time. We will expand upon this idea in the save and load commands in `Capr`. Beyond reuse we can copy and modify aspects to generate new component objects. In the code chunk below we can build this count through our typical means: lookup the concept, create the mapping, make the concept set expression, set the query with attributes and then build the count. 
```{r reusing a concept set for different domain} Covid19ObservationMeasureQuery <- createObservation(conceptSetExpression = MeasureOfCovid19CSE, @@ -423,16 +439,16 @@ CovidCountCriteria5A <- createCount(Query = Covid19ObservationMeasureQuery, ``` -This is nearly identical to something we have created before, except it was a Measurment domain. Instead of recreating everything we can use the copy and modify principles in R to more quickly change this component to the new domain observation. +This is nearly identical to something we have created before, except it was a Measurement domain. Instead of recreating everything we can use the copy and modify principles in R to more quickly change this component to the new domain observation. ```{r} CovidCountCriteria5 <- CovidCountCriteria4 CovidCountCriteria5@CriteriaExpression[[1]]@Criteria@Domain <- "Observation" ``` -#### Finally, Creating the group +#### Finally, Creating the group -We need to make one more count for our group. Once this is created we can add all these components into a group. A group requires a name, some logic on what counts and the list of criterias, demographic criterias and sub groups. +We need to make one more count for our group. Once this is created we can add all these components into a group. A group requires a name, some logic on what counts and the list of criteria, demographic criteria and sub groups. ```{r creating the group} CovidSourceConceptQueryObs <- createObservation(attributeList = list( @@ -452,7 +468,7 @@ saveComponent(CovidDiagGroup, savePath = "~/Documents") ``` -We can use this group to make an additional criteria like in the chunk below. +We can use this group to make an additional criteria like in the chunk below. 
```{r creating an additional criteria} AcCovidDiag <- createAdditionalCriteria(Name = "Additional Crit for COVID cohort", @@ -460,7 +476,7 @@ AcCovidDiag <- createAdditionalCriteria(Name = "Additional Crit for COVID cohort Limit = "All") ``` -We can also use this same group as a nested (correlated) criteria for an inclusion rule. This is another example of the resuability of the components. We can make a group once and then use it in a variety of scenarios across the cohort definition. We can also save the group and use it another time in a different cohort if we plan to use it frequently across a family of like cohorts. +We can also use this same group as a nested (correlated) criteria for an inclusion rule. This is another example of the reusability of the components. We can make a group once and then use it in a variety of scenarios across the cohort definition. We can also save the group and use it another time in a different cohort if we plan to use it frequently across a family of like cohorts. ```{r creating a correlated criteria} #load an existing component @@ -482,10 +498,9 @@ CovidHospitalizationGroup <- createGroup(Name ="does not have hospitalization fo Groups = NULL) ``` - #### Creating a Demographic Criteria -In our groups sometimes we want to include a demographic criteria for examples persons 18 years or older. Demographic criterias are attributes, so we use the attribute functions to create them like before. +In our groups sometimes we want to include a demographic criteria, for example persons 18 years or older. Demographic criteria are attributes, so we use the attribute functions to create them like before. ```{r creating demographic criteria} AgeAtt <- createAgeAttribute(Op = "gte", Value = 18) @@ -498,7 +513,7 @@ Age18AndOlderGroup <- createGroup(Name = ">=18 years old", ### Creating Inclusion Rules -Once we have a series of groups we are happy with we can bundle them together to create inclusion rules.
The code chunk below shows how to make inclusion rules from the groups we have created before plus a group that is based on persons having at least 365 days of observation. +Once we have a series of groups we are happy with we can bundle them together to create inclusion rules. The code chunk below shows how to make inclusion rules from the groups we have created before plus a group that is based on persons having at least 365 days of observation. ```{r creating inclusion rules} Timeline3 <- createTimeline(StartWindow = createWindow(StartDays = "All", StartCoeff = "Before", @@ -522,7 +537,6 @@ IrsCovidDiag <- createInclusionRules(Name = "Inclusion Rules for covid Cohort", Limit ="First") ``` - ### Defining the cohort exit The cohort exit is defined by the end strategy and the censoring criteria. The end strategy by default is end of continuous observation, however we can create two additional strategies to define how persons exit the cohort. First we can use a date offset. The code chunk below shows an example. @@ -531,7 +545,7 @@ The cohort exit is defined by the end strategy and the censoring criteria. The e EsCovidDiag <- createDateOffsetEndStrategy(offset = 0, eventDateOffset = "EndDate") ``` -We can aslo create a custom era using a drug exposure to define the end point of a cohort. This example is not a part of the covid cohort but is included for demonstration. We first find a concept like warfarin, create the concept set expression and then apply this within the custom era. +We can also create a custom era using a drug exposure to define the end point of a cohort. This example is not a part of the covid cohort but is included for demonstration. We first find a concept like warfarin, create the concept set expression and then apply this within the custom era.
```{r creating a custom era} WarfarinCSE <- getConceptIdDetails(conceptIds = 1310149, @@ -547,7 +561,7 @@ EndStrategyWithDrug <- createCustomEraEndStrategy(WarfarinCSE, offset = 0) ``` -Aside from the end strategy we can also define a censoring criteria as a way that persons exit the cohort. This is another example that is not part of the covid cohort but we include for the purpose of demonstration. Similar to a primary criteria we can build a concept set expression for what will be our censoring event. Then we create a query (we may also add an attribute) that defines the domain of the concept set expression. We can add any queries we wish to add to the censoring criteria to build this component. +Aside from the end strategy we can also define a censoring criteria as a way that persons exit the cohort. This is another example that is not part of the covid cohort but we include for the purpose of demonstration. Similar to a primary criteria we can build a concept set expression for what will be our censoring event. Then we create a query (we may also add an attribute) that defines the domain of the concept set expression. We can add any queries we wish to add to the censoring criteria to build this component. ```{r creating a censoring criteria} InfliximabCSE <- getConceptIdDetails(conceptIds = c(937368,937369), @@ -563,7 +577,7 @@ cen <-createCensoringCriteria(Name = "prev inflix", ComponentList = list(inflix ### Creating a Cohort Era -The last piece of a cohort definition is to add the cohort era. This defines padding between events and censor window of when to begin recording events. Say in a cohort we only we want to look at events after a certain date. All of this is defined in the cohort era. In many cases the cohort era can be composed of default settings (padding of 0 and no censor window) and does not need to to built for the cohort definition. +The last piece of a cohort definition is to add the cohort era. 
This defines padding between events and censor window of when to begin recording events. Say in a cohort we only we want to look at events after a certain date. All of this is defined in the cohort era. In many cases the cohort era can be composed of default settings (padding of 0 and no censor window) and does not need to to built for the cohort definition. +The last piece of a cohort definition is to add the cohort era. This defines padding between events and censor window of when to begin recording events. Say in a cohort we only want to look at events after a certain date. All of this is defined in the cohort era. In many cases the cohort era can be composed of default settings (padding of 0 and no censor window) and does not need to be built for the cohort definition. ```{r creating a cohort Era} cohortEra <- createCohortEra(LeftCensorDate = "2019-12-17") @@ -571,7 +585,7 @@ cohortEra <- createCohortEra(LeftCensorDate = "2019-12-17") ### Creating the cohort definition -Once we are happy with the creation of all the sub-components we can build the cohort definition. This is pretty simple. We infuse the component parts necessary for the cohort definition and create. In the cohort definition we can add meta data about the cohort definition. This includes a name (required), description (optional), author (optional) and the cdm version compatability. The default of the cdm version is ">= 5.0.0". If we had a different variation of this cohort definition, for example the primary criteria for inpatient has a different observation window, we can simply create a separate primary criteria object and then infuse that into a separate cohort definition. +Once we are happy with the creation of all the sub-components we can build the cohort definition. This is pretty simple. We infuse the component parts necessary for the cohort definition and create. In the cohort definition we can add metadata about the cohort definition. This includes a name (required), description (optional), author (optional) and the cdm version compatibility. The default of the cdm version is ">= 5.0.0". If we had a different variation of this cohort definition, for example the primary criteria for inpatient has a different observation window, we can simply create a separate primary criteria object and then infuse that into a separate cohort definition.
```{r create the cohort definition} desc <- "This cohort counts the number of inpatient visits from patients with COVID-19, this is counted as a covid diagnosis or from a measured covid test" @@ -583,7 +597,6 @@ cd <- createCohortDefinition(Name = "Inpatient COVID-19 Diag", EndStrategy = EsCovidDiag) ``` - ## Deploying the Cohort Once the cohort definition has been created we want to run it against our OMOP cdm. To do so we need to use the `CirceR` package to take the cohort definition and create the ohdisql to run on our local environment. `CirceR` will populate the parameterized sql if one sets the generate options. We need to tell what is the column for the cohort Id in the cohort table of the results schema, the id number in the results schema, the cdm schema, the table where the cohort is to be written, the results Schema where the table sits, the vocabulary schema to access concepts and a toggle to generate statistics on the cohort. We can take the cohort definition created in R and return objects to deploy this cohort across our OMOP system. The function `compileCohortDefinition` takes the R and builds the json that `CirceR` takes to get the cohort description and most importantly the ohdisql. `Capr` does some low level manipulation to get from the R object to the json and then relies on `CirceR` to build the ohdisql. Functions leveraged in the `compileCohortDefinition` functions from `CirceR` include `cohortExpressionFromJson` which activates the circe-be, `cohortPrintFriendly` which returns the English text of the cohort, and `buildCohortQuery` which creates the ohdisql from the input generate options.
@@ -591,30 +604,46 @@ Once the cohort definition has been created we want to run it against our OMOP c ```{r compile Cohort Definition} genOp <- CirceR::createGenerateOptions(cohortIdFieldName = "cohort_definition_id", cohortId = 9999, - cdmSchema = connectionDetails$schema, + cdmSchema = vocabularyDatabaseSchema, targetTable = "cohort", resultSchema = "results", - vocabularySchema = connectionDetails$schema, + vocabularySchema = vocabularyDatabaseSchema, generateStats = F) cohortInfo <- compileCohortDefinition(cd, genOp) ``` -The cohortInfo object in the chunk above is a list of 4 aspects: -1) the s3 cohort definition (`circeS3`) -- this is a intermediate structure for cohort definition that one can browse to debug errors -2) the cohort json (`circeJson`) -- this returns the json of the cohort definition, identical to ATLAS. You can copy an paste this json to ATLAS to generate the same cohort. You can also save this for record keeping -3) the cohort description (`cohortRead`) -- this returns the english text of the cohort -4) the sql (`ohdisql`) -- the parameterized sql already filled in with the generate options. +The cohortInfo object in the chunk above is a list of 4 aspects: 1) the s3 cohort definition (`circeS3`) -- this is an intermediate structure for cohort definition that one can browse to debug errors 2) the cohort json (`circeJson`) -- this returns the json of the cohort definition, identical to ATLAS. You can copy and paste this json to ATLAS to generate the same cohort. You can also save this for record keeping 3) the cohort description (`cohortRead`) -- this returns the English text of the cohort 4) the sql (`ohdisql`) -- the parameterized sql already filled in with the generate options. -To execute the ohdisql one needs to use the `DatabaseConnector::executeSql` function to run the sql against the db as shown in the chunk before.
+To execute the ohdisql one needs to use the `DatabaseConnector::executeSql` function to run the sql against the db as shown in the chunk before. ```{r eval=FALSE} -DatabaseConnector::executeSql(connection = conn, sql=cohortInfo$ohdiSQL) -``` +# TODO Need to improve sql generation. I'm not sure parameters like schema names should be automatically filled in. Also this should work nicely with the cohortGenerator package. +sql <- cohortInfo$ohdiSQL +# There are no parameters to be filled in +# What if I want to use a temp emulation schema or execute on a different database? + +# cat(sql) +# stringr::str_extract_all(sql, "@\\w+") + +# but the sql does need to be rendered because there is branching logic that needs to be normalized + +sql <- SqlRender::render(sql) +sql <- SqlRender::translate(sql, "postgresql") +# readr::write_lines(sql, here::here("sql.txt")) + +# dbExecute(connection, "ROLLBACK;") +# dbExecute(connection, "DROP TABLE codesets;") +# dbExecute(connection, "DROP TABLE qualified_events;") +# dbExecute(connection, "DROP TABLE inclusion_events;") + +DatabaseConnector::executeSql(connection = connection, sql = sql) + +``` ## Saving and Loading Cohorts -An important feature of `Capr` is the ability to save and load components. As mentioned before we may want to reuse a component in a different cohort. Instead of rebuilding this object we can save the component and load it up to deploy on a different cohort. The idea here is to leverage the programatic environment of R to assist in cohort development and programming network studies. We encourage users to save and recycle components to assist them in study development. The code chunk below shows an example of saving a cohort additional criteria. +An important feature of `Capr` is the ability to save and load components. As mentioned before we may want to reuse a component in a different cohort. Instead of rebuilding this object we can save the component and load it up to deploy on a different cohort. 
The idea here is to leverage the programmatic environment of R to assist in cohort development and programming network studies. We encourage users to save and recycle components to assist them in study development. The code chunk below shows an example of saving a cohort additional criteria. ```{r save, eval=FALSE} saveComponent(AcCovidDiag, @@ -622,7 +651,7 @@ saveComponent(AcCovidDiag, savePath = "~/Documents/ComponentLibrary/CovidInpatient") ``` -We can then load this additional criteria into the global environment to utilize in a new iteration of a covid cohort. Maybe this new cohort has a different primary criteria but the same in other places. Notice the saved components are written to json. Future releases may allow saving the S4 object as a binary, depending on user preference. +We can then load this additional criteria into the global environment to utilize in a new iteration of a covid cohort. Maybe this new cohort has a different primary criteria but the same in other places. Notice the saved components are written to json. Future releases may allow saving the S4 object as a binary, depending on user preference. ```{r load} acCovidIp <- loadComponent("~/Documents/ComponentLibrary/CovidInpatient/covidAC.json") @@ -635,11 +664,9 @@ cd2 <- createCohortDefinition(Name = "DifferentcovidDiagCohort", EndStrategy = es) ``` - - ## Importing existing JSON Cohorts -The OHDSI community has created thousands of cohorts and has introduced a gold standard pheontype library, maintaining a list of ideal cohorts for medical constructs. We do not want to rewrite good existing cohorts. We also may want to take an existing cohort skeleton and make a slight modification. `Capr` allows us to import Circe Json and build the cohort definition to your global environment. With this cohort uploaded you can modify as you see fit. In this example we upload the cohort definition and a saved primary criteria.
We modify the primary criteria to have an observation window of post days 365 and add this to our loaded cohort. We can build the ohdisql of this cohort quite quickly. +The OHDSI community has created thousands of cohorts and has introduced a gold standard phenotype library, maintaining a list of ideal cohorts for medical constructs. We do not want to rewrite good existing cohorts. We also may want to take an existing cohort skeleton and make a slight modification. `Capr` allows us to import Circe Json and build the cohort definition to your global environment. With this cohort uploaded you can modify as you see fit. In this example we upload the cohort definition and a saved primary criteria. We modify the primary criteria to have an observation window of post days 365 and add this to our loaded cohort. We can build the ohdisql of this cohort quite quickly. ```{r import json cohort} cd2 <- readInCirce("~/Documents/NetworkStudy/COVID/inpatientCovid.json") @@ -651,7 +678,7 @@ cohortInfo2 <- compileCohortDefinition(cd2, genOp) ## Cohort Building Language as Data -A cool feature of R is its ability to do metaprogramming. We can save code as data and modify the code programatically. For those interested on learning the theoretical aspects of this in R, one should read Hadley Wickhams's Advanced R section 4 about Metaprogramming. `Capr` uses this feature when importing a CIRCE json. It creates structural calls in an execution environment and then evaluates them into the global environment, leaving only the cohort definition as its trace. `Capr` has a function that places the function calls into a txt file. +A cool feature of R is its ability to do metaprogramming. We can save code as data and modify the code programmatically. For those interested in learning the theoretical aspects of this in R, one should read Hadley Wickham's Advanced R section 4 about Metaprogramming. `Capr` uses this feature when importing a CIRCE json.
It creates structural calls in an execution environment and then evaluates them into the global environment, leaving only the cohort definition as its trace. `Capr` has a function that places the function calls into a txt file. ```{r} writeCaprCall(jsonPath = "~/Documents/NetworkStudy/COVID/inpatientCovid.json",