computeStandardizedDifference does not handle temporal covariate data #225

Open
gowthamrao opened this issue Feb 1, 2024 · 3 comments · May be fixed by #228

@gowthamrao
Member

The covariates table of a covariateData object created with temporal covariate settings has a timeId column. computeStandardizedDifference does not appear to account for it: it joins only by covariateId instead of by covariateId and timeId. This causes a Cartesian product.
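
To make the join problem concrete, here is a minimal sketch using dplyr on a toy table. The data, the covariateId value 1001, and the variable names are hypothetical and only mimic the shape of an aggregated temporal covariates table; this is not the FeatureExtraction implementation.

```r
library(dplyr)

# Hypothetical toy data: one covariate observed in two time windows for two cohorts
covariates <- tibble(
  cohortDefinitionId = c(1, 1, 2, 2),
  covariateId = 1001,
  timeId = c(1, 2, 1, 2),
  averageValue = c(0.10, 0.20, 0.30, 0.40)
)

cohort1 <- covariates %>%
  filter(cohortDefinitionId == 1) %>%
  select(covariateId, timeId, mean1 = averageValue)

cohort2 <- covariates %>%
  filter(cohortDefinitionId == 2) %>%
  select(covariateId, timeId, mean2 = averageValue)

# Joining on covariateId alone pairs every time window of cohort 1 with every
# time window of cohort 2. Recent dplyr versions warn about the many-to-many
# match, so the warning is suppressed here on purpose.
joinedByCovariateId <- suppressWarnings(
  inner_join(cohort1, cohort2, by = "covariateId")
)

# Joining on both keys keeps one row per covariate and time window
joinedByBothKeys <- inner_join(cohort1, cohort2, by = c("covariateId", "timeId"))

nrow(joinedByCovariateId)
nrow(joinedByBothKeys)
```

With two time windows per cohort, the first join returns 2 x 2 rows while the second returns one row per (covariateId, timeId) pair.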

gowthamrao added a commit to gowthamrao/FeatureExtraction that referenced this issue Feb 1, 2024
@anthonysena anthonysena modified the milestones: V3.4.0, V3.5.0 Feb 1, 2024
@anthonysena anthonysena added the bug label Feb 2, 2024
@gowthamrao gowthamrao linked a pull request Feb 5, 2024 that will close this issue
gowthamrao added a commit to gowthamrao/FeatureExtraction that referenced this issue Feb 5, 2024
@gowthamrao
Member Author

I have refactored the original code. Please feel free to use it:

gowthamrao@1ad4513

@anthonysena
Collaborator

Adding a reprex as suggested by @ginberg to illustrate the problem @gowthamrao has described:

packageVersion("FeatureExtraction")
#> [1] '3.4.0'

# 4283893204 = condition_era group: Sinusitis
temporalCovariateSettings <- FeatureExtraction::createTemporalCovariateSettings(
  useConditionEraGroupOverlap = TRUE,
  temporalStartDays = c(-365, -364),
  temporalEndDays = c(-365, -364),
  includedCovariateConceptIds = 4283893 
)

# Execute the analysis on Eunomia
connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(
  connectionDetails = connectionDetails
)
#> Connecting using SQLite driver
#> Creating cohort: Celecoxib
#>   |======================================================================| 100%
#> Executing SQL took 0.026 secs
#> Creating cohort: Diclofenac
#>   |======================================================================| 100%
#> Executing SQL took 0.0169 secs
#> Creating cohort: GiBleed
#>   |======================================================================| 100%
#> Executing SQL took 0.0279 secs
#> Creating cohort: NSAIDs
#>   |======================================================================| 100%
#> Executing SQL took 0.0735 secs
#> Cohorts created in table main.cohort
#>   cohortId       name
#> 1        1  Celecoxib
#> 2        2 Diclofenac
#> 3        3    GiBleed
#> 4        4     NSAIDs
#>                                                                                        description
#> 1    A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
#> 2    A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
#> 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
#> 4       A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
#>   count
#> 1  1844
#> 2   850
#> 3   479
#> 4  2694

covariateData <- FeatureExtraction::getDbCovariateData(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cohortDatabaseSchema = "main",
  cohortTable = "cohort",
  covariateSettings = temporalCovariateSettings,
  aggregated = TRUE
)
#> Connecting using SQLite driver
#> Currently in a tryCatch or withCallingHandlers block, so unable to add global calling handlers. ParallelLogger will not capture R messages, errors, and warnings, only explicit calls to ParallelLogger. (This message will not be shown again this R session)
#> Sending temp tables to server
#> Inserting data took 0.0144 secs
#> Inserting data took 0.0312 secs
#> Constructing features on server
#>   |======================================================================| 100%
#> Executing SQL took 0.102 secs
#> Fetching data from server
#> Warning: Low disk space in 'C:/Users/asena5/AppData/Local/Temp/1/Rtmp2fzzCg'. Only 9.5 GB left.
#> Use options(warnDiskSpaceThreshold = <n>) to set the number of bytes for this warning to trigger.
#> This warning will not be shown for this file location again during this R session.
#> Fetching data took 0.177 secs

covariateData$covariates
#> # Source:   table<covariates> [4 x 5]
#> # Database: sqlite 3.41.2 [C:\Users\asena5\AppData\Local\Temp\1\Rtmp2fzzCg\file4085a72dbc.sqlite]
#>   cohortDefinitionId covariateId timeId sumValue averageValue
#>                <dbl>       <dbl>  <dbl>    <dbl>        <dbl>
#> 1                  1  4283893204      1        4      0.00217
#> 2                  1  4283893204      2        4      0.00217
#> 3                  4  4283893204      1        4      0.00148
#> 4                  4  4283893204      2        4      0.00148

FeatureExtraction::computeStandardizedDifference(
  covariateData1 = covariateData,
  covariateData2 = covariateData,
  cohortId1 = 1,
  cohortId2 = 4
)
#> # A tibble: 4 × 8
#>   covariateId   mean1    sd1   mean2    sd2     sd stdDiff covariateName        
#>         <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <chr>                
#> 1  4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 2  4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 3  4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 4  4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…

Created on 2024-02-16 with reprex v2.1.0

As mentioned in this issue, the resulting output lacks the timeId that corresponds to covariateData$timeRef, so the four rows cannot be told apart. We'll need to add timeId to the output when computing the standardized difference with temporal covariates.
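
As a rough illustration of what a timeId-aware result could look like, the sketch below recomputes the standardized difference with dplyr from the aggregated values printed above. It assumes binary covariates, so the standard deviation is taken as sqrt(mean * (1 - mean)), and it assumes the pooled formula implied by the printed sd and stdDiff columns. The table is typed in by hand from the reprex output, and none of this is the package's actual code.

```r
library(dplyr)

# Values copied by hand from covariateData$covariates in the reprex above
covariates <- tibble(
  cohortDefinitionId = c(1, 1, 4, 4),
  covariateId = 4283893204,
  timeId = c(1, 2, 1, 2),
  sumValue = 4,
  averageValue = c(0.00217, 0.00217, 0.00148, 0.00148)
)

# Assumption: binary covariate, so sd = sqrt(p * (1 - p))
cohort1 <- covariates %>%
  filter(cohortDefinitionId == 1) %>%
  mutate(mean1 = averageValue, sd1 = sqrt(mean1 * (1 - mean1))) %>%
  select(covariateId, timeId, mean1, sd1)

cohort2 <- covariates %>%
  filter(cohortDefinitionId == 4) %>%
  mutate(mean2 = averageValue, sd2 = sqrt(mean2 * (1 - mean2))) %>%
  select(covariateId, timeId, mean2, sd2)

# Join on both keys so each time window is compared only with itself,
# and keep timeId in the output
stdDiffByTime <- inner_join(cohort1, cohort2, by = c("covariateId", "timeId")) %>%
  mutate(
    sd = sqrt((sd1^2 + sd2^2) / 2),
    stdDiff = (mean2 - mean1) / sd
  )

stdDiffByTime
```

This yields one row per (covariateId, timeId) pair rather than four indistinguishable rows, with a stdDiff close to the value shown in the output above.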

@gowthamrao
Member Author

gowthamrao commented Feb 20, 2024

Hi @ginberg @anthonysena, this issue also seems to apply to the createTable1 function. If you continue @anthonysena's reprex with the following call, it causes an error:

FeatureExtraction::createTable1(
  covariateData1 = covariateData,
  covariateData2 = covariateData,
  cohortId1 = 1,
  cohortId2 = 4
)

@anthonysena anthonysena linked a pull request Mar 1, 2024 that will close this issue
@ginberg ginberg modified the milestones: V3.5.0, v3.6.0 Apr 18, 2024
@ginberg ginberg modified the milestones: v3.6.0, v3.7.0 Jun 4, 2024
@anthonysena anthonysena modified the milestones: v3.6.0, v3.7.0 Jun 7, 2024
@ginberg ginberg modified the milestones: v3.7.0, v4.0.0 Aug 30, 2024