-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
export createConceptCountsTable() #1067
Comments
Sorry for the lateness in reply I missed this issue when you first raised it. This approach seems reasonable, but I wonder if the solution would be better included inside CohortGenerator as an extra step following cohort creation? It seems like this solution might be generally more useful than just for cohort diagnostics - tagging @anthonysena for visibility. However, I'm happy to look at adopting this change into the main code. Just submit a PR and I will gladly test it out. |
No worries and thanks for the response. I'll send you the PR as soon as possible. I would argue that the function createConceptCountsTable() is already part of CohortDiagnostics' workflow. We also used the variable useExternalConceptCountsTable inside CD, so it was more natural for us to develop this, avoiding any major modification. |
Did anything ever come out of this? If not, I'd be willing to help out. I'm curious why |
This was a fork that I never got a PR for. I expect that this could be useful, especially in contexts where the Achilles counts are there. If you're interested in following through I will gladly work to get the PR included. |
Hi @azimov I sent the PR for exporting the function to useExternalConceptCountsTable. |
Good idea @katy-sadowski! @cebarboza and I started working on the idea to speed up CohortDiagnstics by using the pre-computed concept counts in Achilles. So instead of building the concepts counts table CohortDiagnostics would have the option to use the data that is already available in Achilles. Here is a SQL query that would pull the data for
I ran this on IPCI and the CohortDiagnostics sql script on IPCI and compared the results. df is created by the sql script in cohort diagnostics and df2 is coming from the sql pasted above. Note there is a huge difference in the number of rows. This is likely because a lot of data in IPCI falls outside the observation period. Achilles does not count concepts outside observation period. Cohort diagnostics does include concepts outside observation period. Another difference is the inclusion of concept id 0. this concept id occurs in multiple cdm tables and so is repeated in the concept table created by cohort diagnostics. I removed these rows. In the table created by cohort diagnostics there is no way to tell which domain the rows with concept id 0 came from. @azimov, @katy-sadowski what do you think about adding achilles as an option for getting concept counts in cohort diagnostics? Also what do you think about making the counts consistent by excluding concepts outside observation period and removing concept id 0? |
I created a pr for use of the achilles tables. Not tested yet but wanted to get some feedback on the approach first. #1120 |
Achilles has an optional table that has concept counts in a format that we could possibly use out of the box without any reformmating or copying. The table is called https://github.com/OHDSI/Achilles/blob/ea478a34179f52c13bb798abd41d7ad96d1b1fc2/R/Achilles.R#L145 I'm not sure yet if this is exactly what we want because I would need to better understand the sql script and the analyses that it is pulling from but it seems like we could possibly use this table as is for concept counts. |
At Darwin EU we are trying to run CohortDiagnostics in a more efficient way when running it for multiple studies. We believe it would be a good enhancement to export the function
createConceptCountsTable()
to generate theconcept_counts
before executing the diagnostics.By doing this, we can perform this calculation just once for a specific
vocabulary_version
, instead of repeating this process for each study.In our fork darwin-eu-dev/CohortDiagnostics, the user can
createConceptCountsTable()
. This table is saved in thecohortDatabaseSchema
.Then we use the parameter
useExternalConceptCountsTable
inexecuteDiagnostics()
. If TRUE,executeDiagnostics()
uses theconcept_counts
created previously in thecohortDatabaseSchema
. The user should specify the name of the external concept counts table, generallyconcept_counts
We also modified the
CreateConceptCountTable.sql
file, to add a new column with the vocabulary_version.https://github.com/darwin-eu-dev/CohortDiagnostics/blob/ca6d9074bb097b9ce60b7bc6bf72e68a84f650fe/inst/sql/sql_server/CreateConceptCountTable.sql#L102C1-L106C2
Then, there are checks in place that evaluate if the vocabulary_version in the concept_counts table is equal to the version of the database the user is running the diagnostics.
https://github.com/darwin-eu-dev/CohortDiagnostics/blob/ca6d9074bb097b9ce60b7bc6bf72e68a84f650fe/R/RunDiagnostics.R#L679C1-L708C4
There's also a vignette to explain how to run this functions
UseExternalConceptTable.Rmd
. We have been testing this approach but we wanted to discuss this before sending a pull request.The text was updated successfully, but these errors were encountered: