Is it possible to know which patients had records produced by which modules? #1246

fabkury · 2023-02-04T14:17:30Z


My understanding is that Synthea runs each patient through all modules, and that most (maybe all) modules are probabilistic. This means each patient has some chance to pass through a module "unscathed", that is, to never develop the module's condition.

Is it possible to obtain the complete list of all patients who had records produced by each module?

For example, if module X gives each patient a 2% chance of being diagnosed with a given cancer, is it possible to obtain the list of those 2% somewhere, without checking for the presence of the diagnosis codes, lab tests, or other synthetic patient data itself? I'm looking for some sort of external, authoritative "ground truth" about the synthetic patients.

Thanks!

Answered by fabkury (Feb 7, 2023)

fabkury (author) · 2023-02-04T14:22:07Z

I guess my question could also be phrased as:

Is it possible to establish, with certainty, which state of which module produced each row of data in the CSVs that Synthea produces?

Thanks!

fabkury (author) · 2023-02-04T16:25:26Z

I realized the JSON export page (https://github.com/synthetichealth/synthea/wiki/JSON-Export) describes an option to export each patient's path through each module. Just what I was looking for.

lhs-open · 2023-02-04T18:50:43Z

The JSON export record probably does have the module information directly. How do you tell which record, or which patient, is a result of which module? That's your original question, right?

fabkury (author) · 2023-02-07T17:31:11Z

You're right, lhs-open.

I was just able to export each patient's module history (using the option exporter.json.include_module_history). It gives you the list of states the patient went through, but not which records in the output were produced by each state (which is what I had asked for).
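A minimal sketch of using that module history as a rough "answer key" at the patient level (not the record level). It assumes the JSON exporter is enabled together with exporter.json.include_module_history, that it writes one file per patient into a single directory, and that state names appear as plain string values somewhere in each file. The exact JSON layout of the history is not described in this thread, so the search walks every string value instead of relying on specific keys. `contains_string` and `patients_with_state` are hypothetical helpers, not part of Synthea.

```python
"""Sketch: list patients whose JSON export mentions a given module state name.

Hypothetical helpers. Assumptions, not confirmed in this thread: the JSON
exporter (with exporter.json.include_module_history enabled) writes one file
per patient into one directory, and state names appear as string values.
"""
import json
from pathlib import Path


def contains_string(node, target):
    """Return True if any string value (not key) in a JSON tree equals target."""
    if isinstance(node, str):
        return node == target
    if isinstance(node, dict):
        return any(contains_string(value, target) for value in node.values())
    if isinstance(node, list):
        return any(contains_string(item, target) for item in node)
    return False


def patients_with_state(json_dir, state_name):
    """Return the file stems of patient files whose history mentions state_name."""
    matches = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            record = json.load(f)
        if contains_string(record, state_name):
            matches.append(path.stem)
    return matches
```

For example, `patients_with_state("output/json", "Diagnosis")` (directory and state name are illustrative) would list the files of patients who entered a state with that name. Generic names such as Initial or Terminal occur in every module, so a state name unique to the module of interest works better, and a plain string match can still hit unrelated fields that happen to share the name.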

So I guess the correct answer to my question is: unfortunately no :/

This would be a great feature to have for purposes of learning electronic patient phenotyping. The module history is like the "answer key" to the exercise. The exercise is to identify patients with a given condition by querying the available patient data.

I might be able to make do with just the list of states from each module, but here's my +1 for this feature to be implemented at some point: provide an option that lets you know, for each record in the output, exactly which state of which module produced it.

Thanks, everyone.

6 replies

fabkury (author) Aug 9, 2024

I'm not sure I understood everything you explained, but thanks a lot for your detailed thoughts! I think it would be great to have the detailed history; it would enable many use cases for the data.

annayudovin Aug 9, 2024

If this is something you are still wrestling with, I think I can create an example to illustrate what I'm talking about - it's really not as complicated as it sounds (and if you'd rather do this via email, try ayudovin at gmail). No worries if you've already moved on to something else :)

ofajardo Aug 13, 2024

Hi @annayudovin, I am interested in looking at an example, and I am sure it would be useful for others as well. Would you mind sharing it?

fabkury (author) Aug 15, 2024

@annayudovin, now that I've re-read it, I understand what you explained. Thanks for writing it out.

Your approach requires rebuilding the dataset, but it can certainly work for some use cases.

My use case is more nuanced because I am looking at the Synthea data through an ETL'd version of it. My Synthea data is in OMOP CDM format. So, to go with your strategy, I would need to figure out how to make the custom identifiers/hacks appear on the other side of the ETL (and not harm the ETL).

While I appreciate @annayudovin's idea very much, I maintain my vote for this to be implemented in Synthea: the ability to trace exactly which state of which module generated each record in the Synthea output.

annayudovin Aug 20, 2024

@fabkury: I certainly agree with you. The ability to trace exactly which state of which module generated each record in the Synthea output would be extremely useful (and more widely applicable than my hack).

In the meantime, however, if @ofajardo (or anyone else) is interested in how to implement said hack, I'm attaching the module I modified along with a step-by-step description of what I did to put it to use.
cystic_fibrosis.json

I opened the stock version of the Cystic Fibrosis module in Synthea Module Builder at https://synthetichealth.github.io/module-builder and modified the first ConditionOnset state: in its "Codes" section I selected the "Display" field and appended the string "####" to the end.

I then downloaded the modified module and copied it into the ~\synthea\src\main\resources\modules directory, overwriting the previous version of cystic_fibrosis.json.

In the .json file the change looks like this:

```json
"codes": [
  {
    "system": "SNOMED-CT",
    "code": 190905008,
    "display": "Cystic Fibrosis ####"
  }
]
```
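The same edit can be scripted instead of made by hand in Module Builder. This sketch assumes the usual module layout (a top-level "states" object whose entries may carry a "codes" list with "display" fields). As an illustrative variation on the recipe above, it tags every ConditionOnset state rather than only the first one, and appends the state name after the marker so the output text also shows which state produced the record. `tag_module` and `tag_module_file` are hypothetical names.

```python
"""Sketch: script the "####" display-text marker instead of using Module Builder.

Hypothetical helpers. Assumes the usual Synthea module layout: a top-level
"states" object whose entries may carry a "codes" list of
{"system", "code", "display"}. Appending the state name after the marker is
an illustrative extension, not part of the original recipe.
"""
import json
from pathlib import Path

MARKER = "####"


def tag_module(module, marker=MARKER, state_types=("ConditionOnset",)):
    """Append '<marker> <state name>' to each display of matching coded states."""
    tagged = []
    for state_name, state in module.get("states", {}).items():
        if state.get("type") not in state_types:
            continue
        codes = [code for code in state.get("codes", []) if "display" in code]
        for code in codes:
            code["display"] = f"{code['display']} {marker} {state_name}"
        if codes:
            tagged.append(state_name)
    return tagged


def tag_module_file(path, **kwargs):
    """Rewrite a module JSON file in place and return the tagged state names."""
    path = Path(path)
    module = json.loads(path.read_text(encoding="utf-8"))
    tagged = tag_module(module, **kwargs)
    # ensure_ascii=False keeps any non-ASCII display text readable in the file.
    path.write_text(json.dumps(module, indent=2, ensure_ascii=False), encoding="utf-8")
    return tagged
```

Running `tag_module_file` on a copy of cystic_fibrosis.json before rebuilding should have roughly the same effect as the manual edit, plus the state name. Passing, say, `state_types=("ConditionOnset", "Procedure", "MedicationOrder")` would extend the tagging to other coded state types; whether a given exporter or downstream ETL carries the display text through still has to be checked case by case.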

I replaced the synthea.properties file (in the ~\synthea\bin\main\ directory) with the stock version downloaded from GitHub, because the one I had been using was heavily altered. I also rolled back the other customizations to my build and rebuilt the executables, then ran Synthea with the following command:

```bat
.\run_synthea.bat -p 1000 -s 999 -cs 999 Massachusetts Boston
```

(meaning: generate 1000 patients in Boston, Massachusetts, using 999 as the seed and 999 as the clinician seed)

I then searched the ~\synthea\output directory for the added string "####", which showed up in only one file. That is not very surprising, since cystic fibrosis is a rare disease.
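The search for the marker can also be scripted so that it returns patient identifiers rather than just file names. This sketch assumes the stock exporters: FHIR output is one Bundle per patient, so the Patient resource's id identifies the patient, and CSV output (if enabled) has a PATIENT column in tables such as conditions.csv. Both layouts should be checked against your own output; `find_marked_patients` and its helpers are hypothetical.

```python
"""Sketch: map a marker string (e.g. "####") back to patients in Synthea output.

Hypothetical helpers. Assumptions, not confirmed in this thread: FHIR output
is one Bundle per patient, and CSV tables such as conditions.csv carry a
PATIENT column.
"""
import csv
import json
from pathlib import Path

MARKER = "####"


def patient_ids_from_bundle(text):
    """Return the Patient resource ids in a FHIR Bundle (empty if not a Bundle)."""
    try:
        bundle = json.loads(text)
    except json.JSONDecodeError:
        return set()
    if not isinstance(bundle, dict):
        return set()
    ids = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient" and "id" in resource:
            ids.add(resource["id"])
    return ids


def patient_ids_from_csv(path, marker):
    """Return PATIENT values of CSV rows where any field contains the marker."""
    ids = set()
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("PATIENT") and any(marker in str(v) for v in row.values()):
                ids.add(row["PATIENT"])
    return ids


def find_marked_patients(output_dir, marker=MARKER):
    """Map each output file containing the marker to the patient ids it yields."""
    hits = {}
    for path in sorted(Path(output_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if marker not in text:
            continue
        if path.suffix == ".json":
            hits[str(path)] = patient_ids_from_bundle(text)
        elif path.suffix == ".csv":
            hits[str(path)] = patient_ids_from_csv(path, marker)
        else:
            # Other formats (e.g. C-CDA XML) are reported without ids.
            hits[str(path)] = set()
    return hits
```

Because every file is read in full, this is meant for modest runs like the 1000-patient example above rather than very large populations.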
