Performance - Serialization and Validation #418
Comments
@sidharthramesh:
cc: @vidi42
@sidharthramesh:
@stefanspiska thank you for the quick reply. With just that one change, it's already much faster, averaging around … per composition.

For context, we're building a NiFi Processor that can ingest compositions in bulk after multiple other ETL pipelines.
@stefanspiska:
I do not think what you are trying to do is a good idea. If you need a batch that runs in one transaction, you can do that via the contribution endpoint (now supported in the SDK). And finally, if you do not want the REST overhead, other protocols could be added via a plugin, but plugins are a beta feature right now.
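For reference, a batch commit through the contribution endpoint at the REST level might look roughly like the sketch below. The base URL, EHR id, and payload are assumptions; the exact CONTRIBUTION body should be checked against the openEHR REST API spec and your EHRbase version:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ContributionBatch {

    public static void main(String[] args) throws Exception {
        // Hypothetical base URL and EHR id; adjust to your deployment.
        String base = "http://localhost:8080/ehrbase/rest/openehr/v1";
        String ehrId = "eb578e07-8f25-4c5c-8d6b-fd7b3c8871f8";

        // One CONTRIBUTION commits several composition versions in a single
        // transaction. "data" would hold each canonical composition; the field
        // names below follow the openEHR REST API but are not verified here.
        String body = """
                {
                  "versions": [
                    {
                      "data": {},
                      "lifecycle_state": "complete",
                      "commit_audit": {
                        "change_type": "creation",
                        "description": "batch item 1"
                      }
                    }
                  ],
                  "audit": {
                    "committer": { "name": "etl-pipeline" }
                  }
                }
                """;

        HttpRequest request = HttpRequest
                .newBuilder(URI.create(base + "/ehr/" + ehrId + "/contribution"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```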
@sidharthramesh:
Hey @stefanspiska, I understand my solution is hacky, and yes, I totally expect the database schema to change over time. The points you made about concurrency and integrity are also bothering me now, and it's probably best to seek a proper solution to this; it will come in handy for many clients. There are two key requirements for doing ETL well: idempotency and batching. We tried using the EHRbase REST API first, but it didn't meet these requirements.
Configuration information
Steps to reproduce
I'm trying to directly load compositions into a Postgres database using the SDK.
The data is in the Simplified Flat format, and this needs to be validated and converted into the Database Native format.
The input data is a JSON array of multiple compositions (batches of 1000) that look like this:
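The sample payload did not survive extraction. In the simplified flat (simSDT) format, each composition is a flat map of path/value pairs, so a batch might look roughly like this, with a hypothetical vital_signs template and made-up paths:

```json
[
  {
    "ctx/language": "en",
    "ctx/territory": "DE",
    "ctx/composer_name": "ETL Pipeline",
    "vital_signs/body_temperature/any_event/temperature|magnitude": 37.1,
    "vital_signs/body_temperature/any_event/temperature|unit": "°C"
  },
  {
    "ctx/language": "en",
    "ctx/territory": "DE",
    "ctx/composer_name": "ETL Pipeline",
    "vital_signs/body_temperature/any_event/temperature|magnitude": 38.4,
    "vital_signs/body_temperature/any_event/temperature|unit": "°C"
  }
]
```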
A snippet from the script that does the conversion looks like:
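The original snippet was not captured either. As a rough reconstruction, the conversion and validation step with the openEHR SDK could look like the sketch below; the class names (FlatJasonProvider, OPTParser, CompositionValidator) follow recent openEHR_SDK versions and may differ in yours, and the template id and file name are placeholders:

```java
import java.io.FileInputStream;
import java.util.List;
import java.util.Optional;

import com.nedap.archie.rm.composition.Composition;
import org.ehrbase.serialisation.flatencoding.FlatFormat;
import org.ehrbase.serialisation.flatencoding.FlatJasonProvider;
import org.ehrbase.serialisation.flatencoding.FlatJson;
import org.ehrbase.validation.CompositionValidator;
import org.ehrbase.webtemplate.model.WebTemplate;
import org.ehrbase.webtemplate.parser.OPTParser;
import org.ehrbase.webtemplate.templateprovider.TemplateProvider;
import org.openehr.schemas.v1.OPERATIONALTEMPLATE;
import org.openehr.schemas.v1.TemplateDocument;

public class FlatBatchConverter {

    public static void main(String[] args) throws Exception {
        // Load the operational template once (file name is hypothetical).
        OPERATIONALTEMPLATE opt = TemplateDocument.Factory
                .parse(new FileInputStream("vital_signs.opt"))
                .getTemplate();

        TemplateProvider templateProvider = new TemplateProvider() {
            @Override
            public Optional<OPERATIONALTEMPLATE> find(String templateId) {
                return Optional.of(opt);
            }
        };

        // Build the template-bound helpers once and reuse them for the batch.
        FlatJson flatJson = new FlatJasonProvider(templateProvider)
                .buildFlatJson(FlatFormat.SIM_SDT, "vital_signs");
        WebTemplate webTemplate = new OPTParser(opt).parse();
        CompositionValidator validator = new CompositionValidator();

        List<String> batch = List.of(/* flat-JSON strings from the input array */);
        for (String flat : batch) {
            Composition composition = flatJson.unmarshal(flat);
            var violations = validator.validate(composition, webTemplate);
            if (!violations.isEmpty()) {
                throw new IllegalArgumentException(violations.toString());
            }
            // composition is now an RM object, ready to be serialised to the
            // database-native format and handed to put_composition.
        }
    }
}
```

Note the sketch builds the template-bound objects once and reuses them across the whole batch; constructing them per composition is typically far more expensive than the unmarshalling itself.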
The `put_composition` is a stored procedure on Postgres that does what's necessary to create a composition, contribution, party, and entries in the database. This takes about <30 ms per composition.
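Since `put_composition` is user-defined, its signature is not shown here; purely as an illustration, calling such a procedure from Java over JDBC might look like this (connection string and all parameters are hypothetical):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.UUID;

public class PutCompositionCall {

    public static void main(String[] args) throws Exception {
        // Composition already converted to the database-native format.
        String dbNativeJson = "{}";

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/ehrbase", "ehrbase", "ehrbase");
             // ?::jsonb casts the string payload; if put_composition is a true
             // procedure rather than a function, pgJDBC may additionally need
             // escapeSyntaxCallMode=call on the connection URL.
             CallableStatement call = conn.prepareCall(
                     "{call put_composition(?, ?, ?::jsonb)}")) {
            call.setObject(1, UUID.randomUUID()); // ehr_id (hypothetical)
            call.setString(2, "vital_signs");     // template_id (hypothetical)
            call.setString(3, dbNativeJson);      // composition payload
            call.execute();
        }
    }
}
```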
Actual result
Validating and transforming 3013 compositions took a total of 661.1 seconds, running on an M1 MacBook Air (running Java without Rosetta emulation).
Each batch of x compositions averaged 219 ms per composition (661.1 s / 3013 ≈ 219 ms).
Expected result (Acceptance Criteria)
Running the validation and transformation operations should be at least as fast as the database insert operation (~30 ms per composition), so that validation and transformation do not become the choke point in ETL pipelines.

Any other suggestions or workarounds to speed up the process would also be much appreciated!
Definition of Done