Requesting February Release Cycle Priority Update from ARAs and KPs: Jaeger/OpenTel (telemetry) - Provide evidence that your tools can reliably support 20 sequential queries with a maximum of 10 seconds pause between each query (our current goal for performance in the short term) #7

Open
9 of 13 tasks
tursynay opened this issue Jan 12, 2024 · 12 comments

Comments

@tursynay
Collaborator

tursynay commented Jan 12, 2024

Hello TACT representatives for ARAs and KPs. This priority item for the current release cycle requires a status update from each ARA and KP. The item is "Jaeger/OpenTel (telemetry) - Provide evidence that your tools can reliably support 20 sequential queries with a maximum of 10 seconds pause between each query (our current goal for performance in the short term)". The goal was stated as follows:
The goal is to support 20 queries. Jaeger is part of the process. KPs need to support 20 one-hops whenever a query is run. This is a minimum goal to show that your tool can handle a small amount of load. Future goals will need to be more component specific.
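
For anyone who wants a starting point for gathering this evidence, here is a minimal sketch of a sequential test: 20 queries sent one after another, with a pause of at most 10 seconds between them. It is only an illustration, not an official TACT script. The `send_query` callable is hypothetical (it stands in for whatever submits one query to your tool and waits for the response), and the random pause length is an assumption; a fixed pause would work just as well.

```python
import random
import time
from typing import Callable, List


def run_sequential(send_query: Callable[[int], None],
                   n_queries: int = 20,
                   max_pause_s: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Send n_queries one after another, pausing at most max_pause_s between them.

    Returns the elapsed seconds for each query. An exception raised by
    send_query propagates and aborts the run.
    """
    durations = []
    for i in range(n_queries):
        start = time.perf_counter()
        send_query(i)  # hypothetical: submit one query and wait for the response
        durations.append(time.perf_counter() - start)
        if i < n_queries - 1:
            # Assumption: a random pause anywhere in [0, max_pause_s].
            sleep(random.uniform(0.0, max_pause_s))
    return durations
```

The per-query durations it returns could be attached to this ticket as timing evidence.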

Individuals representing ARAs and KPs on TACT: please add a comment to this ticket with a status update on the goal of this task, and check off the box here in the issue (if you can and remember). We would greatly appreciate your response by the end of Wednesday, January 17, 2024.

ARAs

KPs

Thank you.

@gglusman

Multiomics KPs are served by Service Provider.

@karafecho

Confirmed. Exposure Provider's KPs are served by Automat.

Note that the CQS should probably be added to your list of ARAs, with Jason R. assigned.

@CaseyTa

CaseyTa commented Jan 12, 2024

Confirmed for Clinical Data Provider

@kaiwenho

Confirmed for Unsecret Agent.

We didn’t use Jaeger since we aren't making TRAPI calls to KPs, but we have verified that our ARA can handle 20 consecutive queries on both CI and TEST environments.

Please refer to the attached files for detailed timings as evidence of this performance.
CI-timing-1-14-2023.txt
TEST-timing-1-15-2023.txt

@webyrd

webyrd commented Jan 15, 2024

As @kaiwenho says, we have confirmed that Unsecret can handle 20 consecutive queries, even though we aren't using Jaeger (since we aren't making external TRAPI calls).

@GregHydeDartmouth

Our CHP API currently runs with an average runtime of 308 ms per query. We've tested 20 sequential queries, which complete in ~31 seconds. We've implemented auto-instrumentation using OpenTelemetry and uWSGI. However, traces and metrics are not being exported to the console, and collector support has not been tested yet. We're seeking assistance to understand why our implementation isn't behaving as the OpenTelemetry documentation describes.

@YaphetKG

YaphetKG commented Jan 17, 2024

Hi @GregHydeDartmouth, are you using other exporters besides the Console exporter? We have been using the Jaeger exporter; if it helps, we can discuss it on the Slack Jaeger channel.

@bill-baumgartner

Here are updates for the TMKP services:

The DocumentMetadata API has fully implemented OpenTelemetry and is sending traces to the SRI-run Jaeger collector in ITRB. The DocumentMetadata API does not accept TRAPI queries, but does meet the initial service level objective of 90% of requests within 150ms. This would also more than satisfy the requirement of 20 sequential requests if it applied.
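
As a rough illustration of how a service level objective of this shape ("90% of requests within 150ms") can be checked against measured latencies, here is a small sketch. The function names and the sample numbers are hypothetical and are not taken from the DocumentMetadata API or its monitoring setup.

```python
from typing import Sequence


def fraction_within(latencies_ms: Sequence[float], threshold_ms: float = 150.0) -> float:
    """Return the fraction of requests that completed within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[float],
              threshold_ms: float = 150.0,
              target: float = 0.90) -> bool:
    """True if at least `target` of the requests finished within threshold_ms."""
    return fraction_within(latencies_ms, threshold_ms) >= target


# Made-up sample: nine fast requests and one slow one.
sample = [100.0] * 9 + [400.0]
print(fraction_within(sample), meets_slo(sample))
```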

The Targeted Assertion API is hosted by the Service Provider.

The Literature Cooccurrence API has OpenTelemetry implemented in it, but the format is not accepted by the current version of the ITRB collector. Per discussions, this will be addressed in a future version of the collector at which point it will be able to receive telemetry from the Literature Cooccurrence API. This API has been tested with 24 simultaneous queries and is able to complete them, more than satisfying the requirement of 20 sequential queries.

@tokebe

tokebe commented Jan 18, 2024

Confirmed for Exploring Agent/Service Provider. We didn't test via Jaeger; instead, here is the output of this performance test script:

Results:
Beginning test of 20 simultanoues queries...
MONDO:0008170: Finished in 16s
MONDO:0005155: Finished in 10s
MONDO:0004975: Finished in 9s
HP:0003124: Finished in 5s
MONDO:0005148: Finished in 16s
HP:0001993: Finished in 4s
HP:0011015: Finished in 16s
MONDO:0005015: Finished in 37s
HP:0000822: Finished in 14s
HP:0000717: Finished in 5s
MONDO:0005260: Finished in 5s
HP:0003003: Finished in 14s
HP:0002315: Finished in 6s
MONDO:0007035: Finished in 3s
MONDO:0002251: Finished in 21s
MONDO:0005812: Finished in 4s
MONDO:0005002: Finished in 14s
MONDO:0005789: Finished in 4s
HP:0002014: Finished in 12s
MONDO:0005147: Finished in 7s

Test completed in 37s
Score: 100%
Test passed!

This is a test of 20 simultaneous one-hops to BTE CI, which should satisfy the 20 sequential queries requirement.
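
For readers who want to run a comparable check against their own service, a simultaneous test of this kind can be sketched roughly as below. This is not the linked test script, only an approximation of the idea. The `send_query` callable is hypothetical (it stands in for whatever posts one one-hop query for a given CURIE and blocks until the response arrives), and the scoring rule (percentage of queries finishing within a time limit) is an assumption about what "Score" means in the output above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_simultaneous(send_query: Callable[[str], None],
                     curies: Sequence[str]) -> Dict[str, float]:
    """Start one query per CURIE at the same time; return seconds taken by each."""

    def timed(curie: str) -> float:
        start = time.perf_counter()
        send_query(curie)  # hypothetical: submit one query and wait for the response
        return time.perf_counter() - start

    # One worker per query so that none of them waits in a queue.
    with ThreadPoolExecutor(max_workers=len(curies)) as pool:
        elapsed = list(pool.map(timed, curies))
    return dict(zip(curies, elapsed))


def score(results: Dict[str, float], time_limit_s: float) -> float:
    """Percentage of queries that finished within time_limit_s (assumed scoring rule)."""
    passed = sum(1 for seconds in results.values() if seconds <= time_limit_s)
    return 100.0 * passed / len(results)
```

Threads are used here because each query spends most of its time waiting on the network; an asyncio-based client would be an equally reasonable choice.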

Update: Below are the test results of 20 simultaneous Creative Mode Treats queries, all of which respond within the 5-minute time limit:

Results:
Beginning test of 20 simultanoues queries...
HP:0002014: Finished in 30s
HP:0003124: Finished in 98s
MONDO:0005002: Finished in 15s
HP:0000717: Finished in 115s
MONDO:0005812: Finished in 218s
MONDO:0005789: Finished in 133s
MONDO:0008170: Finished in 17s
HP:0001993: Finished in 206s
HP:0011015: Finished in 20s
MONDO:0005155: Finished in 13s
MONDO:0005015: Finished in 50s
MONDO:0002251: Finished in 37s
HP:0002315: Finished in 13s
MONDO:0005148: Finished in 27s
MONDO:0005260: Finished in 43s
HP:0003003: Finished in 20s
MONDO:0004975: Finished in 104s
MONDO:0007035: Finished in 265s
HP:0000822: Finished in 19s
MONDO:0005147: Finished in 9s

Test completed in 265s
Score: 100%
Test passed!

@vdancik

vdancik commented Jan 18, 2024

Confirmed that MolePro can support 20 sequential queries with a maximum pause of 10 seconds; however, we were not able to make Jaeger work.

@brettasmi

I use locust for load testing, which showed that our systems (improving-agent) were able to handle 50 sequential queries with a max 10 second pause.

By the way, in the original message, improving-agent is tagged with QuiPrimusAbOris, but I don't know who that is.

@tursynay
Collaborator Author

@brettasmi, QuiPrimusAbOris is Sui Huang.

Thank you for the update.
