Replies: 3 comments 1 reply
-
Update: [screenshot attached]
-
Here is the clients' resource consumption (we have 150 of them for the test). It is pretty stable but appears too slow. There are no OOMs or errors on the clients. It feels like the OPAL client to OPA communication is a bottleneck.
-
Hi @kreyyser, could you provide example data updates (redacted, of course) that you are sending? cc @roekatz: the bottleneck in the client might have something to do with the take-a-turn queue. Would love your insight here.
-
Hi all,
We are using OPAL distributed authorization with multiple custom datasources, and it is working fine.
Our next step is to add real-time data updates between the datasource sync triggers on the OPAL clients.
At first we tried pushing the updates via REST API calls, but at some point (I don't remember exactly, around 20-30 RPS) timeouts reached 60-70 seconds. Increasing the number of uvicorn workers (currently 6) didn't help at all.
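For context, here is a minimal sketch of the kind of update we were POSTing. The endpoint and field names follow OPAL's data-update API as we understand it; the URL, topic, and destination path are placeholders for our setup:

```python
import json

# Hypothetical placeholders for our environment, not real endpoints.
OPAL_SERVER_URL = "http://opal-server:7002"

def build_data_update(entry_url: str, dst_path: str) -> dict:
    """Build a single OPAL data-update payload with one entry."""
    return {
        "entries": [
            {
                "url": entry_url,           # where clients fetch the data from
                "topics": ["policy_data"],  # topic our clients subscribe to
                "dst_path": dst_path,       # where the data lands in OPA
                "save_method": "PATCH",     # partial update; PUT would replace
            }
        ],
        "reason": "realtime update",
    }

payload = build_data_update("https://example.com/users/42", "/users/42")
print(json.dumps(payload, indent=2))

# Sending it is a plain POST (this is the call that timed out at ~20-30 RPS):
#   import requests
#   requests.post(f"{OPAL_SERVER_URL}/data/config", json=payload, timeout=10)
```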
So we tried another approach: pushing events directly to Kafka. Performance became much better, but not for long.
For example, 5 server replicas and 150 pods of the same client under a consistent load to Kafka (~100 RPS) make the OPAL server pods crash with OOM after 22 minutes. The OPAL server pods have resource limits of 4 CPU and 4Gi memory.
It starts out very well, but at some point CPU usage hits the limit, the pod's RAM usage starts growing, and eventually it crashes.
As you can see, the last log is missing one OPAL server pod entirely.
Scaling the OPAL server out does not seem to improve the situation; eventually the OPAL server still crashes.
The logs on the OPAL clients are crystal clear. There is an error on the OPAL servers, but it does not seem to affect data consistency on the clients.
We have been experimenting with event batch sizes: the OPAL server handles large batches sent less often much better than small batches sent more frequently. The current batch is 10 events, each a single-field update (PATCH type). Eventually it will be a PUT type, but that is not the OPAL server's problem.
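To make the batching concrete, here is a sketch of how we group the single-field PATCH events before publishing. The field names, paths, and topic are placeholders for our actual schema:

```python
import json

# We group single-field PATCH updates into fixed-size batches of 10,
# since the server handles fewer, larger batches much better.
BATCH_SIZE = 10

def make_batch(updates):
    """Group single-field updates into JSON-Patch-style batches."""
    batch, batches = [], []
    for upd in updates:
        batch.append({"op": "replace", "path": upd["path"], "value": upd["value"]})
        if len(batch) == BATCH_SIZE:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)
    return batches

updates = [{"path": f"/users/{i}/status", "value": "active"} for i in range(25)]
batches = make_batch(updates)
print([len(b) for b in batches])  # → [10, 10, 5]

# Publishing (kafka-python), omitted from the runnable part since it
# needs a broker; the topic name is ours, not anything OPAL defines:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode())
#   for b in batches:
#       producer.send("opal-data-updates", b)
```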
Can anyone help us understand how to properly scale the OPAL server for our needs?
Is there only one Kafka reader at a time, with the other OPAL server replicas acting as followers?
Maybe we can tweak something in the Kafka connection settings?
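To illustrate the Kafka semantics behind that question (this is plain Kafka behavior, not a claim about OPAL's internals): consumers sharing a `group_id` divide a topic's partitions among themselves, so read parallelism is capped by the partition count. A rough sketch of that assignment:

```python
# Plain-Kafka illustration, not OPAL internals: consumers in the same
# consumer group split the topic's partitions between them, so a
# single-partition topic is only ever read by one consumer at a time.

def assign_partitions(num_partitions: int, consumers: list[str]) -> dict:
    """Round-robin sketch of how partitions spread across a consumer group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 OPAL server replicas reading a 6-partition topic: 2 partitions each.
print(assign_partitions(6, ["server-1", "server-2", "server-3"]))
# With a single partition, one replica reads and the rest sit idle.
print(assign_partitions(1, ["server-1", "server-2", "server-3"]))
```

So if the update topic has one partition, adding server replicas would not add Kafka read throughput, which may be relevant to why scaling out does not help.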
Does anyone have a successful case of scaling it for similar needs?
Any thoughts are welcome!
Thanks in advance!