Replies: 3 comments 1 reply
-
Update: [screenshot attached]
-
Here is the clients' resource consumption (we have 150 of them for the test). It is pretty stable but appears too slow. There are no OOMs or errors on the clients. It feels like the OPAL client to OPA communication is a bottleneck.
-
Hi @kreyyser, could you provide example data updates (redacted, of course) that you are sending? cc @roekatz: the bottleneck in the client might have something to do with the take-a-turn queue. Would love your insight here.
-
Hi all,
We are using OPAL distributed authorization with multiple custom datasources, and it is working fine.
Our next step is to add real-time data updates between the datasource sync triggers on the OPAL clients.
At first we tried pushing the updates via REST API calls, but at some point (I don't remember exactly, around 20-30 RPS) timeouts reached 60-70 seconds. Increasing the number of uvicorn workers (currently 6) didn't help at all.
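For context, here is a minimal sketch of the kind of update we were POSTing. The endpoint and field names follow OPAL's data-update API as we understand it; the URL, topic, and destination path are placeholders for our setup:

```python
import json

# Hypothetical placeholders for our environment, not real endpoints.
OPAL_SERVER_URL = "http://opal-server:7002"

def build_data_update(entry_url: str, dst_path: str) -> dict:
    """Build a single OPAL data-update payload with one entry."""
    return {
        "entries": [
            {
                "url": entry_url,           # where clients fetch the data from
                "topics": ["policy_data"],  # topic our clients subscribe to
                "dst_path": dst_path,       # where the data lands in OPA
                "save_method": "PATCH",     # partial update; PUT would replace
            }
        ],
        "reason": "realtime update",
    }

payload = build_data_update("https://example.com/users/42", "/users/42")
print(json.dumps(payload, indent=2))

# Sending it is a plain POST (this is the call that timed out at ~20-30 RPS):
#   import requests
#   requests.post(f"{OPAL_SERVER_URL}/data/config", json=payload, timeout=10)
```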
So we tried another approach: pushing events directly to Kafka. Performance became much better, but not for long.
For example, 5 server replicas and 150 pods of the same client under a consistent load to Kafka (~100 RPS) make the OPAL server pods crash with OOM after 22 minutes. The OPAL server pods have resource limits of 4 CPU and 4Gi memory.
It starts out very well, but at some point CPU usage hits the limit, the pod's RAM usage starts growing, and eventually it crashes.
As you can see, the last log is missing one OPAL server pod entirely.
Scaling the OPAL server out does not seem to improve the situation; eventually the OPAL server still crashes.
The logs on the OPAL clients are crystal clear. There is an error on the OPAL servers, but it does not seem to affect data consistency on the clients.
We have been experimenting with event batch sizes: the OPAL server handles large batches sent less often much better than small batches sent more frequently. The current batch is 10 events, each a single-field update (PATCH type). Eventually it will be a PUT type, but that is not the OPAL server's problem.
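To make the batching concrete, here is a sketch of how we group the single-field PATCH events before publishing. The field names, paths, and topic are placeholders for our actual schema:

```python
import json

# We group single-field PATCH updates into fixed-size batches of 10,
# since the server handles fewer, larger batches much better.
BATCH_SIZE = 10

def make_batch(updates):
    """Group single-field updates into JSON-Patch-style batches."""
    batch, batches = [], []
    for upd in updates:
        batch.append({"op": "replace", "path": upd["path"], "value": upd["value"]})
        if len(batch) == BATCH_SIZE:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)
    return batches

updates = [{"path": f"/users/{i}/status", "value": "active"} for i in range(25)]
batches = make_batch(updates)
print([len(b) for b in batches])  # → [10, 10, 5]

# Publishing (kafka-python), omitted from the runnable part since it
# needs a broker; the topic name is ours, not anything OPAL defines:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode())
#   for b in batches:
#       producer.send("opal-data-updates", b)
```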
Can anyone help us understand how to properly scale the OPAL server for our needs?
Is there only one Kafka reader at a time, with the other OPAL server replicas acting as followers?
Maybe we can tweak something in the Kafka connection settings?
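To illustrate the Kafka semantics behind that question (this is plain Kafka behavior, not a claim about OPAL's internals): consumers sharing a `group_id` divide a topic's partitions among themselves, so read parallelism is capped by the partition count. A rough sketch of that assignment:

```python
# Plain-Kafka illustration, not OPAL internals: consumers in the same
# consumer group split the topic's partitions between them, so a
# single-partition topic is only ever read by one consumer at a time.

def assign_partitions(num_partitions: int, consumers: list[str]) -> dict:
    """Round-robin sketch of how partitions spread across a consumer group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 OPAL server replicas reading a 6-partition topic: 2 partitions each.
print(assign_partitions(6, ["server-1", "server-2", "server-3"]))
# With a single partition, one replica reads and the rest sit idle.
print(assign_partitions(1, ["server-1", "server-2", "server-3"]))
```

So if the update topic has one partition, adding server replicas would not add Kafka read throughput, which may be relevant to why scaling out does not help.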
Does anyone have a successful case of scaling it for similar needs?
Any thoughts are welcome!
Thanks in advance!