OpenFGA returns unexpected errors randomly #44
Comments
@dlirai could you share the exact model, tuples, and requests that you are making that can reproduce this issue? A reproducible example is a good first step for us to troubleshoot. Also, what version of OpenFGA are you running? Are you just using the defaults from the Helm chart?
Yes, I am using the defaults from the Helm chart.
@dlirai hi! Could you retry your test with the latest release and let me know if it improves things? https://github.com/openfga/helm-charts/releases/tag/openfga-0.1.23 Also, when you test with more than 1 replica of OpenFGA, please note this: https://openfga.dev/docs/getting-started/running-in-production#database-recommendations
@dlirai - did you manage to retry? Did you encounter the same issue?
We are running into this as well lately. We aren't close to any redlines on the DB (CPU, memory, connections) or in the ECS container (CPU, memory).
I am using OpenFGA's HTTP API to perform authorization checks, and I am running into a weird issue: some authorization checks "randomly" receive error responses instead of "true" or "false". This can happen in all of the following three scenarios:
The number of error responses varies from run to run. Sometimes I get 1, sometimes I get 3, and sometimes I don't get any error response at all. The error message is as follows:
Also, it seems that with replicaCount=1 the issue almost never occurs. With replicaCount=3 (the default value) or replicaCount=5, the issue is more likely to occur.
The OpenFGA server is launched as follows, using the official Helm chart:
In general, the issue is more likely to occur in Scenario 2 than in Scenario 1. Even with replicaCount=1, it can still easily occur. Usually, 3 or 5 out of 5000 authorization checks receive error responses.
In Scenario 3, the issue is even more likely to occur than in Scenario 1 or Scenario 2. About 100 or more checks receive error responses, compared to just a couple in Scenario 1 or 2.
Note that I used the same model and data for the tests in all scenarios. The authorization checks that receive error responses differ from run to run. Thus, I don't think the issue lies in my model or data.
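For anyone trying to reproduce this, here is a minimal sketch of how the error responses could be counted across a batch of checks. It assumes the standard Check endpoint (`POST /stores/{store_id}/check` with a `tuple_key` body and an `allowed` field in the response). The base URL, store ID, and tuples are placeholders, and `check_once` and `tally` are hypothetical helper names, not part of any OpenFGA SDK. It is not the exact script used for the numbers above.

```python
import json
import urllib.error
import urllib.request
from collections import Counter


def check_once(base_url, store_id, user, relation, obj, timeout=10.0):
    """Send one Check request; return 'allowed', 'denied', or 'error:<detail>'."""
    # Assumed request shape: {"tuple_key": {"user", "relation", "object"}}.
    # authorization_model_id is omitted here for brevity.
    body = json.dumps(
        {"tuple_key": {"user": user, "relation": relation, "object": obj}}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/stores/{store_id}/check",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as exc:
        # Non-2xx status: this is the "error response" case being counted.
        return f"error:http_{exc.code}"
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        # Connection failure, timeout, or a body that is not valid JSON.
        return f"error:{type(exc).__name__}"
    return "allowed" if payload.get("allowed") else "denied"


def tally(checks, do_check):
    """Run do_check(user, relation, obj) for each check and count the outcomes.

    Returns (counts, failed), where failed lists the checks that errored so
    they can be compared between runs.
    """
    counts = Counter()
    failed = []
    for check in checks:
        outcome = do_check(*check)
        if outcome.startswith("error"):
            counts["error"] += 1
            failed.append((check, outcome))
        else:
            counts[outcome] += 1
    return counts, failed
```

To run it against a real server, pass something like `lambda u, r, o: check_once("http://localhost:8080", "<store-id>", u, r, o)` as `do_check` (both values are placeholders). Comparing the `failed` list between runs should show whether the same checks fail each time or different ones.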
Besides, when I use an unofficial OpenFGA Helm chart from here:
https://github.com/AlexandreBrg/openfga-helm
to run the tests in Scenario 2 and Scenario 3, I never see the same issue. That is, the unofficial OpenFGA Helm chart works correctly all the time!
Could someone help look into this issue?