OpenFGA returns unexpected errors randomly #44

Open
dlirai opened this issue Jun 21, 2023 · 5 comments

@dlirai

dlirai commented Jun 21, 2023

I am using OpenFGA's HTTP API to perform authorization checks. I am running into an odd issue: some authorization checks "randomly" receive error responses instead of "true" or "false". This can happen in all three of the following scenarios:

  • Scenario 1: Use OpenFGA with the integrated Postgres database that the official Helm chart launches. The OpenFGA server and the integrated Postgres database server are launched as follows:
helm install openfga openfga/openfga \
--set datastore.engine=postgres \
--set datastore.uri="postgres://postgres:[email protected]:5432/postgres?sslmode=disable" \
--set postgres.enabled=true \
--set postgresql.auth.postgresPassword=password \
--set postgresql.auth.database=postgres

The error responses occur unpredictably: in some runs I get 1, in others 3, and in others none at all. The error message is as follows:

{
  "code": "deadline_exceeded",
  "message": "context deadline exceeded"
}

Also, it seems that setting replicaCount=1 makes the issue very unlikely to occur, while replicaCount=3 (the default value) or replicaCount=5 makes it more likely.

  • Scenario 2: Use OpenFGA with an independent Postgres database. The independent Postgres database server is launched as follows:
helm install dlpostgres \
  --set auth.postgresPassword=password \
  oci://registry-1.docker.io/bitnamicharts/postgresql

The OpenFGA server is launched as follows, using the official Helm chart:

helm install openfga openfga/openfga \
  --set replicaCount=1 \
  --set log.level=error \
  --set datastore.engine=postgres \
  --set datastore.uri="postgres://postgres:[email protected]:5432/postgres?sslmode=disable"

In general, the issue is more likely to occur in Scenario 2 than in Scenario 1. Even with replicaCount=1, it still occurs fairly often. Usually, 3 or 5 out of 5000 authorization checks receive error responses.

  • Scenario 3: Use OpenFGA with an Azure Postgres database server. I created the Azure Postgres database server first, and then launched the OpenFGA server using the official Helm chart as follows:
helm install openfga openfga/openfga \
  --set replicaCount=1 \
  --set log.level=error \
  --set datastore.engine=postgres \
  --set datastore.uri="CONNECTION_STRING_FROM_AZURE_POSTGRES_DATABASE_SERVER"

The issue is even more likely to occur here than in Scenario 1 or Scenario 2. Roughly 100 or more checks receive error responses, compared with just a couple in Scenario 1 or 2.

Note that I used the same model and data for the tests in all scenarios. The authorization checks that receive error responses differ from run to run, so I don't think the issue lies in my model or data.

In addition, when I use the unofficial OpenFGA Helm chart here:
https://github.com/AlexandreBrg/openfga-helm
to run the tests in Scenario 2 and Scenario 3, I never see this issue. That is, the unofficial OpenFGA Helm chart works correctly every time.
Could someone help look into this issue?
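
For anyone trying to quantify the failure rate across runs, a minimal sketch for tallying responses is shown here. It assumes each Check response body has been saved on a single line of input, that successful checks contain an "allowed" field, and that errors contain a "code" field as in the deadline_exceeded example above. The function names and the responses.jsonl file name are hypothetical.

```shell
# Hypothetical helper: classify one Check response body.
# Assumption: successes carry an "allowed" field, errors carry a "code" field.
classify_response() {
  case "$1" in
    *'"allowed"'*) echo "ok" ;;
    *'"code"'*)    echo "error" ;;
    *)             echo "unknown" ;;
  esac
}

# Read one response body per line from stdin and print the counts.
tally_responses() {
  ok=0; error=0; unknown=0
  while IFS= read -r body; do
    case "$(classify_response "$body")" in
      ok)    ok=$((ok + 1)) ;;
      error) error=$((error + 1)) ;;
      *)     unknown=$((unknown + 1)) ;;
    esac
  done
  echo "ok=$ok error=$error unknown=$unknown"
}
```

Usage would be along the lines of `tally_responses < responses.jsonl`, run once per scenario so the error counts can be compared directly.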

@jon-whit
Collaborator

jon-whit commented Jun 21, 2023

@dlirai could you share the exact model, tuples, and requests that you are making that can reproduce this issue? A reproducible example is a good first step for us to troubleshoot.

Also, what version of OpenFGA are you running? Are you just using the defaults from the Helm chart?

@dlirai
Author

dlirai commented Jun 21, 2023

Yes, I am using the defaults from the Helm chart.

@dlirai changed the title from 'OpenFGA returns unexpected "null" responses randomly' to 'OpenFGA returns unexpected errors randomly' on Jun 21, 2023
@miparnisari
Member

@dlirai hi! Could you retry your test with the latest release and let me know if it improves things? https://github.com/openfga/helm-charts/releases/tag/openfga-0.1.23

Also, when you test with more than 1 replica of OpenFGA, please note this: https://openfga.dev/docs/getting-started/running-in-production#database-recommendations

The server setting OPENFGA_DATASTORE_MAX_OPEN_CONNS should be set to be equal to your database's max connections. For example, in Postgres, you can see this value via running the SQL query SHOW max_connections;. If you are running multiple instances of the OpenFGA server, you should divide this setting equally among the instances. For example, if your database's max_connections is 100, and you have 2 OpenFGA instances, OPENFGA_DATASTORE_MAX_OPEN_CONNS should be set to 50 for each instance.
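
As a rough illustration of the division described above, the per-instance value could be computed as follows. This is only a sketch: the helper name per_instance_conns is hypothetical, and the numbers are the ones from the example (100 connections, 2 instances). Integer division rounds down, so the total never exceeds the database limit.

```shell
# Hypothetical helper: split the database's max_connections evenly across OpenFGA instances.
# $1 = value reported by "SHOW max_connections;", $2 = number of OpenFGA instances.
per_instance_conns() {
  max_connections=$1
  replicas=$2
  echo $(( max_connections / replicas ))
}

# Values from the example above: 100 connections shared by 2 instances.
OPENFGA_DATASTORE_MAX_OPEN_CONNS=$(per_instance_conns 100 2)
echo "OPENFGA_DATASTORE_MAX_OPEN_CONNS=${OPENFGA_DATASTORE_MAX_OPEN_CONNS}"
```

With the chart's default of 3 replicas and the same limit of 100, the helper would give 33 per instance.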

@rhamzeh
Member

rhamzeh commented Jan 22, 2024

@dlirai - did you manage to retry? Did you encounter the same issue?

@mdnorman

We are running into this as well lately. We aren't close to any redlines on the DB (CPU, memory, connections) or in the ECS container (CPU, memory).
