Identify health check to run in the Resource Health BB #106

Open
Tracked by #113
j08lue opened this issue Oct 15, 2024 · 5 comments

@j08lue
Collaborator

j08lue commented Oct 15, 2024

The Resource Health BB is establishing patterns for fetching health checks and trace data (OpenTelemetry) from building blocks.

How far are we from being able to provide any of these via APIs or similar?

@j08lue
Collaborator Author

j08lue commented Oct 15, 2024

@j08lue
Collaborator Author

j08lue commented Nov 26, 2024

Kubernetes already runs a couple of health checks every 15 seconds and reports a failure after 3 consecutive failed checks:

https://github.com/developmentseed/eoapi-k8s/blob/817ec9df82815f04d25faae9ca86fed985695e1b/helm-chart/eoapi/templates/services/deployment.yaml#L38-L49
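The probe behaviour described above can be modelled roughly as follows. This is a simplified sketch, assuming `periodSeconds: 15` and `failureThreshold: 3` as stated in the comment; it only models the consecutive-failure counting, not the real kubelet (timeouts, `initialDelaySeconds` and `successThreshold` are left out), and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Hypothetical, simplified model of a Kubernetes probe's failure counting.

    Assumed values: period_seconds=15 and failure_threshold=3, as described
    above. Timeouts, initial delays and successThreshold are not modelled.
    """

    period_seconds: int = 15
    failure_threshold: int = 3
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result; return True once the probe counts as failed."""
        if success:
            # Any successful check resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold

    def seconds_until_failure(self) -> int:
        """Time from the first failed check to the check that hits the threshold."""
        return (self.failure_threshold - 1) * self.period_seconds


if __name__ == "__main__":
    tracker = ProbeTracker()
    # A single success in between resets the count, so only the final
    # third consecutive failure trips the threshold.
    print([tracker.record(ok) for ok in [True, False, False, True, False, False, False]])
    print(ProbeTracker().seconds_until_failure())
```

With these assumed settings, an outage is flagged no sooner than 30 seconds after the first failed check.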

These checks call the /healthz (and similar) routes on our FastAPI runtimes, which are basic pings (i.e., they do not test whether a database connection exists).
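To make the difference between a basic ping and a deeper check concrete, here is a framework-agnostic sketch. It is not eoAPI's actual handler code: the function names and response bodies are hypothetical, and `sqlite3` stands in for the real database purely so the example runs with the standard library. In a FastAPI app, each function would sit behind its own route.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def ping_check() -> Tuple[int, Dict[str, str]]:
    """Ping-style check: succeeds as long as the process can answer at all."""
    return 200, {"status": "ok"}


def db_check(connect: Callable[[], sqlite3.Connection]) -> Tuple[int, Dict[str, str]]:
    """Deeper check: also verifies a round trip to the (stand-in) database."""
    try:
        conn = connect()
        try:
            # Cheapest possible query that still exercises the connection.
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # 503 signals "running but not able to serve requests".
        return 503, {"status": "error", "detail": str(exc)}
    return 200, {"status": "ok"}


if __name__ == "__main__":
    print(ping_check())
    print(db_check(lambda: sqlite3.connect(":memory:")))
```

A ping-only route keeps returning 200 even when the database is unreachable, which is exactly the gap noted above.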

That means we could use the aggregated metrics in Kubernetes as an overall eoAPI health check, e.g., by checking that a minimum number of pods is alive, possibly via the Grafana API on the support service, currently at https://eoapisupport.develop.eoepca.org/.
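A "minimum number of pods alive" rule could be evaluated along these lines. This sketch assumes the input is a list of Pod objects in the shape returned by the Kubernetes API (`status.phase` and `status.conditions`); how they are fetched (Grafana, Prometheus or the Kubernetes API directly) is left open, and the function names and `min_ready` threshold are hypothetical.

```python
from typing import Any, Iterable, Mapping


def count_ready_pods(pods: Iterable[Mapping[str, Any]]) -> int:
    """Count pods that are Running and report the Ready condition as "True"."""
    ready = 0
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue
        conditions = status.get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready


def service_is_healthy(pods: Iterable[Mapping[str, Any]], min_ready: int) -> bool:
    """Hypothetical aggregate rule: healthy if at least `min_ready` pods are ready."""
    return count_ready_pods(pods) >= min_ready


if __name__ == "__main__":
    pods = [
        {"status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "True"}]}},
        {"status": {"phase": "Running", "conditions": [{"type": "Ready", "status": "False"}]}},
        {"status": {"phase": "Pending"}},
    ]
    print(count_ready_pods(pods), service_is_healthy(pods, min_ready=1))
```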

@j08lue j08lue added this to the Q3 milestone Nov 26, 2024
@j08lue j08lue assigned j08lue and unassigned j08lue Nov 26, 2024
@j08lue j08lue added the DevSeed label Nov 28, 2024
@j08lue j08lue self-assigned this Nov 28, 2024
@j08lue
Collaborator Author

j08lue commented Dec 11, 2024

We received a more detailed request from @dovydas-an:

We (the developers of the Resource Health BB) are reaching out to obtain the information we need to set up exemplary health checks.

At the moment, we are considering a simple health check that would "ping" your BB, receive a response and, depending on the response, produce a health check outcome. To set up such a check, we would need to know:

  1. URL of the endpoint
  2. Expected response code that would correspond to "OK" health check outcome (e.g. 200)
  3. If authentication is needed, the authentication credentials and how they should be added to the request (e.g., as a header)
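The three items requested above map naturally onto a small check definition. This is a sketch only: the class, the outcome labels and the example `Authorization` header are hypothetical placeholders, not the Resource Health BB's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class HealthCheckSpec:
    """Hypothetical definition covering the three requested items."""

    url: str                    # 1. URL of the endpoint
    expected_status: int = 200  # 2. status code meaning "OK"
    # 3. optional auth, e.g. {"Authorization": "Bearer <token>"} (placeholder)
    headers: Mapping[str, str] = field(default_factory=dict)


def outcome(spec: HealthCheckSpec, status: Optional[int]) -> str:
    """Map an observed status code (None = no response) to an outcome label."""
    if status is None:
        return "UNREACHABLE"
    return "OK" if status == spec.expected_status else "FAILED"


if __name__ == "__main__":
    spec = HealthCheckSpec(url="https://example.org/healthz")
    print(outcome(spec, 200), outcome(spec, 503), outcome(spec, None))
```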

@j08lue
Collaborator Author

j08lue commented Dec 11, 2024

@dovydas-an, the Data Access BB consists of several services, each of which should have a health endpoint like you requested.

  1. Raster API: https://eoapi.develop.eoepca.org/raster/healthz, 200, no auth
  2. Vector API: https://eoapi.develop.eoepca.org/vector/healthz, 200, no auth
  3. STAC API: TBD, maybe this would need to be added to our EOEPCA runtime, @pantierra?
  4. Coverages API: TBD @jankovicgd
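A direct probe of the two endpoints already listed could be sketched with the standard library as below. It assumes a plain unauthenticated GET with an expected status of 200, as stated in the list; the STAC and Coverages entries are omitted because they were still TBD, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Mapping, Optional

# Endpoints from the list above (expected status 200, no auth).
ENDPOINTS = {
    "raster": "https://eoapi.develop.eoepca.org/raster/healthz",
    "vector": "https://eoapi.develop.eoepca.org/vector/healthz",
}


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET request, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError but still carry a status code.
        return exc.code
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return None


def check_all(endpoints: Mapping[str, str], expected_status: int = 200) -> Dict[str, bool]:
    """Probe every endpoint and report whether it returned the expected status."""
    return {name: fetch_status(url) == expected_status for name, url in endpoints.items()}


if __name__ == "__main__":
    print(check_all(ENDPOINTS))
```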

As I understand it, all of these endpoints are already probed regularly by Kubernetes. Maybe there is a way to ask Kubernetes for a status instead of also pinging these endpoints directly? Perhaps @ividito can advise?

If we can rely on Kubernetes for liveness checks, and the building blocks internally do their due diligence to make sure their services are up, I think that would be preferable, no?
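Asking Kubernetes for a status could mean reading a Deployment object (e.g., from `kubectl get deployment <name> -o json` or `GET /apis/apps/v1/namespaces/{namespace}/deployments/{name}`) and comparing ready replicas with desired replicas. The sketch below assumes that JSON shape; the status labels and the `min_ready` parameter are hypothetical, and the actual eoAPI Deployment names are not given in this thread.

```python
from typing import Any, Mapping


def deployment_health(deployment: Mapping[str, Any], min_ready: int = 1) -> str:
    """Summarise an apps/v1 Deployment as OK, DEGRADED or DOWN (hypothetical labels)."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    # readyReplicas is omitted from the JSON when no replica is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    if ready >= desired:
        return "OK"
    if ready >= min_ready:
        return "DEGRADED"
    return "DOWN"


if __name__ == "__main__":
    print(deployment_health({"spec": {"replicas": 2}, "status": {"readyReplicas": 1}}))
```

This relies on Kubernetes' own probes to decide readiness, so the Resource Health BB would not need to ping each endpoint separately.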

@pantierra
Collaborator

STAC API has one here: https://eoapi.develop.eoepca.org/stac/_mgmt/ping, 200, no auth
