Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add /health handler to node server #2462

Merged
merged 1 commit into from
Mar 5, 2025

Conversation

tillrohrmann
Copy link
Contributor

This commit adds a very simple /health handler that returns 200 if the server is running. This allows to check whether the node server is running.

This fixes #2461.

@AhmedSoliman
Copy link
Contributor

The failing test is a little concerning. Is this something that @pcholakov should look at?

@AhmedSoliman
Copy link
Contributor

I feel that I'm missing a little context on why this is needed, and if grpc's health check endpoint can be a reasonable replacement or not.

@tillrohrmann
Copy link
Contributor Author

The failing test case is captured here #2428.

@pcholakov
Copy link
Contributor

I'm on it; it's very odd - the only way I can explain what we're seeing is that occasionally the object_store file:// protocol provider silently fails to either write or read back the data we've recently stored. I haven't been able to reproduce this yet.

@tillrohrmann
Copy link
Contributor Author

The motivation for this PR was that we were using in our tests (verification, e2e, local cluster) the ingress and/or admin http /health endpoint to await that the binary is up and running. With the provision step, this is not always possible anymore because we only start those components after the provisioning. That's why I used as a band aid the /metrics endpoint to await the start of the node server in those tests.

I can look into the grpc health endpoint if you think that this would be more suitable.

@tillrohrmann
Copy link
Contributor Author

I've updated the PR to print the IdentResponse as Json when calling /health.

Copy link
Contributor

@jackkleeman jackkleeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

This commit adds a very simple /health handler that returns 200 and
the node identification information if the server is running. This
allows to check whether the node server is running and what's the status
of its components.

This fixes restatedev#2461.
@tillrohrmann tillrohrmann merged commit 7390ac6 into restatedev:main Mar 5, 2025
5 checks passed
@tillrohrmann tillrohrmann deleted the issues/2461 branch March 5, 2025 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add /health endpoint to node server
4 participants