Best practice for webserver liveness probe check #54850

### Description

👋 Dear Airflow community,

Recently we ran some stress tests on Airflow’s asset-based scheduling and noticed that the webserver was frequently restarting due to liveness probe failures. The liveness probe we were using was:
```
/api/v2/monitor/health
```

This was based on the guidance from the old health endpoint response:
https://github.com/apache/airflow/blob/31f0eac1e15fee842d451d56c603d9005c30ddcb/airflow-core/src/airflow/api_fastapi/core_api/app.py#L85

From reading the source code, my understanding is that `/api/v2/monitor/health` checks the overall health of the metadatabase, scheduler, and triggerer. If retrieving health information from any of these components is slow (for example, because the metadatabase is under heavy load), the liveness probe fails, the webserver gets restarted, and the UI becomes unavailable. Ideally, the UI should remain available even when the metadatabase or scheduler is under heavy load.

What would be the recommended alternative liveness check that doesn’t make the webserver’s health dependent on backend components? I see some options, such as the execution API health endpoint:
https://github.com/apache/airflow/blob/31f0eac1e15fee842d451d56c603d9005c30ddcb/airflow-core/src/airflow/api_fastapi/execution_api/routes/health.py#L30

I also noticed that the official chart for the API server uses the version endpoint:
https://github.com/apache/airflow/blob/31f0eac1e15fee842d451d56c603d9005c30ddcb/chart/templates/api-server/api-server-deployment.yaml#L194

Any suggestions or guidance would be much appreciated 🙏

### Use case/motivation

A liveness probe endpoint for the webserver that does not depend on other components (metadatabase, scheduler, triggerer).
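To illustrate the use case, here is a minimal sketch of an exec-style liveness check in Python. It treats the webserver as alive as long as it answers a lightweight endpoint with a 2xx status within a short timeout, without querying the metadatabase, scheduler, or triggerer. The default URL `http://localhost:8080/api/v2/version` is an assumption (modeled on the chart's use of the version endpoint), and `is_alive` / `main` are hypothetical helpers, not part of Airflow.

```python
import urllib.request


def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    Intended to target a lightweight endpoint that does not touch backend
    components, so metadatabase or scheduler slowness cannot fail the probe.
    """
    # The probe talks to the local pod, so bypass any HTTP(S)_PROXY settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and TimeoutError are all OSError subclasses:
        # connection refused, 4xx/5xx, or no answer in time means "not alive".
        return False


def main(argv: list[str]) -> int:
    # Hypothetical default: assumes the API server listens on localhost:8080
    # and serves the version endpoint at /api/v2/version.
    url = argv[0] if argv else "http://localhost:8080/api/v2/version"
    return 0 if is_alive(url) else 1
```

Run as a script via `sys.exit(main(sys.argv[1:]))`, this could back an `exec` probe; an `httpGet` probe against the same lightweight endpoint gives the same decoupling without a script. The full `/api/v2/monitor/health` endpoint would then remain available for monitoring rather than for restart decisions.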

### Related issues

_No response_

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
