elastic · yaauie · Oct 7, 2024 · Oct 8, 2024 · Oct 8, 2024 · mashhurs
diff --git a/docs/static/monitoring/monitoring-apis.asciidoc b/docs/static/monitoring/monitoring-apis.asciidoc
@@ -2,13 +2,13 @@
 [[monitoring]]
 == APIs for monitoring {ls}
 
-{ls} provides monitoring APIs for retrieving runtime metrics
-about {ls}:
+{ls} provides monitoring APIs for retrieving runtime information about {ls}:
 
 * <<node-info-api>>
 * <<plugins-api>>
 * <<node-stats-api>>
 * <<hot-threads-api>>
+* <<logstash-health-report-api>>
 
 
 You can use the root resource to retrieve general information about the Logstash instance, including
@@ -1184,3 +1184,155 @@ Example of a human-readable response:
 	 org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75)
 
 --------------------------------------------------
+
+
+[[logstash-health-report-api]]
+=== Health report API
+
+An API that reports the health status of Logstash.
+
+[source,js]
+--------------------------------------------------
+curl -XGET 'localhost:9600/_health_report?pretty'
+--------------------------------------------------
+
+==== Description
+
+The health API returns a report with the health status of Logstash and the pipelines that are running inside of it.
+The report contains a list of indicators that compose Logstash functionality.
+
+Each indicator has a health status of: `green`, `unknown`, `yellow`, or `red`.
+The indicator will provide an explanation and metadata describing the reason for its current health status.
+
+The top-level status is controlled by the worst indicator status.
+
+If an indicator's status is non-green, the indicator result may include a list of impacts that detail the functionality negatively affected by the health issue.
+Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
+
+Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system.
+The root cause and remediation steps are encapsulated in a `diagnosis`.
+A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, and the URL for detailed troubleshooting help.
+
+NOTE: The health indicators perform root cause analysis of non-green health statuses.
+      This can be computationally expensive when called frequently.
+
+==== Response body
+
+`status`::
+(Optional, string) Health status of {ls}, based on the aggregated status of all indicators. Statuses are:
+
+`green`:::
+{ls} is healthy.
+
+`unknown`:::
+The health of {ls} could not be determined.
+
+`yellow`:::
+The functionality of {ls} is in a degraded state and may need remediation to avoid the health becoming `red`.
+
+`red`:::
+{ls} is experiencing an outage or certain features are unavailable for use.
+
+`indicators`::
+(object) Information about the health of the {ls} indicators.
+
++
+.Properties of `indicators`
+[%collapsible%open]
+====
+`<indicator>`::
+(object) Contains health results for an indicator.
++
+.Properties of `<indicator>`
+[%collapsible%open]
+=======
+`status`::
+(string) Health status of the indicator. Statuses are:
+
+`green`:::
+The indicator is healthy.
+
+`unknown`:::
+The health of the indicator could not be determined.
+
+`yellow`:::
+The functionality of an indicator is in a degraded state and may need remediation to avoid the health becoming `red`.
+
+`red`:::
+The indicator is experiencing an outage or certain features are unavailable for use.
+
+`symptom`::
+(string) A message providing information about the current health status.
+
+`details`::
+(Optional, object) An object that contains additional information about the indicator that has led to the current health status result.
+Each indicator has <<logstash-health-api-response-details, a unique set of details>>.
+
+`impacts`::
+(Optional, array) If a non-healthy status is returned, indicators may include a list of impacts that this health status will have on {ls}.
++
+.Properties of `impacts`
+[%collapsible%open]
+========
+`severity`::
+(integer) How important this impact is to the functionality of {ls}.
+A value of 1 is the highest severity, with larger values indicating lower severity.
+
+`description`::
+(string) A description of the impact on {ls}.
+
+`impact_areas`::
+(array of strings) The areas of {ls} functionality that this impact affects.
+Possible values are:
++
+--
+* `pipeline_execution`
+--
+
+========
+
+`diagnosis`::
+(Optional, array) If a non-healthy status is returned, indicators may include a list of diagnoses that encapsulate the cause of the health issue and an action to take in order to remediate the problem.
++
+.Properties of `diagnosis`
+[%collapsible%open]
+========
+`cause`::
+(string) A description of a root cause of this health problem.
+
+`action`::
+(string) A brief description of the steps that should be taken to remediate the problem.
+A more detailed step-by-step guide to remediate the problem is provided by the `help_url` field.
+
+`help_url`::
+(string) A link to the troubleshooting guide for the health problem.
+========
+=======
+====
+
+[role="child_attributes"]
+[[logstash-health-api-response-details]]
+==== Indicator Details
+
+Each health indicator in the health API returns a set of details that further explains the state of the system.
+The details have contents and a structure that is unique to each indicator.
+
+[[logstash-health-api-response-details-pipeline]]
+===== Pipeline Indicator Details
+
+`pipelines/indicators/<pipeline_id>/details`::
+(object) Information about the specified pipeline.
++
+.Properties of `pipelines/indicators/<pipeline_id>/details`
+[%collapsible%open]
+====
+`status`::
+(object) Details related to the pipeline's current status and run-state.
++
+.Properties of `status`
+[%collapsible%open]
+========
+`state`::
+(string) The current state of the pipeline, including whether it is `loading`, `running`, `finished`, or `terminated`.
+========
+====
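
The aggregation rule and response layout documented in monitoring-apis.asciidoc above can be sketched in Python. This is a minimal illustration, not Logstash's implementation. The status ordering, the helper names `worst_status` and `non_green`, and everything inside `sample_report` (pipeline ids, symptom text, placeholder help URL) are assumptions made for the example; only the field names come from the documentation.

```python
# Assumed ordering from healthiest to least healthy, following the order
# in which the documentation lists the statuses.
STATUS_ORDER = ["green", "unknown", "yellow", "red"]


def worst_status(statuses):
    """Return the least healthy status in `statuses` ('green' if empty)."""
    return max(statuses, key=STATUS_ORDER.index, default="green")


def non_green(indicators, prefix=""):
    """Yield (path, indicator) for every indicator whose status is not green.

    Descends into a nested `indicators` object when one is present.
    """
    for name, indicator in indicators.items():
        path = f"{prefix}{name}"
        if indicator.get("status") != "green":
            yield path, indicator
        yield from non_green(indicator.get("indicators", {}), prefix=f"{path}/")


# Hypothetical response, hand-written to match the documented field names.
sample_report = {
    "status": "yellow",
    "indicators": {
        "pipelines": {
            "status": "yellow",
            "symptom": "hypothetical symptom text",
            "indicators": {
                "main": {
                    "status": "green",
                    "symptom": "hypothetical symptom text",
                    "details": {"status": {"state": "running"}},
                },
                "ingest": {
                    "status": "yellow",
                    "symptom": "hypothetical symptom text",
                    "details": {"status": {"state": "loading"}},
                    "impacts": [
                        {
                            "severity": 1,
                            "description": "hypothetical impact text",
                            "impact_areas": ["pipeline_execution"],
                        }
                    ],
                    "diagnosis": [
                        {
                            "cause": "hypothetical cause text",
                            "action": "hypothetical action text",
                            "help_url": "https://example.com/placeholder",
                        }
                    ],
                },
            },
        }
    },
}

if __name__ == "__main__":
    for path, indicator in non_green(sample_report["indicators"]):
        print(path, indicator["status"])
        for diagnosis in indicator.get("diagnosis", []):
            print("  action:", diagnosis["action"], "->", diagnosis["help_url"])
```

Against a live instance, the same functions could be applied to the parsed JSON body returned by the `curl` command shown in the documentation. Because the NOTE warns that root cause analysis can be expensive, a caller would likely want to poll sparingly.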
diff --git a/docs/static/troubleshoot/health-pipeline-status.asciidoc b/docs/static/troubleshoot/health-pipeline-status.asciidoc
@@ -0,0 +1,37 @@
+[[health-report-pipeline-status]]
+=== Health Report Pipeline Status
+
+The Pipeline indicator has a `status` probe that is capable of producing one of several diagnoses about the pipeline's lifecycle, indicating whether the pipeline is currently running.
+
+[[health-report-pipeline-status-diagnosis-loading]]
+==== [[loading]]Loading Pipeline
+
+A pipeline that is loading is not yet processing data, and is considered to be in a temporarily degraded state.
+Some plugins perform actions or pre-validation that can delay the starting of the pipeline, such as when a plugin pre-establishes a connection to an external service before allowing the pipeline to start.
+When these plugins take significant time to start up, the whole pipeline can remain in a loading state for an extended time.
+
+If your pipeline does not come up in a reasonable amount of time, consider checking the Logstash logs to see if the plugin shows evidence of being caught in a retry loop.
+
+[[health-report-pipeline-status-diagnosis-finished]]
+==== [[finished]]Finished Pipeline
+
+A Logstash pipeline whose input plugins have all completed will be shut down once events have finished processing.
+
+Many plugins can be configured to run indefinitely, either by listening for new inbound events or by polling for events on a schedule.
+A finished pipeline will not produce or process any more events until it is restarted, which will occur if the pipeline's definition is changed and pipeline reloads are enabled.
+If you wish to keep your pipeline running, consider configuring its input to run on a schedule or otherwise listen for new events.
+
+[[health-report-pipeline-status-diagnosis-terminated]]
+==== [[terminated]]Terminated Pipeline
+
+When a Logstash pipeline's filter or output plugins crash, the entire pipeline is terminated and intervention is required.
+
+A terminated pipeline will not produce or process any more events until it is restarted, which will occur if the pipeline's definition is changed and pipeline reloads are enabled.
+Check the logs to determine the cause of the crash, and report the issue to the plugin maintainers.
+
+[[health-report-pipeline-status-diagnosis-unknown]]
+==== [[unknown]]Unknown Pipeline
+
+When a Logstash pipeline either cannot be created or has recently been deleted, the health report doesn't know enough to produce a meaningful status.
+
+Check the logs to determine if the pipeline crashed during creation, and report the issue to the plugin maintainers.
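
The sections of health-pipeline-status.asciidoc above pair naturally with the `state` value in a pipeline indicator's `details`. As a rough sketch, a script could map that value to the matching anchor. The function name `troubleshooting_anchor` and the choice to treat a missing `state` as `unknown` are assumptions for illustration; the anchor ids are copied from the file above.

```python
# Anchor ids taken from health-pipeline-status.asciidoc. A `running`
# pipeline has no troubleshooting section, so it is deliberately absent.
TROUBLESHOOTING_ANCHORS = {
    "loading": "health-report-pipeline-status-diagnosis-loading",
    "finished": "health-report-pipeline-status-diagnosis-finished",
    "terminated": "health-report-pipeline-status-diagnosis-terminated",
    "unknown": "health-report-pipeline-status-diagnosis-unknown",
}


def troubleshooting_anchor(details):
    """Return the anchor id for a pipeline's `details` object, or None.

    Assumption: when `status.state` is missing, fall back to `unknown`.
    """
    state = details.get("status", {}).get("state", "unknown")
    return TROUBLESHOOTING_ANCHORS.get(state)
```

For example, `troubleshooting_anchor({"status": {"state": "terminated"}})` returns the anchor of the Terminated Pipeline section, while a `running` pipeline yields `None`.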
diff --git a/docs/static/troubleshoot/troubleshooting.asciidoc b/docs/static/troubleshoot/troubleshooting.asciidoc
@@ -28,3 +28,4 @@ include::ts-logstash.asciidoc[]
 include::ts-plugins-general.asciidoc[]
 include::ts-plugins.asciidoc[]
 include::ts-other-issues.asciidoc[]
+include::health-pipeline-status.asciidoc[]