Add Grafana dashboard and installation steps (#1620)
Problem: As a user, I want to know how to easily install Prometheus and Grafana to visualize my NGF metrics.

Solution: Add basic installation steps for both Prometheus and Grafana, and provide a sample dashboard (based on the nginx-prometheus-exporter dashboard).
sjberman authored Mar 8, 2024
1 parent 799ea76 commit 5216f00
Showing 4 changed files with 902 additions and 45 deletions.
118 changes: 84 additions & 34 deletions site/content/how-to/monitoring/prometheus.md
@@ -1,6 +1,6 @@
---
title: "Prometheus Metrics"
description: "Learn how to monitor your NGINX Gateway Fabric effectively. This guide provides easy steps for configuring and understanding key performance metrics using Prometheus."
description: "This document describes how to monitor NGINX Gateway Fabric using Prometheus and Grafana. It explains installation and configuration, as well as what metrics are available."
weight: 100
toc: true
docs: "DOCS-1418"
@@ -11,16 +11,96 @@ docs: "DOCS-1418"
## Overview


NGINX Gateway Fabric metrics are displayed in [Prometheus](https://prometheus.io/) format, simplifying monitoring. You can track NGINX and controller-runtime metrics through a metrics server orchestrated by the controller-runtime package. These metrics are enabled by default and can be accessed on HTTP port `9113`.

NGINX Gateway Fabric metrics are displayed in [Prometheus](https://prometheus.io/) format. These metrics are served through a metrics server orchestrated by the controller-runtime package on HTTP port `9113`. When installed, Prometheus automatically scrapes this port and collects metrics. [Grafana](https://grafana.com/) can be used for rich visualization of these metrics.

{{<call-out "important" "Security note for metrics">}}
Metrics are served over HTTP by default. Enabling HTTPS will secure the metrics endpoint with a self-signed certificate. When using HTTPS, adjust the Prometheus Pod scrape settings by adding the `insecure_skip_verify` flag to handle the self-signed certificate. For further details, refer to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
{{</call-out>}}

## Installing Prometheus and Grafana

{{< note >}}These installations are for demonstration purposes and have not been tuned for a production environment.{{< /note >}}

### Prometheus

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus -n monitoring --create-namespace --set server.global.scrape_interval=15s
```

Once running, you can access the Prometheus dashboard by using port-forwarding in the background:

```shell
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
```

Visit [http://127.0.0.1:9090](http://127.0.0.1:9090) to view the dashboard.
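
Besides the web UI, the forwarded port can also be queried through the Prometheus HTTP API. The following is a minimal sketch, assuming the port-forward above is active on `127.0.0.1:9090`; the helper name `prom_query_url` is hypothetical and only builds the URL, so nothing is contacted until you run the `curl` line yourself.

```shell
# Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
# Assumption: Prometheus is reachable on 127.0.0.1:9090 via the port-forward above.
# prom_query_url is a hypothetical helper, not part of any tool.
prom_query_url() {
  # $1: a metric name or simple PromQL expression that needs no URL encoding
  printf 'http://127.0.0.1:9090/api/v1/query?query=%s\n' "$1"
}

# Usage (requires the port-forward to be running):
#   curl -s "$(prom_query_url up)"
```

The `up` metric is generated by Prometheus for every scrape target, so it is a quick way to confirm that targets are being scraped before moving on to Grafana.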

### Grafana


```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana -n monitoring --create-namespace
```

Once running, you can access the Grafana dashboard by using port-forwarding in the background:

```shell
kubectl port-forward -n monitoring svc/grafana 3000:80 &
```

Visit [http://127.0.0.1:3000](http://127.0.0.1:3000) to view the Grafana UI.

The username for login is `admin`. The password can be acquired by running:

```shell
kubectl get secret -n monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```

#### Configuring Grafana

In the Grafana UI menu, go to `Connections` then `Data sources`. Add your Prometheus service (`http://prometheus-server.monitoring.svc`) as a data source.
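
If you prefer the command line over the UI, the same data source could be registered through Grafana's HTTP API. This is a sketch under assumptions: the helper name `datasource_payload`, the data source name `Prometheus`, and the `GRAFANA_PASSWORD` variable are illustrative choices, and the request relies on the Grafana port-forward above being active.

```shell
# Build the JSON body for Grafana's "create data source" API (POST /api/datasources).
# datasource_payload is a hypothetical helper; the name "Prometheus" is an arbitrary label.
datasource_payload() {
  # $1: URL of the Prometheus service as seen from inside the cluster
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s"}\n' "$1"
}

# Usage (requires the Grafana port-forward and the admin password from the secret above,
# stored here in the assumed variable GRAFANA_PASSWORD):
#   curl -s -u "admin:${GRAFANA_PASSWORD}" \
#     -H 'Content-Type: application/json' \
#     -X POST http://127.0.0.1:3000/api/datasources \
#     -d "$(datasource_payload http://prometheus-server.monitoring.svc)"
```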

Download the following sample dashboard and import it as a new dashboard in the Grafana UI.

{{< download "grafana-dashboard.json" "ngf-grafana-dashboard.json" >}}

## Available metrics in NGINX Gateway Fabric

NGINX Gateway Fabric provides a variety of metrics for monitoring and analyzing performance. These metrics are categorized as follows:

### NGINX/NGINX Plus metrics

NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginxinc/nginx-prometheus-exporter#exported-metrics).

These metrics use the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`.

### NGINX Gateway Fabric metrics

Metrics specific to NGINX Gateway Fabric include:

- `nginx_reloads_total`: Counts successful NGINX reloads.
- `nginx_reload_errors_total`: Counts NGINX reload failures.
- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version.
- `nginx_last_reload_milliseconds`: Time in milliseconds for NGINX reloads.
- `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events.

All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`.
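
To illustrate the naming scheme, the sketch below filters Prometheus exposition text for series in the `nginx_gateway_fabric` namespace. The sample text is made up for illustration (the values are not real measurements), and `go_goroutines` stands in for a metric outside the namespace.

```shell
# Illustrative sample of Prometheus exposition text; the values are made up.
sample_metrics='nginx_gateway_fabric_nginx_reloads_total{class="nginx"} 4
nginx_gateway_fabric_nginx_reload_errors_total{class="nginx"} 0
nginx_gateway_fabric_nginx_stale_config{class="nginx"} 0
go_goroutines 42'

# Keep only the series in the nginx_gateway_fabric namespace.
ngf_series=$(printf '%s\n' "$sample_metrics" | grep '^nginx_gateway_fabric_')

# Count the matching series.
ngf_count=$(printf '%s\n' "$ngf_series" | grep -c '^nginx_gateway_fabric_')

printf '%s\n' "$ngf_series"
echo "NGF series found: $ngf_count"
```

Against a running deployment, the same filter could be applied to the output of `curl -s http://127.0.0.1:9113/metrics`, assuming you have port-forwarded the NGINX Gateway Fabric Pod's port `9113` to your machine.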

### Controller-runtime metrics

Provided by the [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) library, these metrics include:

- General resource usage like CPU and memory.
- Go runtime metrics such as the number of goroutines, garbage collection duration, and Go version.
- Controller-specific metrics, including reconciliation errors per controller, length of the reconcile queue, and reconciliation latency.

## How to change the default metrics configuration

Configuring NGINX Gateway Fabric for monitoring is straightforward. You can change metric settings using Helm or Kubernetes manifests, depending on your setup.
You can configure monitoring metrics for NGINX Gateway Fabric using Helm or Manifests.

### Using Helm

@@ -85,33 +165,3 @@ For enhanced security with HTTPS:
prometheus.io/scheme: "https"
<...>
```

## Available metrics in NGINX Gateway Fabric

NGINX Gateway Fabric provides a variety of metrics to assist in monitoring and analyzing performance. These metrics are categorized as follows:

### NGINX/NGINX Plus metrics

NGINX metrics, essential for monitoring specific NGINX operations, include details like the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginxinc/nginx-prometheus-exporter#exported-metrics).

These metrics use the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`.

### NGINX Gateway Fabric metrics

Metrics specific to the NGINX Gateway Fabric include:

- `nginx_reloads_total`: Counts successful NGINX reloads.
- `nginx_reload_errors_total`: Counts NGINX reload failures.
- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version.
- `nginx_last_reload_milliseconds`: Time in milliseconds for NGINX reloads.
- `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events.

All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`.

### Controller-runtime metrics

Provided by the [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) library, these metrics cover a range of aspects:

- General resource usage like CPU and memory.
- Go runtime metrics such as the number of Go routines, garbage collection duration, and Go version.
- Controller-specific metrics, including reconciliation errors per controller, length of the reconcile queue, and reconciliation latency.
2 changes: 1 addition & 1 deletion site/go.mod
@@ -2,4 +2,4 @@ module github.com/nginxinc/nginx-gateway-fabric/site

go 1.21

require github.com/nginxinc/nginx-hugo-theme v0.40.8 // indirect
require github.com/nginxinc/nginx-hugo-theme v0.41.0 // indirect
12 changes: 2 additions & 10 deletions site/go.sum
@@ -1,10 +1,2 @@
github.com/nginxinc/nginx-hugo-theme v0.35.0 h1:7XB2GMy6qeJgKEJy9wOS3SYKYpfvLW3/H+UHRPLM4FU=
github.com/nginxinc/nginx-hugo-theme v0.35.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
github.com/nginxinc/nginx-hugo-theme v0.39.0 h1:P1hOPpityVUOM5OyIpQZa1UJyuUunGSmz0oZh/GYSJM=
github.com/nginxinc/nginx-hugo-theme v0.39.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
github.com/nginxinc/nginx-hugo-theme v0.40.0 h1:YP0I0+bRKcJ5WEb1s/OWcnlcvNvIcKscagJkCzsa+Vs=
github.com/nginxinc/nginx-hugo-theme v0.40.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
github.com/nginxinc/nginx-hugo-theme v0.40.1 h1:1Q94uFYegNvjvwDV1py9VlYmh62AF1gh1oPGqjNmtis=
github.com/nginxinc/nginx-hugo-theme v0.40.1/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
github.com/nginxinc/nginx-hugo-theme v0.40.8 h1:VtoSAtf9k67tI2jzbLRo0oFBAMHZBUPRh/xV4MYullI=
github.com/nginxinc/nginx-hugo-theme v0.40.8/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
github.com/nginxinc/nginx-hugo-theme v0.41.0 h1:uB9jC0Qk9i2CG63gScHxVHAEz1zyGoAdtY0Lcpkg1lI=
github.com/nginxinc/nginx-hugo-theme v0.41.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=