Add Grafana dashboard and installation steps (#1620)

Problem: As a user, I want to know how to easily install Prometheus and Grafana to visualize my NGF metrics.

Solution: Add basic installation steps for both Prometheus and Grafana, and provide a sample dashboard (based on the nginx-prometheus-exporter dashboard).
nginxinc · Mar 8, 2024 · 5216f00
1 parent 799ea76
commit 5216f00
Showing 4 changed files with 902 additions and 45 deletions.
diff --git a/site/content/how-to/monitoring/prometheus.md b/site/content/how-to/monitoring/prometheus.md
@@ -1,6 +1,6 @@
 ---
 title: "Prometheus Metrics"
-description: "Learn how to monitor your NGINX Gateway Fabric effectively. This guide provides easy steps for configuring and understanding key performance metrics using Prometheus."
+description: "This document describes how to monitor NGINX Gateway Fabric using Prometheus and Grafana. It explains installation and configuration, as well as what metrics are available."
 weight: 100
 toc: true
 docs: "DOCS-1418"
@@ -11,16 +11,96 @@ docs: "DOCS-1418"
 ## Overview
 
 
-NGINX Gateway Fabric metrics are displayed in [Prometheus](https://prometheus.io/) format, simplifying monitoring. You can track NGINX and controller-runtime metrics through a metrics server orchestrated by the controller-runtime package. These metrics are enabled by default and can be accessed on HTTP port `9113`.
-
+NGINX Gateway Fabric metrics are exposed in [Prometheus](https://prometheus.io/) format. They are served by a metrics server, orchestrated by the controller-runtime package, on HTTP port `9113`. Once Prometheus is installed, it automatically scrapes this port and collects the metrics. [Grafana](https://grafana.com/) can be used for rich visualization of these metrics.
 
 {{<call-out "important" "Security note for metrics">}}
 Metrics are served over HTTP by default. Enabling HTTPS will secure the metrics endpoint with a self-signed certificate. When using HTTPS, adjust the Prometheus Pod scrape settings by adding the `insecure_skip_verify` flag to handle the self-signed certificate. For further details, refer to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
 {{</call-out>}}
 
+## Installing Prometheus and Grafana
+
+{{< note >}}These installations are for demonstration purposes and have not been tuned for a production environment.{{< /note >}}
+
+### Prometheus
+
+```shell
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+helm install prometheus prometheus-community/prometheus -n monitoring --create-namespace --set server.global.scrape_interval=15s
+```
+
+Once running, you can access the Prometheus dashboard by using port-forwarding in the background:
+
+```shell
+kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
+```
+
+Visit [http://127.0.0.1:9090](http://127.0.0.1:9090) to view the dashboard.
+
+### Grafana
+
+
+```shell
+helm repo add grafana https://grafana.github.io/helm-charts
+helm repo update
+helm install grafana grafana/grafana -n monitoring --create-namespace
+```
+
+Once running, you can access the Grafana dashboard by using port-forwarding in the background:
+
+```shell
+kubectl port-forward -n monitoring svc/grafana 3000:80 &
+```
+
+Visit [http://127.0.0.1:3000](http://127.0.0.1:3000) to view the Grafana UI.
+
+The username for login is `admin`. The password can be acquired by running:
+
+```shell
+kubectl get secret -n monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
+```
+
+#### Configuring Grafana
+
+In the Grafana UI menu, go to `Connections` then `Data sources`. Add your Prometheus service (`http://prometheus-server.monitoring.svc`) as a data source.
+
+Download the following sample dashboard and import it as a new dashboard in the Grafana UI.
+
+{{< download "grafana-dashboard.json" "ngf-grafana-dashboard.json" >}}
+
+## Available metrics in NGINX Gateway Fabric
+
+NGINX Gateway Fabric provides a variety of metrics for monitoring and analyzing performance. These metrics are categorized as follows:
+
+### NGINX/NGINX Plus metrics
+
+NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginxinc/nginx-prometheus-exporter#exported-metrics).
+
+These metrics use the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`.
+
+### NGINX Gateway Fabric metrics
+
+Metrics specific to NGINX Gateway Fabric include:
+
+- `nginx_reloads_total`: Counts successful NGINX reloads.
+- `nginx_reload_errors_total`: Counts NGINX reload failures.
+- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version.
+- `nginx_last_reload_milliseconds`: Time in milliseconds for NGINX reloads.
+- `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events.
+
+All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`.
+
+### Controller-runtime metrics
+
+Provided by the [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) library, these metrics include:
+
+- General resource usage like CPU and memory.
+- Go runtime metrics such as the number of goroutines, garbage collection duration, and Go version.
+- Controller-specific metrics, including reconciliation errors per controller, length of the reconcile queue, and reconciliation latency.
+
 ## How to change the default metrics configuration
 
-Configuring NGINX Gateway Fabric for monitoring is straightforward. You can change metric settings using Helm or Kubernetes manifests, depending on your setup.
+You can configure the metrics settings for NGINX Gateway Fabric using Helm or Kubernetes manifests.
 
 ### Using Helm
 
@@ -85,33 +165,3 @@ For enhanced security with HTTPS:
         prometheus.io/scheme: "https"
         <...>
     ```
-
-## Available metrics in NGINX Gateway Fabric
-
-NGINX Gateway Fabric provides a variety of metrics to assist in monitoring and analyzing performance. These metrics are categorized as follows:
-
-### NGINX/NGINX Plus metrics
-
-NGINX metrics, essential for monitoring specific NGINX operations, include details like the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginxinc/nginx-prometheus-exporter#exported-metrics).
-
-These metrics use  the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`.
-
-### NGINX Gateway Fabric metrics
-
-Metrics specific to the NGINX Gateway Fabric include:
-
-- `nginx_reloads_total`: Counts successful NGINX reloads.
-- `nginx_reload_errors_total`: Counts NGINX reload failures.
-- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version.
-- `nginx_last_reload_milliseconds`: Time in milliseconds for NGINX reloads.
-- `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events.
-
-All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`.
-
-### Controller-runtime metrics
-
-Provided by the [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) library, these metrics cover a range of aspects:
-
-- General resource usage like CPU and memory.
-- Go runtime metrics such as the number of Go routines, garbage collection duration, and Go version.
-- Controller-specific metrics, including reconciliation errors per controller, length of the reconcile queue, and reconciliation latency.
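As an illustration of the naming scheme documented in prometheus.md (the `nginx_gateway_fabric` namespace with a `class` label), the following sketch filters Prometheus text-format output down to NGINX Gateway Fabric samples. The `ngf_metrics` helper, the sample values, the `/metrics` path, the `nginx-gateway` namespace, and the `<ngf-pod>` placeholder are assumptions for illustration and are not part of this commit.

```shell
# Hypothetical helper: keep only samples in the nginx_gateway_fabric namespace.
# "# HELP" / "# TYPE" comment lines and other metrics (for example go_*) are dropped.
ngf_metrics() {
  grep '^nginx_gateway_fabric_' || true
}

# Against a live cluster (namespace, pod name, and path are assumptions):
#   kubectl port-forward -n nginx-gateway <ngf-pod> 9113:9113 &
#   curl -s http://127.0.0.1:9113/metrics | ngf_metrics

# Offline demonstration with made-up sample values:
printf '%s\n' \
  '# TYPE nginx_gateway_fabric_nginx_reloads_total counter' \
  'nginx_gateway_fabric_nginx_reloads_total{class="nginx"} 3' \
  'go_goroutines 42' \
  'nginx_gateway_fabric_connections_accepted{class="nginx"} 17' | ngf_metrics
```

The demonstration should print only the two `nginx_gateway_fabric_` lines, each carrying the `class="nginx"` label described in the documentation.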
diff --git a/site/go.mod b/site/go.mod
@@ -2,4 +2,4 @@ module github.com/nginxinc/nginx-gateway-fabric/site
 
 go 1.21
 
-require github.com/nginxinc/nginx-hugo-theme v0.40.8 // indirect
+require github.com/nginxinc/nginx-hugo-theme v0.41.0 // indirect
diff --git a/site/go.sum b/site/go.sum
@@ -1,10 +1,2 @@
-github.com/nginxinc/nginx-hugo-theme v0.35.0 h1:7XB2GMy6qeJgKEJy9wOS3SYKYpfvLW3/H+UHRPLM4FU=
-github.com/nginxinc/nginx-hugo-theme v0.35.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
-github.com/nginxinc/nginx-hugo-theme v0.39.0 h1:P1hOPpityVUOM5OyIpQZa1UJyuUunGSmz0oZh/GYSJM=
-github.com/nginxinc/nginx-hugo-theme v0.39.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
-github.com/nginxinc/nginx-hugo-theme v0.40.0 h1:YP0I0+bRKcJ5WEb1s/OWcnlcvNvIcKscagJkCzsa+Vs=
-github.com/nginxinc/nginx-hugo-theme v0.40.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
-github.com/nginxinc/nginx-hugo-theme v0.40.1 h1:1Q94uFYegNvjvwDV1py9VlYmh62AF1gh1oPGqjNmtis=
-github.com/nginxinc/nginx-hugo-theme v0.40.1/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
-github.com/nginxinc/nginx-hugo-theme v0.40.8 h1:VtoSAtf9k67tI2jzbLRo0oFBAMHZBUPRh/xV4MYullI=
-github.com/nginxinc/nginx-hugo-theme v0.40.8/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
+github.com/nginxinc/nginx-hugo-theme v0.41.0 h1:uB9jC0Qk9i2CG63gScHxVHAEz1zyGoAdtY0Lcpkg1lI=
+github.com/nginxinc/nginx-hugo-theme v0.41.0/go.mod h1:DPNgSS5QYxkjH/BfH4uPDiTfODqWJ50NKZdorguom8M=
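Once Prometheus is port-forwarded as described in prometheus.md, querying its HTTP API is one way to confirm that NGINX Gateway Fabric metrics are being scraped before building Grafana panels. The sketch below builds a PromQL rate expression from the documented metric naming. The `ngf_rate_query` helper and the 5m window are hypothetical choices, not something this commit provides.

```shell
# Hypothetical helper: build a PromQL rate() expression for an NGF metric.
#   $1: metric name without the nginx_gateway_fabric_ prefix
#   $2: GatewayClass name used for the class label
#   $3: rate window, for example 5m
ngf_rate_query() {
  printf 'rate(nginx_gateway_fabric_%s{class="%s"}[%s])\n' "$1" "$2" "$3"
}

# Print the expression for successful NGINX reloads over five minutes.
ngf_rate_query nginx_reloads_total nginx 5m

# With the Prometheus port-forward from the docs running, the expression
# could be sent to the query endpoint of the Prometheus HTTP API:
#   curl -s http://127.0.0.1:9090/api/v1/query \
#     --data-urlencode "query=$(ngf_rate_query nginx_reloads_total nginx 5m)"
```

The same expression can be pasted into a Grafana panel that uses the Prometheus data source configured in the documentation.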