Consul: Update monitor/telemetry/telegraf for v1.21 and v1.22 #1504
base: main
Changes from 2 commits
13dcce7
99f12d9
cb34ea8
4a95403
064c2ed
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -9,30 +9,15 @@ description: >- | |||||
|
|
||||||
| This page describes the process to set up Telegraf to monitor Consul datacenter telemetry. | ||||||
|
|
||||||
| ## Overview | ||||||
| Consul makes a range of metrics in various formats available so operators can measure the health and stability of a datacenter, and diagnose or predict potential issues. | ||||||
|
Contributor
Suggested change
The big fix in English here is keeping the subject, the verb, and the object of the opening clause closer together. |
||||||
|
|
||||||
| Consul makes a range of metrics in various formats available so operators can | ||||||
| measure the health and stability of a datacenter, and diagnose or predict | ||||||
| potential issues. | ||||||
| In this example you are going to use the [telegraf_plugin][] in conjunction with the StatsD protocol supported by Consul. For the full list of metrics available with Consul, refer to the [telemetry documentation](/consul/docs/reference/agent/telemetry). | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| There are a number of monitoring tools and options available, but for the purposes | ||||||
| of this tutorial you are going to use the [telegraf_plugin][] in conjunction with | ||||||
| the StatsD protocol supported by Consul. | ||||||
| ## Workflow | ||||||
|
|
||||||
| You can read the full list of metrics available with Consul in the | ||||||
| [telemetry documentation](/consul/docs/reference/agent/telemetry). | ||||||
|
|
||||||
| In this tutorial you will: | ||||||
|
|
||||||
| - Configure Telegraf to collect StatsD and host level metrics | ||||||
| - Configure Consul to send metrics to Telegraf | ||||||
| - Review an example of metrics visualization | ||||||
| - Understand important metrics to aggregate and alert on | ||||||
|
|
||||||
| ## Install Telegraf | ||||||
|
|
||||||
| The process for installing Telegraf depends on your operating system. We | ||||||
| recommend following the [official Telegraf installation documentation][telegraf-install]. | ||||||
| 1. [Configure Telegraf to collect StatsD and host level metrics](#configure-telegraf) | ||||||
| 1. [Configure Consul to send metrics to Telegraf](#configure-consul) | ||||||
| 1. [Review Consul metrics](#review-consul-metrics) | ||||||
|
|
||||||
| ## Configure Telegraf | ||||||
|
|
||||||
|
|
@@ -43,9 +28,8 @@ for this purpose. | |||||
|
|
||||||
| You are going to enable some of the most common input plugins to monitor CPU, | ||||||
| memory, disk I/O, networking, and process status, since these are useful for | ||||||
| debugging Consul datacenter issues. | ||||||
|
|
||||||
| The `telegraf.conf` file starts with global options: | ||||||
| debugging Consul datacenter issues. Here is an example `telegraf.conf` file that | ||||||
| you can use as a starting point: | ||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
|
|
||||||
|
|
@@ -54,35 +38,11 @@ The `telegraf.conf` file starts with global options: | |||||
| interval = "10s" | ||||||
| flush_interval = "10s" | ||||||
| omit_hostname = false | ||||||
| ``` | ||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
|
|
||||||
| You set the default collection interval to 10 seconds and ask Telegraf to | ||||||
| include a `host` tag in each metric. | ||||||
|
|
||||||
| As mentioned above, Telegraf also allows you to set additional tags on the | ||||||
| metrics that pass through it. In this case, you are adding tags for the server | ||||||
| role and datacenter. You can then use these tags in Grafana to filter queries | ||||||
| (for example, to create a dashboard showing only servers with the | ||||||
| `consul-server` role, or only servers in the `us-east-1` datacenter). | ||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
|
|
||||||
| ```toml | ||||||
| [global_tags] | ||||||
| role = "consul-server" | ||||||
| datacenter = "us-east-1" | ||||||
| ``` | ||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
|
|
||||||
| Next, set up a StatsD listener on UDP port 8125, with instructions to calculate | ||||||
| percentile metrics and to parse DogStatsD-compatible tags, when they're sent: | ||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
|
|
||||||
| ```toml | ||||||
| [[inputs.statsd]] | ||||||
| protocol = "udp" | ||||||
| service_address = ":8125" | ||||||
|
|
@@ -95,20 +55,7 @@ percentile metrics and to parse DogStatsD-compatible tags, when they're sent: | |||||
| parse_data_dog_tags = true | ||||||
| allowed_pending_messages = 10000 | ||||||
| percentile_limit = 1000 | ||||||
| ``` | ||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
|
|
||||||
| The full reference to all the available StatsD-related options in Telegraf is | ||||||
| [here][telegraf-statsd-input]. | ||||||
|
|
||||||
| Now, you can configure inputs for things like CPU, memory, network I/O, and disk | ||||||
| I/O. Most of them don't require any configuration, but make sure the `interfaces` | ||||||
| list in `inputs.net` matches the interface names you get from `ifconfig`. | ||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
|
|
||||||
| ```toml | ||||||
| [[inputs.cpu]] | ||||||
| percpu = true | ||||||
| totalcpu = true | ||||||
|
|
@@ -145,43 +92,32 @@ list in `inputs.net` matches the interface names you get from `ifconfig`. | |||||
|
|
||||||
| [[inputs.system]] | ||||||
| # no configuration | ||||||
| ``` | ||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
|
|
||||||
| Another useful plugin is the [procstat][telegraf-procstat-input] plugin, which | ||||||
| reports metrics for processes you select: | ||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
|
|
||||||
| ```toml | ||||||
| [[inputs.procstat]] | ||||||
| pattern = "(consul)" | ||||||
|
|
||||||
| [[inputs.consul]] | ||||||
| address = "localhost:8500" | ||||||
| scheme = "http" | ||||||
| ``` | ||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
|
|
||||||
| Telegraf even includes a [plugin][telegraf-consul-input] that monitors the | ||||||
| health checks associated with the Consul agent, using Consul API to query the | ||||||
| data. | ||||||
| The `telegraf.conf` file starts with global options: you set the default collection interval to 10 seconds and ask Telegraf to include a `host` tag in each metric. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| It's important to note: the plugin itself will not report the telemetry, Consul | ||||||
| will report those stats already using StatsD protocol. | ||||||
| Telegraf also allows you to set additional tags on the metrics that pass through it. In this case, you are adding tags for the server role `consul-server` and datacenter `us-east-1`. You can further use these tags in Grafana to filter queries. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| <CodeBlockConfig filename="telegraf.conf"> | ||||||
| The next config section sets up a StatsD listener on UDP port 8125, with instructions to calculate percentile metrics and to parse DogStatsD-compatible tags. Consul will use this to report telemetry stats. The full reference to all the available StatsD-related options in Telegraf is [here][telegraf-statsd-input]. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| ```toml | ||||||
| [[inputs.consul]] | ||||||
| address = "localhost:8500" | ||||||
| scheme = "http" | ||||||
| ``` | ||||||
| The next configuration sections are used to configure inputs for things like CPU, memory, network I/O, and disk I/O. It is important to make sure the `interfaces` list in `inputs.net` matches the system interface names. | ||||||
krastin marked this conversation as resolved.
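As a minimal sketch of the `inputs.net` point above: the fragment below assumes a host whose primary interface is named `eth0`. That name is a placeholder rather than something taken from this change, so substitute the interface names reported by your own system.

```toml
# Hypothetical interface name: replace "eth0" with the interfaces on your host.
[[inputs.net]]
  interfaces = ["eth0"]
```

If the names listed here do not match any real interface, the plugin has nothing to report for them, so it is worth checking this list whenever the configuration is copied between hosts.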
|
||||||
|
|
||||||
| </CodeBlockConfig> | ||||||
| Another useful input plugin is the [procstat][telegraf-procstat-input] plugin, which reports metrics for a process matching a given pattern. In this case, you are using it to monitor the Consul agent process itself. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| Telegraf even includes a [plugin][telegraf-consul-input] that monitors the health checks associated with the Consul agent, using the Consul API to query the data. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
|
Contributor
Include one more sentence here about applying the configuration to your Telegraf instance. |
||||||
| ## Telegraf configuration for Consul | ||||||
| ## Configure Consul | ||||||
|
|
||||||
| Asking Consul to send telemetry to Telegraf is as simple as adding a `telemetry` | ||||||
| section to your agent configuration: | ||||||
| For Consul to send telemetry to Telegraf, add a `telemetry` section to your agent configuration with the hostname and port of the StatsD daemon: | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| <CodeTabs heading="Consul agent configuration"> | ||||||
|
|
||||||
|
|
@@ -203,31 +139,15 @@ telemetry { | |||||
|
|
||||||
| </CodeTabs> | ||||||
|
|
||||||
| You only need to specify two options. The `dogstatsd_addr` | ||||||
| specifies the hostname and port of the StatsD daemon. | ||||||
|
|
||||||
| Note that the configuration specifies DogStatsD format instead of plain StatsD, | ||||||
| which tells Consul to send [tags][tagging] with each metric. Tags can be used by | ||||||
| Grafana to filter data on your dashboards (for example, displaying only the data | ||||||
| for which `role=consul-server`). Telegraf is compatible with the DogStatsD | ||||||
| format and allows you to add your own tags too. | ||||||
|
|
||||||
| The second option tells Consul not to insert the hostname in the names of the | ||||||
| metrics it sends to StatsD, since the hostnames will be sent as tags. Without | ||||||
| this option, the single metric `consul.raft.apply` would become multiple | ||||||
| metrics: | ||||||
|
|
||||||
| ```plaintext hideClipboard | ||||||
| consul.server1.raft.apply | ||||||
| consul.server2.raft.apply | ||||||
| consul.server3.raft.apply | ||||||
| ``` | ||||||
|
|
||||||
| If you are using a different agent (e.g. Circonus, Statsite, or plain StatsD), | ||||||
| you may want to change this configuration, and you can find the configuration | ||||||
| reference [here][consul-telemetry-config]. | ||||||
| The second option instructs Consul not to insert the hostname in the names of the metrics it sends to StatsD, since the hostnames will be sent as tags. If you need the hostnames as part of the metric names, set this option to `false`. For example, if `disable_hostname` is set to `false`, `consul.raft.apply` would become `consul.<HOSTNAME>.raft.apply`. For further information, refer to the [Consul telemetry configuration reference][consul-telemetry-config]. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| ## Visualize Telegraf Consul metrics | ||||||
| ## Review Consul metrics | ||||||
|
|
||||||
| You can use a tool like [Grafana][] or [Chronograf][] to visualize metrics from | ||||||
| Telegraf. | ||||||
|
|
@@ -236,8 +156,6 @@ Here is an example Grafana dashboard: | |||||
|
|
||||||
|  | ||||||
|
Contributor
Confirm this image is up to date? |
||||||
|
|
||||||
krastin marked this conversation as resolved.
|
||||||
| ## Metric aggregates and alerting from Telegraf | ||||||
|
|
||||||
| ### Memory usage | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| | Metric Name | Description | | ||||||
|
|
@@ -332,24 +250,18 @@ unavailable, as the kernel spends all its time waiting for I/O to complete. | |||||
|
|
||||||
| ## Next steps | ||||||
|
|
||||||
| In this tutorial, you learned how to set up Telegraf with Consul to collect | ||||||
| metrics, and considered your options for visualizing, aggregating, and alerting | ||||||
| on those metrics. To learn about other factors (in addition to monitoring) that | ||||||
| you should consider when running Consul in production, check the | ||||||
| [Production Checklist][prod-checklist]. | ||||||
| To read further about telemetry in Consul, check the [Consul Agent Telemetry](/consul/docs/monitor/telemetry/agent) and [Consul Dataplane Telemetry](/consul/docs/monitor/telemetry/dataplane) pages. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| To learn more about Consul monitoring, alerting and logging, check out the [Consul Monitoring](/consul/docs/monitor) page. | ||||||
krastin marked this conversation as resolved.
|
||||||
|
|
||||||
| [non_negative_difference]: https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference | ||||||
| [consul_faq_fds]: /consul/docs/troubleshoot/faq#q-does-consul-require-certain-user-process-resource-limits- | ||||||
| [telegraf_plugin]: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/consul | ||||||
| [telegraf-install]: https://docs.influxdata.com/telegraf/v1.6/introduction/installation/ | ||||||
| [telegraf-consul-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/consul | ||||||
| [telegraf-statsd-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/statsd | ||||||
| [telegraf-procstat-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/procstat | ||||||
| [telegraf-input-plugins]: https://docs.influxdata.com/telegraf/v1.6/plugins/inputs/ | ||||||
| [tagging]: https://docs.datadoghq.com/getting_started/tagging/ | ||||||
| [consul-telemetry-config]: /consul/docs/reference/agent/configuration-file/telemetry | ||||||
| [consul-telemetry-ref]: /consul/docs/reference/agent/telemetry | ||||||
| [telegraf-input-plugins]: https://docs.influxdata.com/telegraf/v1.6/plugins/inputs/ | ||||||
| [grafana]: https://www.influxdata.com/partners/grafana/ | ||||||
| [chronograf]: https://www.influxdata.com/time-series-platform/chronograf/ | ||||||
| [prod-checklist]: /consul/tutorials/production-deploy/production-checklist | ||||||