140 changes: 26 additions & 114 deletions content/consul/v1.21.x/content/docs/monitor/telemetry/telegraf.mdx
@@ -9,30 +9,15 @@ description: >-

This page describes the process to set up Telegraf to monitor Consul datacenter telemetry.

## Overview
Consul makes a range of metrics in various formats available so operators can measure the health and stability of a datacenter, and diagnose or predict potential issues.
Review comment (Contributor):

Suggested change
Consul makes a range of metrics in various formats available so operators can measure the health and stability of a datacenter, and diagnose or predict potential issues.
Consul makes metrics available in a range of formats so that operators can measure the health and stability of a datacenter, as well as diagnose or predict potential issues.

The big fix in English here is keeping the subject, the verb, and the object of the opening clause closer together.


Consul makes a range of metrics in various formats available so operators can
measure the health and stability of a datacenter, and diagnose or predict
potential issues.
In this example you are going to use the [telegraf_plugin][] in conjunction with the StatsD protocol supported by Consul. For the full list of metrics available with Consul, refer to the [telemetry documentation](/consul/docs/reference/agent/telemetry).

There are a number of monitoring tools and options available, but for the purposes
of this tutorial you are going to use the [telegraf_plugin][] in conjunction with
the StatsD protocol supported by Consul.
## Workflow

You can read the full list of metrics available with Consul in the
[telemetry documentation](/consul/docs/reference/agent/telemetry).

In this tutorial you will:

- Configure Telegraf to collect StatsD and host level metrics
- Configure Consul to send metrics to Telegraf
- Review an example of metrics visualization
- Understand important metrics to aggregate and alert on

## Install Telegraf

The process for installing Telegraf depends on your operating system. We
recommend following the [official Telegraf installation documentation][telegraf-install].
1. [Configure Telegraf to collect StatsD and host level metrics](#configure-telegraf)
1. [Configure Consul to send metrics to Telegraf](#configure-consul)
1. [Review Consul metrics](#review-consul-metrics)

## Configure Telegraf

@@ -43,9 +28,8 @@ for this purpose.

You are going to enable some of the most common input plugins to monitor CPU,
memory, disk I/O, networking, and process status, since these are useful for
debugging Consul datacenter issues.

The `telegraf.conf` file starts with global options:
debugging Consul datacenter issues. Here is an example `telegraf.conf` file that
you can use as a starting point:

<CodeBlockConfig filename="telegraf.conf">

@@ -54,35 +38,11 @@ The `telegraf.conf` file starts with global options:
interval = "10s"
flush_interval = "10s"
omit_hostname = false
```

</CodeBlockConfig>

You set the default collection interval to 10 seconds and ask Telegraf to
include a `host` tag in each metric.

As mentioned above, Telegraf also allows you to set additional tags on the
metrics that pass through it. In this case, you are adding tags for the server
role and datacenter. You can then use these tags in Grafana to filter queries
(for example, to create a dashboard showing only servers with the
`consul-server` role, or only servers in the `us-east-1` datacenter).

<CodeBlockConfig filename="telegraf.conf">

```toml
[global_tags]
role = "consul-server"
datacenter = "us-east-1"
```

</CodeBlockConfig>

Next, set up a StatsD listener on UDP port 8125, with instructions to calculate
percentile metrics and to parse DogStatsD-compatible tags, when they're sent:

<CodeBlockConfig filename="telegraf.conf">

```toml
[[inputs.statsd]]
protocol = "udp"
service_address = ":8125"
@@ -95,20 +55,7 @@ percentile metrics and to parse DogStatsD-compatible tags, when they're sent:
parse_data_dog_tags = true
allowed_pending_messages = 10000
percentile_limit = 1000
```

</CodeBlockConfig>

The full reference to all the available StatsD-related options in Telegraf is
[here][telegraf-statsd-input].

Now, you can configure inputs for things like CPU, memory, network I/O, and disk
I/O. Most of them don't require any configuration, but make sure the `interfaces`
list in `inputs.net` matches the interface names you get from `ifconfig`.

<CodeBlockConfig filename="telegraf.conf">

```toml
[[inputs.cpu]]
percpu = true
totalcpu = true
@@ -145,43 +92,32 @@ list in `inputs.net` matches the interface names you get from `ifconfig`.

[[inputs.system]]
# no configuration
```

</CodeBlockConfig>

Another useful plugin is the [procstat][telegraf-procstat-input] plugin, which
reports metrics for processes you select:

<CodeBlockConfig filename="telegraf.conf">

```toml
[[inputs.procstat]]
pattern = "(consul)"

[[inputs.consul]]
address = "localhost:8500"
scheme = "http"
```

</CodeBlockConfig>

Telegraf even includes a [plugin][telegraf-consul-input] that monitors the
health checks associated with the Consul agent, using the Consul API to query the
data.
The `telegraf.conf` file starts with global options. You set the default collection interval to 10 seconds and ask Telegraf to include a `host` tag in each metric.

It's important to note: the plugin itself will not report the telemetry, Consul
will report those stats already using StatsD protocol.
Telegraf also allows you to set additional tags on the metrics that pass through it. In this case, you are adding tags for the server role `consul-server` and datacenter `us-east-1`. You can then use these tags in Grafana to filter queries.

<CodeBlockConfig filename="telegraf.conf">
The next configuration section sets up a StatsD listener on UDP port 8125, with instructions to calculate percentile metrics and to parse DogStatsD-compatible tags. Consul uses this listener to report telemetry stats. For all of the StatsD-related options available in Telegraf, refer to the [StatsD input plugin reference][telegraf-statsd-input].
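
To make the wire format concrete, the following is a minimal Python sketch of what a DogStatsD-style datagram looks like and how a client could send one to a StatsD listener on UDP port 8125. The function names, the `127.0.0.1` address, the sample value, and the `host:server1` tag are assumptions for illustration and are not part of Consul or Telegraf. Consul emits these datagrams itself once telemetry is configured.

```python
import socket


def format_dogstatsd(name, value, metric_type, tags=None):
    """Build one DogStatsD line: <name>:<value>|<type>|#<key>:<value>,..."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are appended after "|#" as comma-separated key:value pairs.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send a single datagram over UDP (fire-and-forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Illustrative counter sample with a hypothetical host tag.
sample = format_dogstatsd("consul.raft.apply", 1, "c", {"host": "server1"})
```

Calling `send_metric(sample)` on a host where Telegraf runs with the StatsD input enabled is one way to check that the listener accepts tagged datagrams before pointing Consul at it.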

```toml
[[inputs.consul]]
address = "localhost:8500"
scheme = "http"
```
The next sections configure inputs for CPU, memory, network I/O, and disk I/O. Make sure the `interfaces` list in `inputs.net` matches the system interface names.

</CodeBlockConfig>
Another useful input plugin is the [procstat][telegraf-procstat-input] plugin, which reports metrics for a process matching a given pattern. In this case, you are using it to monitor the Consul agent process itself.

Telegraf even includes a [plugin][telegraf-consul-input] that monitors the health checks associated with the Consul agent, using the Consul API to query the data.

Review comment (Contributor):

Include one more sentence here about applying the configuration to your Telegraf instance.

## Telegraf configuration for Consul
## Configure Consul

Asking Consul to send telemetry to Telegraf is as simple as adding a `telemetry`
section to your agent configuration:
In order for Consul to send telemetry to Telegraf, add a `telemetry` section to your agent configuration with the hostname and port of the StatsD daemon:

<CodeTabs heading="Consul agent configuration">

@@ -203,31 +139,15 @@ telemetry {

</CodeTabs>

You only need to specify two options. The `dogstatsd_addr`
specifies the hostname and port of the StatsD daemon.

Note that the configuration specifies DogStatsD format instead of plain StatsD,
which tells Consul to send [tags][tagging] with each metric. Tags can be used by
Grafana to filter data on your dashboards (for example, displaying only the data
for which `role=consul-server`). Telegraf is compatible with the DogStatsD
format and allows you to add your own tags too.

The second option tells Consul not to insert the hostname in the names of the
metrics it sends to StatsD, since the hostnames will be sent as tags. Without
this option, the single metric `consul.raft.apply` would become multiple
metrics:

```plaintext hideClipboard
consul.server1.raft.apply
consul.server2.raft.apply
consul.server3.raft.apply
```

If you are using a different agent (e.g. Circonus, Statsite, or plain StatsD),
you may want to change this configuration, and you can find the configuration
reference [here][consul-telemetry-config].
The second option, `disable_hostname`, instructs Consul not to insert the hostname in the names of the metrics it sends to StatsD, since the hostnames are sent as tags. If you need the hostnames as part of the metric names, set this option to `false`. For example, when `disable_hostname` is set to `false`, `consul.raft.apply` becomes `consul.<HOSTNAME>.raft.apply`. For more information, refer to the [Consul telemetry configuration reference][consul-telemetry-config].
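
The effect of `disable_hostname` on metric names can be sketched in a few lines of Python. This is only an illustration of the naming pattern described above, not Consul's implementation. The `metric_name` helper and the `server1` hostname are hypothetical.

```python
def metric_name(base, hostname, disable_hostname=True):
    """Return the metric name with or without the hostname segment."""
    if disable_hostname:
        return base
    # Insert the hostname right after the first dotted segment (the prefix).
    prefix, _, rest = base.partition(".")
    return f"{prefix}.{hostname}.{rest}"


# Print the name produced for each setting of the flag.
for flag in (True, False):
    print(flag, metric_name("consul.raft.apply", "server1", disable_hostname=flag))
```

Keeping one name per metric and carrying the hostname as a tag lets a dashboard query aggregate across servers, or filter to a single host, without pattern matching on metric names.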

## Visualize Telegraf Consul metrics
## Review Consul metrics

You can use a tool like [Grafana][] or [Chronograf][] to visualize metrics from
Telegraf.
@@ -236,8 +156,6 @@ Here is an example Grafana dashboard:

![Grafana Consul Datacenter](/img/consul-grafana-screenshot.png 'Grafana Dashboard')
Review comment (Contributor):

Confirm this image is up to date?


## Metric aggregates and alerting from Telegraf

### Memory usage

| Metric Name | Description |
@@ -332,24 +250,18 @@ unavailable, as the kernel spends all its time waiting for I/O to complete.

## Next steps

In this tutorial, you learned how to set up Telegraf with Consul to collect
metrics, and considered your options for visualizing, aggregating, and alerting
on those metrics. To learn about other factors (in addition to monitoring) that
you should consider when running Consul in production, check the
[Production Checklist][prod-checklist].
To read further about telemetry in Consul, check the [Consul Agent Telemetry](/consul/docs/monitor/telemetry/agent) and [Consul Dataplane Telemetry](/consul/docs/monitor/telemetry/dataplane) pages.

To learn more about Consul monitoring, alerting and logging, check out the [Consul Monitoring](/consul/docs/monitor) page.

[non_negative_difference]: https://docs.influxdata.com/influxdb/v1.5/query_language/functions/#non-negative-difference
[consul_faq_fds]: /consul/docs/troubleshoot/faq#q-does-consul-require-certain-user-process-resource-limits-
[telegraf_plugin]: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/consul
[telegraf-install]: https://docs.influxdata.com/telegraf/v1.6/introduction/installation/
[telegraf-consul-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/consul
[telegraf-statsd-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/statsd
[telegraf-procstat-input]: https://github.com/influxdata/telegraf/tree/release-1.6/plugins/inputs/procstat
[telegraf-input-plugins]: https://docs.influxdata.com/telegraf/v1.6/plugins/inputs/
[tagging]: https://docs.datadoghq.com/getting_started/tagging/
[consul-telemetry-config]: /consul/docs/reference/agent/configuration-file/telemetry
[consul-telemetry-ref]: /consul/docs/reference/agent/telemetry
[telegraf-input-plugins]: https://docs.influxdata.com/telegraf/v1.6/plugins/inputs/
[grafana]: https://www.influxdata.com/partners/grafana/
[chronograf]: https://www.influxdata.com/time-series-platform/chronograf/
[prod-checklist]: /consul/tutorials/production-deploy/production-checklist