integration_gcp-compute-engine: fix disk_throttled_bps & disk_throttl… (#272)

Co-authored-by: Quentin Manfroi <[email protected]>
tchernomax and xp-1000 authored Apr 9, 2021
1 parent 80f40ac commit b414c92
Showing 3 changed files with 53 additions and 12 deletions.
32 changes: 27 additions & 5 deletions modules/integration_gcp-compute-engine/README.md
@@ -9,6 +9,8 @@
 - [How to collect required metrics?](#how-to-collect-required-metrics)
 - [Metrics](#metrics)
 - [Notes](#notes)
+- [Metadata configuration for default filtering](#metadata-configuration-for-default-filtering)
+- [About disk detectors](#about-disk-detectors)
 - [Related documentation](#related-documentation)

 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -109,14 +111,34 @@ Here is the list of required metrics for detectors in this module.

 ## Notes

-While SignalFx does not support `label` sync from GCE the default filtering policy relies on `metadata` instead:
+### Metadata configuration for default filtering

-* add metadata at the instance (or project) level, e.g.:
+While SignalFx does not support `label` sync from GCE the default filtering policy relies on `metadata` instead.
+Therefore, if you keep the default filter (if you don't define `filter_custom_includes` or `filter_custom_excludes`) you **need** to add those metadata to your GCP computes instances :

-- `gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_env=true`
-- `gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_monitored=true`
+* sfx_env=true
+* sfx_monitored=true

-* whitelist metadata fields at the SignalFx's GCP integration level (requires SignalFx terraform provider v4.22.0+, see https://docs.signalfx.com/en/latest/integrations/google-cloud-platform.html#compute-engine-instance).
+For example:
+
+* via gcloud, at the instance level:
+```
+gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_env=true
+gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_monitored=true
+```
+* via terraform, [at the instance level](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata)
+* via terraform, [at the project level](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_project_metadata)
+
+You also **need** to check if those metadata are in the metadata whitelist in your [SignalFx GCP integration](https://docs.signalfx.com/en/latest/integrations/google-cloud-platform.html#compute-engine-instance).
+
+### About disk detectors
+
+Detectors "GCP GCE Instance disk .." defines an explicit aggregation function by default in contrast to other detectors. It is because underlying metrics about throttle can
+add optional `throttle_reason` dimension which will not exist on not throttle related metrics (used to calculate the percentage). To make them match we have to group on dimensions
+__common__ between both metrics.
+So, if you want to overwrite `disk_throttled_bps_aggregation_function` or `disk_throttled_ops_notifications` take care to keep the aggregation by 'instance_name' and 'device_name', else you might break the detector.
+
+Notice these detectors has a `device_name` dimension in addition to `instance_name` compared to other detectors because it's possible to have two alertes on the same instance if this instance have two disks.


 ## Related documentation
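The project-level Terraform option linked in the notes above can be sketched as follows. This is a minimal example and is not part of the commit. It assumes the `hashicorp/google` provider is already configured with a default project. It uses `google_compute_project_metadata_item`, which manages one key per resource, instead of the linked `google_compute_project_metadata`, because that resource owns the project's entire metadata set and removes keys it does not declare. The resource name `sfx_default_filter` is hypothetical.

```hcl
# Sketch (not part of this commit): set the two metadata keys expected by the
# module's default filter at the project level, so GCE instances in the
# project inherit them. Assumes a configured "google" provider.
resource "google_compute_project_metadata_item" "sfx_default_filter" {
  # One resource instance per metadata key; values taken from the README notes.
  for_each = {
    sfx_env       = "true"
    sfx_monitored = "true"
  }

  key   = each.key
  value = each.value
}
```

To set the keys at the instance level instead, put the same two entries in the `metadata` map argument of `google_compute_instance`.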
29 changes: 24 additions & 5 deletions modules/integration_gcp-compute-engine/conf/readme.yaml
@@ -3,12 +3,31 @@ documentations:
 url: 'https://cloud.google.com/monitoring/api/metrics_gcp#gcp-compute'

 notes: |
-While SignalFx does not support `label` sync from GCE the default filtering policy relies on `metadata` instead:
+### Metadata configuration for default filtering
-* add metadata at the instance (or project) level, e.g.:
+While SignalFx does not support `label` sync from GCE the default filtering policy relies on `metadata` instead.
+Therefore, if you keep the default filter (if you don't define `filter_custom_includes` or `filter_custom_excludes`) you **need** to add those metadata to your GCP computes instances :
-- `gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_env=true`
-- `gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_monitored=true`
+* sfx_env=true
+* sfx_monitored=true
-* whitelist metadata fields at the SignalFx's GCP integration level (requires SignalFx terraform provider v4.22.0+, see https://docs.signalfx.com/en/latest/integrations/google-cloud-platform.html#compute-engine-instance).
+For example:
+* via gcloud, at the instance level:
+```
+gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_env=true
+gcloud compute instances add-metadata myinstance --zone=europe-west1-c --metadata sfx_monitored=true
+```
+* via terraform, [at the instance level](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata)
+* via terraform, [at the project level](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_project_metadata)
+You also **need** to check if those metadata are in the metadata whitelist in your [SignalFx GCP integration](https://docs.signalfx.com/en/latest/integrations/google-cloud-platform.html#compute-engine-instance).
+### About disk detectors
+Detectors "GCP GCE Instance disk .." defines an explicit aggregation function by default in contrast to other detectors. It is because underlying metrics about throttle can
+add optional `throttle_reason` dimension which will not exist on not throttle related metrics (used to calculate the percentage). To make them match we have to group on dimensions
+__common__ between both metrics.
+So, if you want to overwrite `disk_throttled_bps_aggregation_function` or `disk_throttled_ops_notifications` take care to keep the aggregation by 'instance_name' and 'device_name', else you might break the detector.
+Notice these detectors has a `device_name` dimension in addition to `instance_name` compared to other detectors because it's possible to have two alertes on the same instance if this instance have two disks.
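The notes also require the metadata keys to be whitelisted in the SignalFx GCP integration, which needs SignalFx Terraform provider v4.22.0+. A minimal sketch of that step follows. It assumes the provider exposes the Compute Engine metadata whitelist as the `include_list` argument of `signalfx_gcp_integration`, so check the provider documentation for your version. The integration name, project ID and credentials file path are hypothetical placeholders.

```hcl
# Sketch (not part of this commit): whitelist the metadata keys used by the
# module's default filter so SignalFx imports them with GCE metrics.
# Assumption: `include_list` is the provider argument for this whitelist.
resource "signalfx_gcp_integration" "gcp" {
  name    = "GCP integration" # hypothetical name
  enabled = true

  project_service_keys {
    project_id  = "my-gcp-project"                  # hypothetical project ID
    project_key = file("gcp-service-account.json") # hypothetical key file
  }

  include_list = ["sfx_env", "sfx_monitored"]
}
```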
4 changes: 2 additions & 2 deletions modules/integration_gcp-compute-engine/variables.tf
@@ -141,7 +141,7 @@ variable "disk_throttled_bps_notifications" {
 variable "disk_throttled_bps_aggregation_function" {
 description = "Aggregation function and group by for disk_throttled_bps detector (i.e. \".mean(by=['host'])\")"
 type = string
-default = ""
+default = ".sum(by=['instance_name', 'device_name'])"
 }

 variable "disk_throttled_bps_transformation_function" {
@@ -203,7 +203,7 @@ variable "disk_throttled_ops_notifications" {
 variable "disk_throttled_ops_aggregation_function" {
 description = "Aggregation function and group by for disk_throttled_ops detector (i.e. \".mean(by=['host'])\")"
 type = string
-default = ""
+default = ".sum(by=['instance_name', 'device_name'])"
 }

 variable "disk_throttled_ops_transformation_function" {
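With these new defaults, both disk detectors sum by `instance_name` and `device_name`. If you override the aggregation when calling the module, the README asks you to keep that group-by. A sketch of such an override follows. The module block name and the relative `source` path are hypothetical and assume a call from the repository root. The module's other inputs are omitted. Adding `project_id` to the group-by assumes that dimension is present on both the throttled and the non-throttled metrics.

```hcl
# Sketch (not part of this commit): override the disk detector aggregations
# while keeping the group-by on instance_name and device_name, as the README
# requires.
module "gce_detectors" {
  # Hypothetical: module called from the repository root.
  source = "./modules/integration_gcp-compute-engine"

  # Other required module inputs omitted for brevity.

  # Assumption: project_id exists on both the throttled and the total metrics,
  # so adding it keeps the two sides of the percentage matching.
  disk_throttled_bps_aggregation_function = ".sum(by=['instance_name', 'device_name', 'project_id'])"
  disk_throttled_ops_aggregation_function = ".sum(by=['instance_name', 'device_name', 'project_id'])"
}
```

The sketch keeps `sum` because the throttled metrics can be split by `throttle_reason`. Summing across that split gives one total per disk, whereas `mean` would average the per-reason series and understate the throttled volume.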
