diff --git a/docs/severity.md b/docs/severity.md index 30be3a0cd..e3736d2c8 100644 --- a/docs/severity.md +++ b/docs/severity.md @@ -77,6 +77,7 @@ - [organization_usage](#organization_usage) - [otel-collector_kubernetes-common](#otel-collector_kubernetes-common) - [prometheus-exporter_active-directory](#prometheus-exporter_active-directory) +- [prometheus-exporter_couchdb](#prometheus-exporter_couchdb) - [prometheus-exporter_docker-state](#prometheus-exporter_docker-state) - [prometheus-exporter_kong](#prometheus-exporter_kong) - [prometheus-exporter_oracledb](#prometheus-exporter_oracledb) diff --git a/modules/prometheus-exporter_couchdb/README.md b/modules/prometheus-exporter_couchdb/README.md new file mode 100644 index 000000000..7352e2709 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/README.md @@ -0,0 +1,157 @@ +# COUCHDB SignalFx detectors + + + +:link: **Contents** + +- [How to use this module?](#how-to-use-this-module) +- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module) +- [How to collect required metrics?](#how-to-collect-required-metrics) + - [Examples](#examples) + - [Metrics](#metrics) +- [Related documentation](#related-documentation) + + + +## How to use this module? + +This directory defines a [Terraform](https://www.terraform.io/) +[module](https://www.terraform.io/language/modules/syntax) you can use in your +existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a +`module` configuration and setting its `source` parameter to the URL of this folder: + +```hcl +module "signalfx-detectors-prometheus-exporter-couchdb" { + source = "github.com/hlepesant/terraform-signalfx-detectors.git//modules/prometheus-exporter_couchdb?ref={revision}" + + environment = var.environment + notifications = local.notifications +} +``` + +Note the following parameters: + +* `source`: Use this parameter to specify the URL of the module. 
The double slash (`//`) is intentional and required. + Terraform uses it to specify subfolders within a Git repo (see [module + sources](https://www.terraform.io/language/modules/sources)). The `ref` parameter specifies a specific Git tag in + this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch + like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform + [registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as source + instead of `git` which is more flexible but less future-proof. + +* `environment`: Use this parameter to specify the + [environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this + instance of the module. + Its value will be added to the `prefixes` list at the start of the [detector + name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example). + In general, it will also be used in the `filtering` internal sub-module to [apply + filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default + [tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention). + +* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists + of a Terraform [object](https://www.terraform.io/language/expressions/type-constraints#object) where each key represents an available + [detector rule severity](https://docs.splunk.com/observability/alerts-detectors-notifications/create-detectors-for-alerts.html#severity) + and its value is a list of recipients. Every recipient must respect the [detector notification + format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format). 
+ Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding) + documentation to understand the recommended role of each severity. + +These 3 parameters, along with all variables defined in [common-variables.tf](common-variables.tf), are common to all +[modules](../) in this repository. Other variables, specific to this module, are available in +[variables-gen.tf](variables-gen.tf). +In general, the default configuration "works" but all of these Terraform +[variables](https://www.terraform.io/language/values/variables) make it possible to +customize the detectors' behavior to better fit your needs. + +Most of them represent usual tips and rules detailed in the +[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the +common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) dedicated documentation. + +Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about +general usage of this repository. + +## What are the available detectors in this module? + +This module creates the following SignalFx detectors, each of which can contain one or multiple alerting rules: + +|Detector|Critical|Major|Minor|Warning|Info| +|---|---|---|---|---|---| +|Couchdb heartbeat|X|-|-|-|-| +|Couchdb couchdb_httpd_status_code_4xx|X|X|-|-|-| +|Couchdb couchdb_auth_cache|X|X|-|-|-| +|Couchdb couchdb_couch_replicator_jobs|X|X|-|-|-| +|Couchdb couchdb_erlang_processes|X|X|-|-|-| +|Couchdb cluster_is_stable|X|-|-|-|-| + +## How to collect required metrics? + +This module deploys detectors using metrics reported by the +scraping of a server following the [OpenMetrics convention](https://openmetrics.io/) based on and compatible with [the Prometheus +exposition format](https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#openmetrics-text-format). 
+ +They are generally called `Prometheus Exporters` which can be fetched by both the [SignalFx Smart Agent](https://github.com/signalfx/signalfx-agent) +thanks to its [prometheus exporter monitor](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitors/prometheus-exporter.md) and the +[OpenTelemetry Collector](https://github.com/signalfx/splunk-otel-collector) using its [prometheus +receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver) or its derivatives. + +These exporters can either be embedded directly in the tool you want to monitor (e.g. nginx ingress) or be installed next to it as +a separate program configured to connect, create metrics and expose them as a server. + + +Check the [Related documentation](#related-documentation) section for more detailed and specific information about this module's dependencies. + +The detectors of this module use metrics from the [embedded Couchdb exporter](https://docs.couchdb.org/en/stable/config/misc.html#configuration-of-prometheus-endpoint) for Prometheus. + +### Examples + +Sample OTEL Agent configuration. + +```yaml +--- +receivers: + prometheus: + config: + scrape_configs: + - job_name: "couchdb-prometheus" + scrape_interval: 60s + metrics_path: "/_node/_local/_prometheus" + basic_auth: + username: "{{ couchdb_admin_username }}" + password: "{{ couchdb_admin_password }}" + static_configs: + - targets: ["{{ inventory_hostname }}:5984"] + labels: + environment: "{{ splunk_otel_collector_env }}" +service: + pipelines: + metrics: + receivers: + - prometheus +... +``` + + +### Metrics + + +Here is the list of required metrics for detectors in this module. 
+ +* `couchdb_auth_cache_misses_total` +* `couchdb_auth_cache_requests_total` +* `couchdb_couch_replicator_cluster_is_stable` +* `couchdb_couch_replicator_jobs_crashed` +* `couchdb_couch_replicator_jobs_total` +* `couchdb_erlang_processes` +* `couchdb_erlang_process_limit` +* `couchdb_httpd_status_codes` +* `couchdb_uptime_seconds` + + + + +## Related documentation + +* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs) +* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector) +* [Splunk Observability integrations](https://docs.splunk.com/Observability/gdi/get-data-in/integrations.html) +* [Configuration of Prometheus Endpoint in CouchDB](https://docs.couchdb.org/en/stable/config/misc.html#configuration-of-prometheus-endpoint) diff --git a/modules/prometheus-exporter_couchdb/common-filters.tf b/modules/prometheus-exporter_couchdb/common-filters.tf new file mode 100644 index 000000000..ba770a5bd --- /dev/null +++ b/modules/prometheus-exporter_couchdb/common-filters.tf @@ -0,0 +1,4 @@ +locals { + filters = "filter('env', '${var.environment}') and filter('sfx_monitored', 'true')" +} + diff --git a/modules/prometheus-exporter_couchdb/common-locals.tf b/modules/prometheus-exporter_couchdb/common-locals.tf new file mode 100644 index 000000000..51a7650c1 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/common-locals.tf @@ -0,0 +1,44 @@ +locals { + heartbeat_auto_resolve_after = "1s" + not_running_vm_filters_gcp = "(not filter('gcp_status', '{Code=3, Name=STOPPING}', '{Code=4, Name=TERMINATED}'))" + not_running_vm_filters_aws = "(not filter('aws_state', '{Code: 32,Name: shutting-down}', '{Code: 48,Name: terminated}', '{Code: 64,Name: stopping}', '{Code: 80,Name: stopped}'))" + not_running_vm_filters_azure = "(not filter('azure_power_state', 'PowerState/stopping', 'PowerState/stopped', 'PowerState/deallocating', 
'PowerState/deallocated'))" + not_running_vm_filters = format( + "%s and %s and %s", + local.not_running_vm_filters_aws, + local.not_running_vm_filters_gcp, + local.not_running_vm_filters_azure + ) + detector_name_prefix = "${join("", formatlist("[%s]", var.prefixes))}[${var.environment}]" + common_tags = concat(["terraform", var.environment], var.teams) + rule_subject_prefix = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}}" + rule_subject_suffix = "on {{{dimensions}}}" + rule_subject = format("%s ({{inputs.signal.value}}) %s", local.rule_subject_prefix, local.rule_subject_suffix) + rule_subject_novalue = format("%s %s", local.rule_subject_prefix, local.rule_subject_suffix) + rule_body = <<-EOF + **Alert**: + *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})* + {{#if anomalous}} + **Triggered at**: + *{{timestamp}}* + {{else}} + **Cleared at**: + *{{timestamp}}* + {{/if}} + + {{#notEmpty dimensions}} + **Dimensions**: + *{{{dimensions}}}* + {{/notEmpty}} + + {{#if anomalous}} + {{#if runbookUrl}}**Runbook**: + Go to [this page]({{{runbookUrl}}}) for help and analysis. 
+ {{/if}} + + {{#if tip}}**Tip**: + {{{tip}}} + {{/if}} + {{/if}} +EOF +} diff --git a/modules/prometheus-exporter_couchdb/common-modules.tf b/modules/prometheus-exporter_couchdb/common-modules.tf new file mode 100644 index 000000000..b995457c1 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/common-modules.tf @@ -0,0 +1,8 @@ +module "filtering" { + source = "github.com/claranet/terraform-signalfx-detectors.git//modules/internal_filtering?ref=v1.26.0" + + filtering_default = local.filters + filtering_custom = var.filtering_custom + append_mode = var.filtering_append +} + diff --git a/modules/prometheus-exporter_couchdb/common-variables.tf b/modules/prometheus-exporter_couchdb/common-variables.tf new file mode 100644 index 000000000..80cc77eee --- /dev/null +++ b/modules/prometheus-exporter_couchdb/common-variables.tf @@ -0,0 +1,78 @@ +# Global + +variable "environment" { + description = "Infrastructure environment" + type = string +} + +variable "notifications" { + description = "Default notification recipients list per severity" + type = object({ + critical = list(string) + major = list(string) + minor = list(string) + warning = list(string) + info = list(string) + }) +} + +variable "prefixes" { + description = "Prefixes list to prepend between brackets on every monitors names before environment" + type = list(string) + default = [] +} + +variable "filtering_custom" { + description = "Filters as SignalFlow string to either replace or append to default filtering convention which is the only one used if not defined" + type = string + default = null +} + +variable "filtering_append" { + description = "If true, the `filtering_custom` string will be appended to the default filtering convention instead of fully replace it" + type = bool + default = false +} + +variable "detectors_disabled" { + description = "Disable all detectors in this module" + type = bool + default = false +} + +variable "runbook_url" { + description = "Default runbook URL to apply to all 
detectors (if not overridden at detector level)" + type = string + default = "" +} + +variable "authorized_writer_teams" { + description = "List of teams IDs authorized (with admins) to edit the detector. If defined, it requires an user token to work" + type = list(string) + default = null +} + +variable "teams" { + description = "List of teams IDs to associate the detector to" + type = list(string) + default = [] +} + +variable "message_subject" { + description = "The subject to use in alerting rules messages which overrides the default template" + type = string + default = "" +} + +variable "message_body" { + description = "The body to use in alerting rules messages which overrides the default template" + type = string + default = "" +} + +variable "extra_tags" { + description = "List of tags to add to the detectors resources, useful to find detectors " + type = list(string) + default = [] +} + diff --git a/modules/prometheus-exporter_couchdb/common-versions.tf b/modules/prometheus-exporter_couchdb/common-versions.tf new file mode 100644 index 000000000..d77818c04 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/common-versions.tf @@ -0,0 +1,9 @@ +terraform { + required_providers { + signalfx = { + source = "splunk-terraform/signalfx" + version = ">= 7.0.0" + } + } + required_version = ">= 0.12.26" +} diff --git a/modules/prometheus-exporter_couchdb/conf/00-heartbeat.yaml b/modules/prometheus-exporter_couchdb/conf/00-heartbeat.yaml new file mode 100644 index 000000000..3be7be964 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/00-heartbeat.yaml @@ -0,0 +1,13 @@ +## Example +module: couchdb +name: heartbeat + +transformation: false +aggregation: true +exclude_not_running_vm: true + +signals: + signal: + metric: "couchdb_uptime_seconds" +rules: + critical: diff --git a/modules/prometheus-exporter_couchdb/conf/01-httpd_status_codes.yaml b/modules/prometheus-exporter_couchdb/conf/01-httpd_status_codes.yaml new file mode 100644 index 
000000000..aef3b9928 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/01-httpd_status_codes.yaml @@ -0,0 +1,24 @@ +module: couchdb +name: couchdb_httpd_status_code_4xx + +transformation: ".mean(over='10m')" +aggregation: ".sum(by=['sf_metric'])" + +signals: + A: + metric: couchdb_httpd_status_codes + filter: "(filter('code', '400', '401', '403', '404', '405', '406', '409', '412', '413', '414', '415', '416', '417'))" + rollup: sum + B: + metric: couchdb_httpd_status_codes + rollup: sum + signal: + formula: ((A/B).scale(100)) +rules: + critical: + threshold: 30 + comparator: '>' + major: + threshold: 20 + comparator: '>' + dependency: critical diff --git a/modules/prometheus-exporter_couchdb/conf/02-auth_cache.yaml b/modules/prometheus-exporter_couchdb/conf/02-auth_cache.yaml new file mode 100644 index 000000000..52cfaba20 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/02-auth_cache.yaml @@ -0,0 +1,21 @@ +module: couchdb +name: couchdb_auth_cache + +transformation: ".mean(over='10m')" +aggregation: ".sum(by=['sf_metric'])" + +signals: + misses_total: + metric: couchdb_auth_cache_misses_total + request_total: + metric: couchdb_auth_cache_requests_total + signal: + formula: ((misses_total/request_total).scale(100)) +rules: + critical: + threshold: 10 + comparator: '>' + major: + threshold: 5 + comparator: '>' + dependency: critical diff --git a/modules/prometheus-exporter_couchdb/conf/03-couch_replicator_jobs.yaml b/modules/prometheus-exporter_couchdb/conf/03-couch_replicator_jobs.yaml new file mode 100644 index 000000000..32c17e30e --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/03-couch_replicator_jobs.yaml @@ -0,0 +1,21 @@ +module: couchdb +name: couchdb_couch_replicator_jobs + +transformation: ".mean(over='10m')" +aggregation: ".sum(by=['sf_metric'])" + +signals: + jobs_crashed: + metric: couchdb_couch_replicator_jobs_crashed + jobs_total: + metric: couchdb_couch_replicator_jobs_total + signal: + formula: 
((jobs_crashed/jobs_total).scale(100)) +rules: + critical: + threshold: 10 + comparator: '>' + major: + threshold: 5 + comparator: '>' + dependency: critical diff --git a/modules/prometheus-exporter_couchdb/conf/04-erlang_processes.yaml b/modules/prometheus-exporter_couchdb/conf/04-erlang_processes.yaml new file mode 100644 index 000000000..0d4413414 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/04-erlang_processes.yaml @@ -0,0 +1,21 @@ +module: couchdb +name: couchdb_erlang_processes + +transformation: ".mean(over='10m')" +aggregation: ".sum(by=['sf_metric'])" + +signals: + erlang_processes: + metric: couchdb_erlang_processes + erlang_process_limit: + metric: couchdb_erlang_process_limit + signal: + formula: ((erlang_processes/erlang_process_limit).scale(100)) +rules: + critical: + threshold: 90 + comparator: '>' + major: + threshold: 80 + comparator: '>' + dependency: critical diff --git a/modules/prometheus-exporter_couchdb/conf/05-cluster_is_stable.yaml b/modules/prometheus-exporter_couchdb/conf/05-cluster_is_stable.yaml new file mode 100644 index 000000000..bb65fccc9 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/05-cluster_is_stable.yaml @@ -0,0 +1,14 @@ +module: couchdb +name: cluster_is_stable + +transformation: false +aggregation: true +exclude_not_running_vm: true + +signals: + signal: + metric: couchdb_couch_replicator_cluster_is_stable +rules: + critical: + threshold: 1 + comparator: "<" diff --git a/modules/prometheus-exporter_couchdb/conf/readme.yaml b/modules/prometheus-exporter_couchdb/conf/readme.yaml new file mode 100644 index 000000000..e75c23533 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/conf/readme.yaml @@ -0,0 +1,34 @@ +documentations: + - name: Configuration of Prometheus Endpoint in CouchDB + url: https://docs.couchdb.org/en/stable/config/misc.html#configuration-of-prometheus-endpoint + +source_doc: | + The detectors of this module use metrics from the [embedded Couchdb 
exporter](https://docs.couchdb.org/en/stable/config/misc.html#configuration-of-prometheus-endpoint) for Prometheus. + + ### Examples + + Sample OTEL Agent configuration. + + ```yaml + --- + receivers: + prometheus: + config: + scrape_configs: + - job_name: "couchdb-prometheus" + scrape_interval: 60s + metrics_path: "/_node/_local/_prometheus" + basic_auth: + username: "{{ couchdb_admin_username }}" + password: "{{ couchdb_admin_password }}" + static_configs: + - targets: ["{{ inventory_hostname }}:5984"] + labels: + environment: "{{ splunk_otel_collector_env }}" + service: + pipelines: + metrics: + receivers: + - prometheus + ... + ``` diff --git a/modules/prometheus-exporter_couchdb/detectors-gen.tf b/modules/prometheus-exporter_couchdb/detectors-gen.tf new file mode 100644 index 000000000..288d51af5 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/detectors-gen.tf @@ -0,0 +1,223 @@ +resource "signalfx_detector" "heartbeat" { + name = format("%s %s", local.detector_name_prefix, "Couchdb heartbeat") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + from signalfx.detectors.not_reporting import not_reporting + signal = data('couchdb_uptime_seconds', filter=${local.not_running_vm_filters} and ${module.filtering.signalflow})${var.heartbeat_aggregation_function}.publish('signal') + not_reporting.detector(stream=signal, resource_identifier=None, duration='${var.heartbeat_timeframe}', auto_resolve_after='${local.heartbeat_auto_resolve_after}').publish('CRIT') +EOF + + rule { + description = "has not reported in ${var.heartbeat_timeframe}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.heartbeat_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.heartbeat_notifications, "critical", []), var.notifications.critical), null) + 
runbook_url = try(coalesce(var.heartbeat_runbook_url, var.runbook_url), "") + tip = var.heartbeat_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject_novalue : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + max_delay = var.heartbeat_max_delay +} + +resource "signalfx_detector" "couchdb_httpd_status_code_4xx" { + name = format("%s %s", local.detector_name_prefix, "Couchdb couchdb_httpd_status_code_4xx") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + A = data('couchdb_httpd_status_codes', filter=(filter('code', '400', '401', '403', '404', '405', '406', '409', '412', '413', '414', '415', '416', '417')) and ${module.filtering.signalflow}, rollup='sum')${var.couchdb_httpd_status_code_4xx_aggregation_function}${var.couchdb_httpd_status_code_4xx_transformation_function} + B = data('couchdb_httpd_status_codes', filter=${module.filtering.signalflow}, rollup='sum')${var.couchdb_httpd_status_code_4xx_aggregation_function}${var.couchdb_httpd_status_code_4xx_transformation_function} + signal = ((A/B).scale(100)).publish('signal') + detect(when(signal > ${var.couchdb_httpd_status_code_4xx_threshold_critical}, lasting=%{if var.couchdb_httpd_status_code_4xx_lasting_duration_critical == null}None%{else}'${var.couchdb_httpd_status_code_4xx_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_httpd_status_code_4xx_at_least_percentage_critical})).publish('CRIT') + detect(when(signal > ${var.couchdb_httpd_status_code_4xx_threshold_major}, lasting=%{if var.couchdb_httpd_status_code_4xx_lasting_duration_major == null}None%{else}'${var.couchdb_httpd_status_code_4xx_lasting_duration_major}'%{endif}, at_least=${var.couchdb_httpd_status_code_4xx_at_least_percentage_major}) and (not when(signal > 
${var.couchdb_httpd_status_code_4xx_threshold_critical}, lasting=%{if var.couchdb_httpd_status_code_4xx_lasting_duration_critical == null}None%{else}'${var.couchdb_httpd_status_code_4xx_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_httpd_status_code_4xx_at_least_percentage_critical}))).publish('MAJOR') +EOF + + rule { + description = "is too high > ${var.couchdb_httpd_status_code_4xx_threshold_critical}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.couchdb_httpd_status_code_4xx_disabled_critical, var.couchdb_httpd_status_code_4xx_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_httpd_status_code_4xx_notifications, "critical", []), var.notifications.critical), null) + runbook_url = try(coalesce(var.couchdb_httpd_status_code_4xx_runbook_url, var.runbook_url), "") + tip = var.couchdb_httpd_status_code_4xx_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high > ${var.couchdb_httpd_status_code_4xx_threshold_major}" + severity = "Major" + detect_label = "MAJOR" + disabled = coalesce(var.couchdb_httpd_status_code_4xx_disabled_major, var.couchdb_httpd_status_code_4xx_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_httpd_status_code_4xx_notifications, "major", []), var.notifications.major), null) + runbook_url = try(coalesce(var.couchdb_httpd_status_code_4xx_runbook_url, var.runbook_url), "") + tip = var.couchdb_httpd_status_code_4xx_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + max_delay = var.couchdb_httpd_status_code_4xx_max_delay +} + +resource "signalfx_detector" "couchdb_auth_cache" { + name = format("%s %s", local.detector_name_prefix, "Couchdb couchdb_auth_cache") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + misses_total = data('couchdb_auth_cache_misses_total', filter=${module.filtering.signalflow})${var.couchdb_auth_cache_aggregation_function}${var.couchdb_auth_cache_transformation_function} + request_total = data('couchdb_auth_cache_requests_total', filter=${module.filtering.signalflow})${var.couchdb_auth_cache_aggregation_function}${var.couchdb_auth_cache_transformation_function} + signal = ((misses_total/request_total).scale(100)).publish('signal') + detect(when(signal > ${var.couchdb_auth_cache_threshold_critical}, lasting=%{if var.couchdb_auth_cache_lasting_duration_critical == null}None%{else}'${var.couchdb_auth_cache_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_auth_cache_at_least_percentage_critical})).publish('CRIT') + detect(when(signal > ${var.couchdb_auth_cache_threshold_major}, lasting=%{if var.couchdb_auth_cache_lasting_duration_major == null}None%{else}'${var.couchdb_auth_cache_lasting_duration_major}'%{endif}, at_least=${var.couchdb_auth_cache_at_least_percentage_major}) and (not when(signal > ${var.couchdb_auth_cache_threshold_critical}, lasting=%{if var.couchdb_auth_cache_lasting_duration_critical == null}None%{else}'${var.couchdb_auth_cache_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_auth_cache_at_least_percentage_critical}))).publish('MAJOR') +EOF + + rule { + description = "is too high > ${var.couchdb_auth_cache_threshold_critical}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.couchdb_auth_cache_disabled_critical, 
var.couchdb_auth_cache_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_auth_cache_notifications, "critical", []), var.notifications.critical), null) + runbook_url = try(coalesce(var.couchdb_auth_cache_runbook_url, var.runbook_url), "") + tip = var.couchdb_auth_cache_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high > ${var.couchdb_auth_cache_threshold_major}" + severity = "Major" + detect_label = "MAJOR" + disabled = coalesce(var.couchdb_auth_cache_disabled_major, var.couchdb_auth_cache_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_auth_cache_notifications, "major", []), var.notifications.major), null) + runbook_url = try(coalesce(var.couchdb_auth_cache_runbook_url, var.runbook_url), "") + tip = var.couchdb_auth_cache_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + max_delay = var.couchdb_auth_cache_max_delay +} + +resource "signalfx_detector" "couchdb_couch_replicator_jobs" { + name = format("%s %s", local.detector_name_prefix, "Couchdb couchdb_couch_replicator_jobs") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + jobs_crashed = data('couchdb_couch_replicator_jobs_crashed', filter=${module.filtering.signalflow})${var.couchdb_couch_replicator_jobs_aggregation_function}${var.couchdb_couch_replicator_jobs_transformation_function} + jobs_total = data('couchdb_couch_replicator_jobs_total', filter=${module.filtering.signalflow})${var.couchdb_couch_replicator_jobs_aggregation_function}${var.couchdb_couch_replicator_jobs_transformation_function} + signal = ((jobs_crashed/jobs_total).scale(100)).publish('signal') + detect(when(signal > ${var.couchdb_couch_replicator_jobs_threshold_critical}, lasting=%{if var.couchdb_couch_replicator_jobs_lasting_duration_critical == null}None%{else}'${var.couchdb_couch_replicator_jobs_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_couch_replicator_jobs_at_least_percentage_critical})).publish('CRIT') + detect(when(signal > ${var.couchdb_couch_replicator_jobs_threshold_major}, lasting=%{if var.couchdb_couch_replicator_jobs_lasting_duration_major == null}None%{else}'${var.couchdb_couch_replicator_jobs_lasting_duration_major}'%{endif}, at_least=${var.couchdb_couch_replicator_jobs_at_least_percentage_major}) and (not when(signal > ${var.couchdb_couch_replicator_jobs_threshold_critical}, lasting=%{if var.couchdb_couch_replicator_jobs_lasting_duration_critical == null}None%{else}'${var.couchdb_couch_replicator_jobs_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_couch_replicator_jobs_at_least_percentage_critical}))).publish('MAJOR') +EOF + + rule { + 
description = "is too high > ${var.couchdb_couch_replicator_jobs_threshold_critical}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.couchdb_couch_replicator_jobs_disabled_critical, var.couchdb_couch_replicator_jobs_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_couch_replicator_jobs_notifications, "critical", []), var.notifications.critical), null) + runbook_url = try(coalesce(var.couchdb_couch_replicator_jobs_runbook_url, var.runbook_url), "") + tip = var.couchdb_couch_replicator_jobs_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high > ${var.couchdb_couch_replicator_jobs_threshold_major}" + severity = "Major" + detect_label = "MAJOR" + disabled = coalesce(var.couchdb_couch_replicator_jobs_disabled_major, var.couchdb_couch_replicator_jobs_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_couch_replicator_jobs_notifications, "major", []), var.notifications.major), null) + runbook_url = try(coalesce(var.couchdb_couch_replicator_jobs_runbook_url, var.runbook_url), "") + tip = var.couchdb_couch_replicator_jobs_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + max_delay = var.couchdb_couch_replicator_jobs_max_delay +} + +resource "signalfx_detector" "couchdb_erlang_processes" { + name = format("%s %s", local.detector_name_prefix, "Couchdb couchdb_erlang_processes") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + erlang_processes = data('couchdb_erlang_processes', filter=${module.filtering.signalflow})${var.couchdb_erlang_processes_aggregation_function}${var.couchdb_erlang_processes_transformation_function} + erlang_process_limit = data('couchdb_erlang_process_limit', filter=${module.filtering.signalflow})${var.couchdb_erlang_processes_aggregation_function}${var.couchdb_erlang_processes_transformation_function} + signal = ((erlang_processes/erlang_process_limit).scale(100)).publish('signal') + detect(when(signal > ${var.couchdb_erlang_processes_threshold_critical}, lasting=%{if var.couchdb_erlang_processes_lasting_duration_critical == null}None%{else}'${var.couchdb_erlang_processes_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_erlang_processes_at_least_percentage_critical})).publish('CRIT') + detect(when(signal > ${var.couchdb_erlang_processes_threshold_major}, lasting=%{if var.couchdb_erlang_processes_lasting_duration_major == null}None%{else}'${var.couchdb_erlang_processes_lasting_duration_major}'%{endif}, at_least=${var.couchdb_erlang_processes_at_least_percentage_major}) and (not when(signal > ${var.couchdb_erlang_processes_threshold_critical}, lasting=%{if var.couchdb_erlang_processes_lasting_duration_critical == null}None%{else}'${var.couchdb_erlang_processes_lasting_duration_critical}'%{endif}, at_least=${var.couchdb_erlang_processes_at_least_percentage_critical}))).publish('MAJOR') +EOF + + rule { + description = "is too high > 
${var.couchdb_erlang_processes_threshold_critical}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.couchdb_erlang_processes_disabled_critical, var.couchdb_erlang_processes_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_erlang_processes_notifications, "critical", []), var.notifications.critical), null) + runbook_url = try(coalesce(var.couchdb_erlang_processes_runbook_url, var.runbook_url), "") + tip = var.couchdb_erlang_processes_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? local.rule_body : var.message_body + } + + rule { + description = "is too high > ${var.couchdb_erlang_processes_threshold_major}" + severity = "Major" + detect_label = "MAJOR" + disabled = coalesce(var.couchdb_erlang_processes_disabled_major, var.couchdb_erlang_processes_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.couchdb_erlang_processes_notifications, "major", []), var.notifications.major), null) + runbook_url = try(coalesce(var.couchdb_erlang_processes_runbook_url, var.runbook_url), "") + tip = var.couchdb_erlang_processes_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + max_delay = var.couchdb_erlang_processes_max_delay +} + +resource "signalfx_detector" "cluster_is_stable" { + name = format("%s %s", local.detector_name_prefix, "Couchdb cluster_is_stable") + + authorized_writer_teams = var.authorized_writer_teams + teams = try(coalescelist(var.teams, var.authorized_writer_teams), null) + tags = compact(concat(local.common_tags, local.tags, var.extra_tags)) + + program_text = <<-EOF + signal = data('couchdb_couch_replicator_cluster_is_stable', filter=${local.not_running_vm_filters} and ${module.filtering.signalflow})${var.cluster_is_stable_aggregation_function}.publish('signal') + detect(when(signal < ${var.cluster_is_stable_threshold_critical}, lasting=%{if var.cluster_is_stable_lasting_duration_critical == null}None%{else}'${var.cluster_is_stable_lasting_duration_critical}'%{endif}, at_least=${var.cluster_is_stable_at_least_percentage_critical})).publish('CRIT') +EOF + + rule { + description = "is too low < ${var.cluster_is_stable_threshold_critical}" + severity = "Critical" + detect_label = "CRIT" + disabled = coalesce(var.cluster_is_stable_disabled, var.detectors_disabled) + notifications = try(coalescelist(lookup(var.cluster_is_stable_notifications, "critical", []), var.notifications.critical), null) + runbook_url = try(coalesce(var.cluster_is_stable_runbook_url, var.runbook_url), "") + tip = var.cluster_is_stable_tip + parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject + parameterized_body = var.message_body == "" ? 
local.rule_body : var.message_body + } + + max_delay = var.cluster_is_stable_max_delay +} + diff --git a/modules/prometheus-exporter_couchdb/outputs.tf b/modules/prometheus-exporter_couchdb/outputs.tf new file mode 100644 index 000000000..38dabc907 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/outputs.tf @@ -0,0 +1,30 @@ +output "cluster_is_stable" { + description = "Detector resource for cluster_is_stable" + value = signalfx_detector.cluster_is_stable +} + +output "couchdb_auth_cache" { + description = "Detector resource for couchdb_auth_cache" + value = signalfx_detector.couchdb_auth_cache +} + +output "couchdb_couch_replicator_jobs" { + description = "Detector resource for couchdb_couch_replicator_jobs" + value = signalfx_detector.couchdb_couch_replicator_jobs +} + +output "couchdb_erlang_processes" { + description = "Detector resource for couchdb_erlang_processes" + value = signalfx_detector.couchdb_erlang_processes +} + +output "couchdb_httpd_status_code_4xx" { + description = "Detector resource for couchdb_httpd_status_code_4xx" + value = signalfx_detector.couchdb_httpd_status_code_4xx +} + +output "heartbeat" { + description = "Detector resource for heartbeat" + value = signalfx_detector.heartbeat +} + diff --git a/modules/prometheus-exporter_couchdb/tags.tf b/modules/prometheus-exporter_couchdb/tags.tf new file mode 100644 index 000000000..cfae1bf2d --- /dev/null +++ b/modules/prometheus-exporter_couchdb/tags.tf @@ -0,0 +1,4 @@ +locals { + tags = ["prometheus-exporter", "couchdb"] +} + diff --git a/modules/prometheus-exporter_couchdb/variables-gen.tf b/modules/prometheus-exporter_couchdb/variables-gen.tf new file mode 100644 index 000000000..802f2e8b6 --- /dev/null +++ b/modules/prometheus-exporter_couchdb/variables-gen.tf @@ -0,0 +1,459 @@ +# heartbeat detector + +variable "heartbeat_notifications" { + description = "Notification recipients list per severity overridden for heartbeat detector" + type = map(list(string)) + default = {} +} + 
+variable "heartbeat_aggregation_function" { + description = "Aggregation function and group by for heartbeat detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "heartbeat_max_delay" { + description = "Enforce max delay for heartbeat detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = 900 +} + +variable "heartbeat_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "heartbeat_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "heartbeat_disabled" { + description = "Disable all alerting rules for heartbeat detector" + type = bool + default = null +} + +variable "heartbeat_timeframe" { + description = "Timeframe for heartbeat detector (i.e. \"10m\")" + type = string + default = "10m" +} + +# couchdb_httpd_status_code_4xx detector + +variable "couchdb_httpd_status_code_4xx_notifications" { + description = "Notification recipients list per severity overridden for couchdb_httpd_status_code_4xx detector" + type = map(list(string)) + default = {} +} + +variable "couchdb_httpd_status_code_4xx_aggregation_function" { + description = "Aggregation function and group by for couchdb_httpd_status_code_4xx detector (i.e. \".mean(by=['host'])\")" + type = string + default = ".sum(by=['sf_metric'])" +} + +variable "couchdb_httpd_status_code_4xx_transformation_function" { + description = "Transformation function for couchdb_httpd_status_code_4xx detector (i.e. 
\".mean(over='5m')\")" + type = string + default = ".mean(over='10m')" +} + +variable "couchdb_httpd_status_code_4xx_max_delay" { + description = "Enforce max delay for couchdb_httpd_status_code_4xx detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = null +} + +variable "couchdb_httpd_status_code_4xx_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "couchdb_httpd_status_code_4xx_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "couchdb_httpd_status_code_4xx_disabled" { + description = "Disable all alerting rules for couchdb_httpd_status_code_4xx detector" + type = bool + default = null +} + +variable "couchdb_httpd_status_code_4xx_disabled_critical" { + description = "Disable critical alerting rule for couchdb_httpd_status_code_4xx detector" + type = bool + default = null +} + +variable "couchdb_httpd_status_code_4xx_disabled_major" { + description = "Disable major alerting rule for couchdb_httpd_status_code_4xx detector" + type = bool + default = null +} + +variable "couchdb_httpd_status_code_4xx_threshold_critical" { + description = "Critical threshold for couchdb_httpd_status_code_4xx detector" + type = number + default = 30 +} + +variable "couchdb_httpd_status_code_4xx_lasting_duration_critical" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_httpd_status_code_4xx_at_least_percentage_critical" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "couchdb_httpd_status_code_4xx_threshold_major" { + description = "Major threshold for couchdb_httpd_status_code_4xx detector" + type = number + default = 20 +} + +variable 
"couchdb_httpd_status_code_4xx_lasting_duration_major" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_httpd_status_code_4xx_at_least_percentage_major" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# couchdb_auth_cache detector + +variable "couchdb_auth_cache_notifications" { + description = "Notification recipients list per severity overridden for couchdb_auth_cache detector" + type = map(list(string)) + default = {} +} + +variable "couchdb_auth_cache_aggregation_function" { + description = "Aggregation function and group by for couchdb_auth_cache detector (i.e. \".mean(by=['host'])\")" + type = string + default = ".sum(by=['sf_metric'])" +} + +variable "couchdb_auth_cache_transformation_function" { + description = "Transformation function for couchdb_auth_cache detector (i.e. \".mean(over='5m')\")" + type = string + default = ".mean(over='10m')" +} + +variable "couchdb_auth_cache_max_delay" { + description = "Enforce max delay for couchdb_auth_cache detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = null +} + +variable "couchdb_auth_cache_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "couchdb_auth_cache_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "couchdb_auth_cache_disabled" { + description = "Disable all alerting rules for couchdb_auth_cache detector" + type = bool + default = null +} + +variable "couchdb_auth_cache_disabled_critical" { + description = "Disable critical alerting rule for couchdb_auth_cache detector" + type = bool + default = null +} + +variable "couchdb_auth_cache_disabled_major" { + description = "Disable 
major alerting rule for couchdb_auth_cache detector" + type = bool + default = null +} + +variable "couchdb_auth_cache_threshold_critical" { + description = "Critical threshold for couchdb_auth_cache detector" + type = number + default = 10 +} + +variable "couchdb_auth_cache_lasting_duration_critical" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_auth_cache_at_least_percentage_critical" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "couchdb_auth_cache_threshold_major" { + description = "Major threshold for couchdb_auth_cache detector" + type = number + default = 5 +} + +variable "couchdb_auth_cache_lasting_duration_major" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_auth_cache_at_least_percentage_major" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# couchdb_couch_replicator_jobs detector + +variable "couchdb_couch_replicator_jobs_notifications" { + description = "Notification recipients list per severity overridden for couchdb_couch_replicator_jobs detector" + type = map(list(string)) + default = {} +} + +variable "couchdb_couch_replicator_jobs_aggregation_function" { + description = "Aggregation function and group by for couchdb_couch_replicator_jobs detector (i.e. \".mean(by=['host'])\")" + type = string + default = ".sum(by=['sf_metric'])" +} + +variable "couchdb_couch_replicator_jobs_transformation_function" { + description = "Transformation function for couchdb_couch_replicator_jobs detector (i.e. 
\".mean(over='5m')\")" + type = string + default = ".mean(over='10m')" +} + +variable "couchdb_couch_replicator_jobs_max_delay" { + description = "Enforce max delay for couchdb_couch_replicator_jobs detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = null +} + +variable "couchdb_couch_replicator_jobs_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "couchdb_couch_replicator_jobs_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "couchdb_couch_replicator_jobs_disabled" { + description = "Disable all alerting rules for couchdb_couch_replicator_jobs detector" + type = bool + default = null +} + +variable "couchdb_couch_replicator_jobs_disabled_critical" { + description = "Disable critical alerting rule for couchdb_couch_replicator_jobs detector" + type = bool + default = null +} + +variable "couchdb_couch_replicator_jobs_disabled_major" { + description = "Disable major alerting rule for couchdb_couch_replicator_jobs detector" + type = bool + default = null +} + +variable "couchdb_couch_replicator_jobs_threshold_critical" { + description = "Critical threshold for couchdb_couch_replicator_jobs detector" + type = number + default = 10 +} + +variable "couchdb_couch_replicator_jobs_lasting_duration_critical" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_couch_replicator_jobs_at_least_percentage_critical" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "couchdb_couch_replicator_jobs_threshold_major" { + description = "Major threshold for couchdb_couch_replicator_jobs detector" + type = number + default = 5 +} + +variable 
"couchdb_couch_replicator_jobs_lasting_duration_major" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_couch_replicator_jobs_at_least_percentage_major" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# couchdb_erlang_processes detector + +variable "couchdb_erlang_processes_notifications" { + description = "Notification recipients list per severity overridden for couchdb_erlang_processes detector" + type = map(list(string)) + default = {} +} + +variable "couchdb_erlang_processes_aggregation_function" { + description = "Aggregation function and group by for couchdb_erlang_processes detector (i.e. \".mean(by=['host'])\")" + type = string + default = ".sum(by=['sf_metric'])" +} + +variable "couchdb_erlang_processes_transformation_function" { + description = "Transformation function for couchdb_erlang_processes detector (i.e. 
\".mean(over='5m')\")" + type = string + default = ".mean(over='10m')" +} + +variable "couchdb_erlang_processes_max_delay" { + description = "Enforce max delay for couchdb_erlang_processes detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = null +} + +variable "couchdb_erlang_processes_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "couchdb_erlang_processes_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "couchdb_erlang_processes_disabled" { + description = "Disable all alerting rules for couchdb_erlang_processes detector" + type = bool + default = null +} + +variable "couchdb_erlang_processes_disabled_critical" { + description = "Disable critical alerting rule for couchdb_erlang_processes detector" + type = bool + default = null +} + +variable "couchdb_erlang_processes_disabled_major" { + description = "Disable major alerting rule for couchdb_erlang_processes detector" + type = bool + default = null +} + +variable "couchdb_erlang_processes_threshold_critical" { + description = "Critical threshold for couchdb_erlang_processes detector" + type = number + default = 90 +} + +variable "couchdb_erlang_processes_lasting_duration_critical" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_erlang_processes_at_least_percentage_critical" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +variable "couchdb_erlang_processes_threshold_major" { + description = "Major threshold for couchdb_erlang_processes detector" + type = number + default = 80 +} + +variable "couchdb_erlang_processes_lasting_duration_major" { + description = "Minimum duration that 
conditions must be true before raising alert" + type = string + default = null +} + +variable "couchdb_erlang_processes_at_least_percentage_major" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +} +# cluster_is_stable detector + +variable "cluster_is_stable_notifications" { + description = "Notification recipients list per severity overridden for cluster_is_stable detector" + type = map(list(string)) + default = {} +} + +variable "cluster_is_stable_aggregation_function" { + description = "Aggregation function and group by for cluster_is_stable detector (i.e. \".mean(by=['host'])\")" + type = string + default = "" +} + +variable "cluster_is_stable_max_delay" { + description = "Enforce max delay for cluster_is_stable detector (use \"0\" or \"null\" for \"Auto\")" + type = number + default = null +} + +variable "cluster_is_stable_tip" { + description = "Suggested first course of action or any note useful for incident handling" + type = string + default = "" +} + +variable "cluster_is_stable_runbook_url" { + description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause" + type = string + default = "" +} + +variable "cluster_is_stable_disabled" { + description = "Disable all alerting rules for cluster_is_stable detector" + type = bool + default = null +} + +variable "cluster_is_stable_threshold_critical" { + description = "Critical threshold for cluster_is_stable detector" + type = number + default = 1 +} + +variable "cluster_is_stable_lasting_duration_critical" { + description = "Minimum duration that conditions must be true before raising alert" + type = string + default = null +} + +variable "cluster_is_stable_at_least_percentage_critical" { + description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)" + type = number + default = 1 +}
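The generated variables above can be overridden per detector from the calling stack. The following is a minimal sketch, not a recommended configuration: the `source`, `environment` and `notifications` lines are copied from the module README, the variable names come from `variables-gen.tf`, and every value shown (85, 70, `"5m"`, `"20m"`) is an illustrative assumption to adapt to your own CouchDB deployment.

```hcl
module "signalfx-detectors-prometheus-exporter-couchdb" {
  source = "github.com/hlepesant/terraform-signalfx-detectors.git//modules/prometheus-exporter_couchdb?ref={revision}"

  environment   = var.environment
  notifications = local.notifications

  # Illustrative values: alert earlier than the defaults (90 / 80) on the
  # ratio of Erlang processes to the Erlang process limit, in percent.
  couchdb_erlang_processes_threshold_critical = 85
  couchdb_erlang_processes_threshold_major    = 70

  # Illustrative value: require the condition to hold for 5 minutes
  # (the default is null, which is rendered as lasting=None).
  couchdb_erlang_processes_lasting_duration_critical = "5m"
  couchdb_erlang_processes_lasting_duration_major    = "5m"

  # Keep only the critical rule of the replicator jobs detector.
  couchdb_couch_replicator_jobs_disabled_major = true

  # Illustrative value: widen the heartbeat timeframe (default "10m").
  heartbeat_timeframe = "20m"
}
```

Because each rule resolves `disabled` with `coalesce(<rule-level>, <detector-level>, var.detectors_disabled)`, the most specific non-null setting takes precedence, so a single rule can be disabled without touching the rest of the detector.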