Getting started
This guide explains how to use the detectors from this repository. Keep in mind that this collection of detectors is split into separate Terraform modules, which differ from one another but are based on the same model and all use the detector resource from the SignalFx provider.
Before importing a module of detectors, you need to set up a new Terraform stack.
Here is the minimal Terraform configuration required before using a module:
terraform {
  required_providers {
    signalfx = {
      source  = "splunk-terraform/signalfx"
      version = ">= 4.26.4"
    }
  }
  required_version = ">= 0.12.26"
}
provider "signalfx" {
  # `my_token` will be generally retrieved from vault
  auth_token = "my_token"
  # replace `eu0` by your SignalFx realm
  api_url = "https://api.eu0.signalfx.com"
}
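As the comments above suggest, the token is usually not hard-coded. The following is only a sketch of one way to do that, replacing the `provider` block above: the variable names `signalfx_auth_token` and `signalfx_realm` are hypothetical, and the token is assumed to be injected by a wrapper (for example through a `TF_VAR_signalfx_auth_token` environment variable populated from your secret store).

```hcl
# Hypothetical variable names, not defined by this repository.
variable "signalfx_auth_token" {
  description = "SignalFx API token, expected to be injected from a secret store"
  type        = string
  sensitive   = true # requires Terraform >= 0.14; remove this line on older versions
}

variable "signalfx_realm" {
  description = "SignalFx realm used to build the API URL"
  type        = string
  default     = "eu0"
}

provider "signalfx" {
  auth_token = var.signalfx_auth_token
  api_url    = "https://api.${var.signalfx_realm}.signalfx.com"
}
```

This keeps the token out of the Terraform code and makes the realm configurable per stack.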
Then you can run `terraform init` on this stack, which installs the provider:
$ terraform init
Initializing the backend...
Initializing provider plugins...
- Finding splunk-terraform/signalfx versions matching ">= 4.26.4"...
- Installing splunk-terraform/signalfx v6.1.0...
- Installed splunk-terraform/signalfx v6.1.0 (signed by a HashiCorp partner, key ID xxx)
Now you can import as many modules as you want for this stack.
The modules are implemented on a per-environment basis: `environment` is a required variable common to all of them.
Although you can bypass this logic with a generic value such as `global`, it is generally recommended to define it from a variable, which a wrapper could set automatically:
variable "environment" {
  type = string
}
The rest of this documentation assumes this variable exists.
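If no wrapper sets the variable for you, one possible way to provide it is a variable definitions file. This is a minimal sketch: the file name `doc.tfvars` and the value `doc` are only illustrative.

```hcl
# doc.tfvars (hypothetical file name): one file per environment
environment = "doc"
```

It can then be passed with `terraform apply -var-file=doc.tfvars`. Exporting a `TF_VAR_environment` environment variable before running Terraform gives the same result.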
Explore the modules list, pick the modules that fit your needs and follow their instructions to import them. A stack is basically an import of one or more modules of detectors.
Keep in mind that one module does not always correspond to one "target" to monitor.
This is true for a basic use case like nginx, for example, but sometimes a target is split into multiple modules covering different known use cases, to provide maximum flexibility.
You can therefore compose your monitoring with one, some or all of the available "fragments", depending on your situation. There are plenty of reasons to do that; the point to remember is that several different modules may monitor the same service, so check every related module and choose the ones you want.
This repository is a collection of fully independent and autonomous modules, without any "root" module as an entry point.
However, from the point of view of the Terraform registry, a repository is a module which can have (and use) several sub-modules.
To match this logic we created a fake root module, and all modules are available and listed as sub-modules in the registry entry of this repository.
To use a module, simply follow its local README.md file. It should contain all the information required to set it up correctly in your existing Terraform stack, including:
- a guideline common to all modules
- some generic resources, like related docs, but contextualized to the module
- specific notes and requirements depending on the purpose and dependencies of the module
In general this will be as simple as:
module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"
  # common vars
  #environment = var.environment
  #notifications = local.notifications
  # other variables, common or specific to this module
}
In this example, for development and testing purposes, we set the source:
- as GitHub
- using `master` as reference, for the latest but not fully qualified source code
For a production-ready configuration we would prefer the source:
- as Terraform Registry
- using the desired `tag` as reference
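As an illustration of the production-oriented variant, here is a sketch of the same module import using a registry source. The registry address `claranet/detectors/signalfx` is an assumption derived from the registry naming convention applied to the repository name, and the version is a placeholder: check the registry page for the actual address and available tags.

```hcl
module "signalfx-detectors-smart-agent_system-common" {
  # Assumed registry address; `//modules/...` selects the sub-module
  source = "claranet/detectors/signalfx//modules/smart-agent_system-common"
  # Placeholder: pin to the release tag you have validated
  version = "1.0.0"

  # common vars
  #environment = var.environment
  #notifications = local.notifications
}
```

Pinning an explicit version avoids picking up unreviewed changes from `master` the next time the modules are downloaded.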
At this stage you can run `terraform init` to download your modules:
$ terraform init
Initializing modules...
Downloading github.com/claranet/terraform-signalfx-detectors.git?ref=master for signalfx-detectors-smart-agent_system-common...
- signalfx-detectors-smart-agent_system-common in .terraform/modules/signalfx-detectors-smart-agent_system-common/modules/smart-agent_system-common
Downloading github.com/claranet/terraform-signalfx-detectors.git for signalfx-detectors-smart-agent_system-common.filter-tags...
- signalfx-detectors-smart-agent_system-common.filter-tags in .terraform/modules/signalfx-detectors-smart-agent_system-common.filter-tags/common/filter-tags
The [environment](Variables#environment) and `notifications` variables are common to all modules, like all other variables with the `global` scope.
These two variables are often the only required ones, but some modules offer more specific customization at module level (`global` scope), which should be described in their own README.md.
The `notifications` variable is a Terraform object which defines, for each severity, a list of recipients following the notification format:
variable "recipient" {
  type    = string
  default = "[email protected]"
}
locals {
  notification = format("Email,%s", var.recipient)
  notifications = {
    critical = [local.notification]
    major    = [local.notification]
    minor    = [local.notification]
    warning  = []
    info     = []
  }
}
module "signalfx-detectors-smart-agent_system-common" {
  source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"
  environment   = var.environment
  notifications = local.notifications
}
Please see the notification binding for more information on how to configure notifications for each severity.
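Because each severity holds its own list, recipients can differ per severity. The sketch below is an alternative to the `locals` block above, not an addition to it: the e-mail address and the PagerDuty credential ID are hypothetical placeholders, and the `PagerDuty,<credential id>` string is assumed to follow the notification format of the SignalFx provider.

```hcl
locals {
  # Hypothetical recipients: replace with your own values
  notify_email     = "Email,ops@example.com"
  notify_pagerduty = "PagerDuty,my-credential-id"

  notifications = {
    # page on-call and send an e-mail for the most severe alerts
    critical = [local.notify_pagerduty, local.notify_email]
    major    = [local.notify_email]
    minor    = [local.notify_email]
    # no notification for the lowest severities
    warning = []
    info    = []
  }
}
```

The module call itself does not change: it still receives `notifications = local.notifications`.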
At this stage you should already be able to deploy all the detectors of your module configured at the `global` level (i.e. notifications sent by email, except for the `warning` and `info` severities):
$ TF_VAR_environment=doc terraform apply -target=module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu will be created
+ resource "signalfx_detector" "cpu" {
+ id = (known after apply)
+ max_delay = 0
+ name = "[doc] System cpu utilization"
+ program_text = <<~EOT
signal = data('cpu.utilization', filter=filter('env', 'doc') and filter('sfx_monitored', 'true'), extrapolation='zero').min(over='1h').publish('signal')
detect(when(signal > 90)).publish('CRIT')
detect(when(signal > 85) and when(signal <= 90)).publish('MAJOR')
EOT
+ show_data_markers = true
+ time_range = 3600
+ url = (known after apply)
+ rule {
+ description = "is too high > 85"
+ detect_label = "MAJOR"
+ disabled = false
+ notifications = [
+ "Email,[email protected]",
]
+ parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT
+ parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
+ severity = "Major"
}
+ rule {
+ description = "is too high > 90"
+ detect_label = "CRIT"
+ disabled = false
+ notifications = [
+ "Email,[email protected]",
]
+ parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT
+ parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
+ severity = "Critical"
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Creating...
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Creation complete after 1s [id=xxx]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
As you can see, the environment configured as `doc` through the `TF_VAR_environment` environment variable has been used and is visible in the plan, as are the notifications, but there are many more options to customize at the detector (or rule) level.
Note: to keep the output short we used the `-target` option of the `terraform apply` command to deploy only one detector, but in general we want all of them.
Now you can configure each detector or its rules. The approach is the same as for the `global` module configuration described above, but with variables from the `detector` and/or `rule` scopes.
A simple change could be to adjust the behavior of the previously deployed detector because it triggers too many alerts which are not relevant in our environment.
So we can, for example, raise the threshold value of the `critical` rule for the `cpu` detector and disable its `major` rule.
module "signalfx-detectors-smart-agent_system-common" {
  source                 = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"
  environment            = var.environment
  notifications          = local.notifications
  cpu_threshold_critical = 95
  cpu_disabled_major     = true
}
Then run `terraform apply` again:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu will be updated in-place
~ resource "signalfx_detector" "cpu" {
disable_sampling = false
id = "xxx"
max_delay = 0
name = "[doc] System cpu utilization"
~ program_text = <<~EOT
signal = data('cpu.utilization', filter=filter('env', 'doc') and filter('sfx_monitored', 'true'), extrapolation='zero').min(over='1h').publish('signal')
- detect(when(signal > 90)).publish('CRIT')
- detect(when(signal > 85) and when(signal <= 90)).publish('MAJOR')
+ detect(when(signal > 95)).publish('CRIT')
+ detect(when(signal > 85) and when(signal <= 95)).publish('MAJOR')
EOT
show_data_markers = true
show_event_lines = false
teams = []
time_range = 3600
url = "https://app.signalfx.com/#/detector/xxx"
- rule {
- description = "is too high > 85" -> null
- detect_label = "MAJOR" -> null
- disabled = false -> null
- notifications = [
- "Email,[email protected]",
] -> null
- parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT -> null
- parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}" -> null
- severity = "Major" -> null
}
+ rule {
+ description = "is too high > 85"
+ detect_label = "MAJOR"
+ disabled = true
+ notifications = [
+ "Email,[email protected]",
]
+ parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT
+ parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
+ severity = "Major"
}
- rule {
- description = "is too high > 90" -> null
- detect_label = "CRIT" -> null
- disabled = false -> null
- notifications = [
- "Email,[email protected]",
] -> null
- parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT -> null
- parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}" -> null
- severity = "Critical" -> null
}
+ rule {
+ description = "is too high > 95"
+ detect_label = "CRIT"
+ disabled = false
+ notifications = [
+ "Email,[email protected]",
]
+ parameterized_body = <<~EOT
**Alert**:
*[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
{{#if anomalous}}
**Triggered at**:
*{{timestamp}}*
{{else}}
**Cleared at**:
*{{timestamp}}*
{{/if}}
{{#notEmpty dimensions}}
**Dimensions**:
*{{{dimensions}}}*
{{/notEmpty}}
{{#if anomalous}}
{{#if runbookUrl}}**Runbook**:
Go to [this page]({{{runbookUrl}}}) for help and analysis.
{{/if}}
{{#if tip}}**Tip**:
{{{tip}}}
{{/if}}
{{/if}}
EOT
+ parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
+ severity = "Critical"
}
}
Plan: 0 to add, 1 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Modifying... [id=xxx]
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Modifications complete after 2s [id=xxx]
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
To learn which customizations are commonly available, or for recommendations on how to configure your detectors in real use, please see the Guidance documentation.