
Getting started

Quentin Manfroi edited this page Dec 4, 2020 · 6 revisions


This guide explains how to use the detectors from this repository. Keep in mind that this collection of detectors is split into separate Terraform modules. They are all built on the same model but are independent from each other, and they all use the detector resource from the SignalFx provider.

Stack

Before importing a module of detectors, you need to set up a new Terraform stack.

Bootstrap

Here is the minimal Terraform configuration required before using a module:

terraform {
  required_providers {
    signalfx = {
      source  = "splunk-terraform/signalfx"
      version = ">= 4.26.4"
    }
  }
  required_version = ">= 0.12.26"
}

provider "signalfx" {
  # in practice, the token is generally retrieved from Vault rather than hard-coded
  auth_token = "my_token"
  # replace `eu0` with your SignalFx realm
  api_url    = "https://api.eu0.signalfx.com"
}
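Rather than hard-coding the token, a minimal sketch could declare it as an input variable and build the API URL from the realm. The variable names signalfx_auth_token and signalfx_realm are hypothetical, chosen for illustration. This block would replace the provider block above, and a wrapper (or a TF_VAR_signalfx_auth_token environment variable) can then supply the token so that it never lands in version control.

```hcl
# Hypothetical variable names, for illustration only.
variable "signalfx_auth_token" {
  type        = string
  description = "SignalFx API token, e.g. read from Vault by a wrapper"
}

variable "signalfx_realm" {
  type        = string
  description = "SignalFx realm, e.g. eu0 or us1"
  default     = "eu0"
}

provider "signalfx" {
  auth_token = var.signalfx_auth_token
  # Same URL pattern as above, with the realm injected from the variable
  api_url    = "https://api.${var.signalfx_realm}.signalfx.com"
}
```

On Terraform 0.14 and later, adding sensitive = true to the token variable also hides its value from plan output.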

Then initialize the stack, which installs the provider:

$ terraform init

Initializing the backend...

Initializing provider plugins...
- Finding splunk-terraform/signalfx versions matching ">= 4.26.4"...
- Installing splunk-terraform/signalfx v6.1.0...
- Installed splunk-terraform/signalfx v6.1.0 (signed by a HashiCorp partner, key ID xxx)

Now you can import as many modules as you want for this stack.

Environment

The modules are designed per environment, so environment is a required variable common to all of them.

Even though you can bypass this logic with a generic value such as global, it is generally recommended to define it from a variable, which a wrapper can then set automatically:

variable "environment" {
  type = string
}

The rest of this documentation assumes this variable exists.
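Terraform can supply this variable in its usual ways: a terraform.tfvars file in the stack directory (loaded automatically), a -var option on the command line, or a TF_VAR_environment environment variable, the approach used later in this guide. A minimal terraform.tfvars sketch, where prod is only an example value:

```hcl
# terraform.tfvars: loaded automatically by Terraform from the working directory
environment = "prod"
```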

Compose

Explore the modules list, pick the modules that fit your needs and follow their instructions to import them. A stack is basically an import of one or more modules of detectors.

Note that one module does not always correspond to one "target" to monitor. It does for a basic use case like nginx, but sometimes a target is split into multiple modules covering different known use cases, to provide maximum flexibility.

You can therefore compose your monitoring from one, some or all of the available "fragments", depending on your situation. There are plenty of reasons to do so; just keep in mind that several different modules may exist to monitor the same service, so check every related module and choose the ones you want.
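As a sketch of such a composition, a stack can simply declare several module blocks sharing the same environment and notifications (local.notifications is defined in the Configure section below). The second module path, smart-agent_nginx, is an assumption for illustration; check the modules list for the exact name of the module you need.

```hcl
# Host-level detectors
module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"

  environment   = var.environment
  notifications = local.notifications
}

# Assumed module path for nginx detectors: verify it in the modules list
module "signalfx-detectors-smart-agent_nginx" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_nginx?ref=master"

  # Same common variables, so both modules alert the same way
  environment   = var.environment
  notifications = local.notifications
}
```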

Modules

This repository is a collection of fully independent, self-contained modules with no "root" module as an entry point. However, from the point of view of the Terraform registry, a repository is a module which can have (and use) several sub-modules. To match this logic, we created a placeholder root module, and all modules are available and listed as sub-modules on this repository's registry page.

Use

To use a module, follow its local README.md file. It should contain all the information required to set it up correctly in your existing Terraform stack, including:

  • a guideline common to every module
  • some generic resources, such as related docs, contextualized to the module
  • specific notes and requirements depending on the purpose and dependencies of the module

In general, this is as simple as:

module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"

  # common vars
  #environment   = var.environment
  #notifications = local.notifications
  # other variables, common or specific to this module
}

In this example, for development and testing purposes, the source is set:

  • to GitHub
  • with master as reference, i.e. the latest but not fully qualified source code

For production ready configuration we would prefer the source as:
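a Terraform registry source pinned to a released version rather than a moving branch such as master. A minimal sketch, assuming the registry address claranet/detectors/signalfx (the modules being exposed as sub-modules, as explained in the Modules section) and a placeholder version number; check the registry page for the actual address and the latest release:

```hcl
module "signalfx-detectors-smart-agent_system-common" {
  # Assumed registry address; the "//modules/..." suffix selects the sub-module
  source  = "claranet/detectors/signalfx//modules/smart-agent_system-common"
  # Placeholder: pin the release you have validated
  version = "1.0.0"

  environment   = var.environment
  notifications = local.notifications
}
```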

At this stage, run init again to download your modules:

$ terraform init
Initializing modules...
Downloading github.com/claranet/terraform-signalfx-detectors.git?ref=master for signalfx-detectors-smart-agent_system-common...
- signalfx-detectors-smart-agent_system-common in .terraform/modules/signalfx-detectors-smart-agent_system-common/modules/smart-agent_system-common
Downloading github.com/claranet/terraform-signalfx-detectors.git for signalfx-detectors-smart-agent_system-common.filter-tags...
- signalfx-detectors-smart-agent_system-common.filter-tags in .terraform/modules/signalfx-detectors-smart-agent_system-common.filter-tags/common/filter-tags

Configure

The environment and notifications variables are common to all modules and, like all other variables with the global scope, apply to the whole module.

These two variables are often the only required ones, but some modules offer more specific customizations at module level (global scope), which should be described in their own README.md.

The notifications variable is a Terraform object that maps each severity to a list of recipients following the notification format:

variable "recipient" {
  type = string
  default = "[email protected]"
}

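locals {
  # Single recipient in the "Email,<address>" notification format
  notification = format("Email,%s", var.recipient)
  # Recipients per severity: no notification for warning and info
  notifications = {
    critical = [local.notification]
    major    = [local.notification]
    minor    = [local.notification]
    warning  = []
    info     = []
  }
}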

module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"

  environment   = var.environment
  notifications = local.notifications
}

Please see the notification binding for more information on how to configure notifications for each severity.
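For instance, a sketch routing critical and major alerts to PagerDuty while keeping email for lower severities could replace the locals block above. The variable pagerduty_integration_id is hypothetical, and the "PagerDuty,<credentialId>" string is assumed to follow the SignalFx provider's notification format; check that documentation for the exact format of each integration.

```hcl
# Hypothetical variable: ID of a PagerDuty integration configured in SignalFx
variable "pagerduty_integration_id" {
  type = string
}

locals {
  email     = format("Email,%s", var.recipient)
  pagerduty = format("PagerDuty,%s", var.pagerduty_integration_id)

  # One list of recipients per severity; an empty list sends no notification
  notifications = {
    critical = [local.pagerduty, local.email]
    major    = [local.pagerduty]
    minor    = [local.email]
    warning  = [local.email]
    info     = []
  }
}
```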

At this stage, you should already be able to deploy every detector of your module configured at global level (i.e. notifications sent by email, except for the warning and info severities):

$ TF_VAR_environment=doc terraform apply -target=module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu will be created
  + resource "signalfx_detector" "cpu" {
      + id                = (known after apply)
      + max_delay         = 0
      + name              = "[doc] System cpu utilization"
      + program_text      = <<~EOT
            signal = data('cpu.utilization', filter=filter('env', 'doc') and filter('sfx_monitored', 'true'), extrapolation='zero').min(over='1h').publish('signal')
            detect(when(signal > 90)).publish('CRIT')
            detect(when(signal > 85) and when(signal <= 90)).publish('MAJOR')
        EOT
      + show_data_markers = true
      + time_range        = 3600
      + url               = (known after apply)

      + rule {
          + description           = "is too high > 85"
          + detect_label          = "MAJOR"
          + disabled              = false
          + notifications         = [
              + "Email,[email protected]",
            ]
          + parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT
          + parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
          + severity              = "Major"
        }
      + rule {
          + description           = "is too high > 90"
          + detect_label          = "CRIT"
          + disabled              = false
          + notifications         = [
              + "Email,[email protected]",
            ]
          + parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT
          + parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
          + severity              = "Critical"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Creating...
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Creation complete after 1s [id=xxx]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

As you can see, the environment set to doc through the TF_VAR_environment environment variable is used and visible in the plan, as are the notifications. However, many more options can be customized at the detector (or rule) level.

Note: to keep the output short, we used the -target option of the terraform apply command to deploy a single detector, but in general you want to deploy all of them.

Detectors

Now you can configure each detector or its rules. This works the same way as the global module configuration described above, but with variables from the detector and/or rule scopes.

A simple change could be to adjust the behavior of the previously deployed detector because it triggers too many alerts that are not relevant in our environment.

For example, we can raise the threshold of the critical rule of the cpu detector from 90 to 95 and disable its major rule.

module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"

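  environment            = var.environment
  notifications          = local.notifications
  # rule scope: raise the critical threshold of the cpu detector from 90 to 95
  cpu_threshold_critical = 95
  # rule scope: disable the major rule of the cpu detector
  cpu_disabled_major     = true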
}

Then apply again:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu will be updated in-place
  ~ resource "signalfx_detector" "cpu" {
        disable_sampling  = false
        id                = "xxx"
        max_delay         = 0
        name              = "[doc] System cpu utilization"
      ~ program_text      = <<~EOT
            signal = data('cpu.utilization', filter=filter('env', 'doc') and filter('sfx_monitored', 'true'), extrapolation='zero').min(over='1h').publish('signal')
          - detect(when(signal > 90)).publish('CRIT')
          - detect(when(signal > 85) and when(signal <= 90)).publish('MAJOR')
          + detect(when(signal > 95)).publish('CRIT')
          + detect(when(signal > 85) and when(signal <= 95)).publish('MAJOR')
        EOT
        show_data_markers = true
        show_event_lines  = false
        teams             = []
        time_range        = 3600
        url               = "https://app.signalfx.com/#/detector/xxx"

      - rule {
          - description           = "is too high > 85" -> null
          - detect_label          = "MAJOR" -> null
          - disabled              = false -> null
          - notifications         = [
              - "Email,[email protected]",
            ] -> null
          - parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT -> null
          - parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}" -> null
          - severity              = "Major" -> null
        }
      + rule {
          + description           = "is too high > 85"
          + detect_label          = "MAJOR"
          + disabled              = true
          + notifications         = [
              + "Email,[email protected]",
            ]
          + parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT
          + parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
          + severity              = "Major"
        }
      - rule {
          - description           = "is too high > 90" -> null
          - detect_label          = "CRIT" -> null
          - disabled              = false -> null
          - notifications         = [
              - "Email,[email protected]",
            ] -> null
          - parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT -> null
          - parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}" -> null
          - severity              = "Critical" -> null
        }
      + rule {
          + description           = "is too high > 95"
          + detect_label          = "CRIT"
          + disabled              = false
          + notifications         = [
              + "Email,[email protected]",
            ]
          + parameterized_body    = <<~EOT
                **Alert**:
                *[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}})*
                {{#if anomalous}}
                **Triggered at**:
                *{{timestamp}}*
                {{else}}
                **Cleared at**:
                *{{timestamp}}*
                {{/if}}
                
                {{#notEmpty dimensions}}
                **Dimensions**:
                *{{{dimensions}}}*
                {{/notEmpty}}
                
                {{#if anomalous}}
                {{#if runbookUrl}}**Runbook**:
                Go to [this page]({{{runbookUrl}}}) for help and analysis.
                {{/if}}
                
                {{#if tip}}**Tip**:
                {{{tip}}}
                {{/if}}
                {{/if}}
            EOT
          + parameterized_subject = "[{{ruleSeverity}}]{{{detectorName}}} {{{readableRule}}} ({{inputs.signal.value}}) on {{{dimensions}}}"
          + severity              = "Critical"
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Modifying... [id=xxx]
module.signalfx-detectors-smart-agent_system-common.signalfx_detector.cpu: Modifications complete after 2s [id=xxx]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
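The variables used above belong to the rule scope. As a further sketch, detector-scope variables would apply to a whole detector, for example to override its recipients or to attach a runbook link rendered by the runbookUrl field of the alert body shown in the plan. The names cpu_notifications and cpu_runbook_url are assumptions following the naming pattern of cpu_threshold_critical, and the URL is hypothetical; check the module README for the variables it actually exposes.

```hcl
module "signalfx-detectors-smart-agent_system-common" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/smart-agent_system-common?ref=master"

  environment   = var.environment
  notifications = local.notifications

  # Rule scope, as in the example above
  cpu_threshold_critical = 95
  cpu_disabled_major     = true

  # Detector scope (assumed variable names): recipients for the cpu detector only
  cpu_notifications = {
    critical = [local.notification]
    major    = []
    minor    = []
    warning  = []
    info     = []
  }
  # Detector scope (assumed variable name, hypothetical URL)
  cpu_runbook_url = "https://wiki.example.com/runbooks/cpu"
}
```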

To understand what kind of usual customizations are available or any recommendations on how to configure your detectors in real use, please see the Guidance documentation.
