Add windowsevent stage loki process #2545

Merged: 10 commits merged on Feb 4, 2025

@wildum (Contributor) commented Jan 27, 2025

PR Description

The existing eventlogmessage stage has a few parsing flaws that cannot be fixed without breaking changes (see the linked issue).

I therefore created a new stage, "windowsevent", which covers the same functionality and takes the same arguments as the existing eventlogmessage stage but parses the message differently.

New parsing logic:

  • The windowsevent stage expects the message to be structured in sections that are split by empty lines.

  • The first section of the input is treated as a whole block and stored in the extracted map with the key Description.

  • Sections following the Description are expected to contain key-value pairs in the format key: value.

  • If the first line of a section has no value (e.g., "Subject:"), the key will act as a prefix for subsequent keys in the same section.

  • If a line within a section does not include the : symbol, it is considered part of the previous entry's value. The line is appended to the previous value, separated by a comma.

  • Lines in a section without a preceding valid entry (key-value pair) are ignored and discarded.
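
The rules above can be sketched roughly as follows. This is an illustration in Go, not the code merged in this PR. The function names `splitSections` and `parseWindowsEvent`, the `_` separator between a section prefix and its keys, the space used to join Description lines, the `", "` continuation separator, and the sample message are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSections groups non-empty lines into sections separated by blank lines.
func splitSections(msg string) [][]string {
	var sections [][]string
	var cur []string
	for _, line := range strings.Split(strings.ReplaceAll(msg, "\r\n", "\n"), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			if len(cur) > 0 {
				sections = append(sections, cur)
				cur = nil
			}
			continue
		}
		cur = append(cur, line)
	}
	if len(cur) > 0 {
		sections = append(sections, cur)
	}
	return sections
}

// parseWindowsEvent is a hypothetical helper that follows the listed rules.
func parseWindowsEvent(msg string) map[string]string {
	out := map[string]string{}
	for i, section := range splitSections(msg) {
		if i == 0 {
			// The first section is kept whole under "Description".
			// Joining its lines with a space is an assumption.
			out["Description"] = strings.Join(section, " ")
			continue
		}
		prefix, lastKey := "", ""
		for j, line := range section {
			key, value, found := strings.Cut(line, ":")
			key, value = strings.TrimSpace(key), strings.TrimSpace(value)
			switch {
			case !found:
				// No ":" means a continuation of the previous entry.
				// Without a previous entry the line is dropped.
				if lastKey == "" {
					continue
				}
				if out[lastKey] == "" {
					out[lastKey] = line
				} else {
					out[lastKey] += ", " + line
				}
			case j == 0 && value == "":
				// A value-less first line (e.g. "Subject:") becomes a key prefix.
				// The "_" separator is an assumption.
				prefix = key + "_"
			default:
				lastKey = prefix + key
				out[lastKey] = value
			}
		}
	}
	return out
}

func main() {
	// Made-up message, shaped like a Windows security event.
	msg := "An account was successfully logged on.\n\n" +
		"Subject:\n\tSecurity ID:\tS-1-5-18\n\tAccount Name:\tHOST$\n\n" +
		"Privileges:\tSeSecurityPrivilege\n\t\tSeBackupPrivilege\n"
	for k, v := range parseWindowsEvent(msg) {
		fmt.Printf("%s = %q\n", k, v)
	}
}
```

The sketch splits each line at the first `:` only, so values that themselves contain colons stay intact. The stage's actual key naming and separators are shown in the documentation and tests added by this PR.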

I scrolled through Windows events on my personal computer to get some examples. You can check the example in the doc and in the tests to see the results.

Which issue(s) this PR fixes

Fixes #2337

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • [NA] Config converters updated

@wildum wildum requested review from clayton-cornell and a team as code owners January 27, 2025 15:21
github-actions bot (Contributor) commented Jan 27, 2025

💻 Deploy preview deleted.

@wildum wildum force-pushed the add-windowsevent-stage-loki-process branch from 7c88f59 to 24b6f7b Compare January 27, 2025 16:16
@mattdurham (Collaborator) left a comment

A few comments, but overall this looks solid.

@mattdurham (Collaborator) left a comment

lgtm

@Nachtfalkeaw

Hello,

This is my Windows event log processing pipeline:

  • I separate the event log channels and set "service_name" on each, because this fits the "Explore Logs" app.
  • I map the existing "level" and "levelText" values onto a single "level" value. Some logs have these fields and some do not, and the text is sometimes English and sometimes German. In the end, all logs should have the "level" field.

Not sure if this helps or not.

loki.source.windowsevent "application"  {
    eventlog_name = "Application"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Application",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.source.windowsevent "security"  {
    eventlog_name = "Security"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Security",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.source.windowsevent "setup"  {
    eventlog_name = "Setup"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Setup",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}

//
//=======================================================================================
//


loki.source.windowsevent "system"  {
    eventlog_name = "System"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "System",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.relabel "windows_event_level" {
// if "level" label is empty or does not exist create it and set value "tmp_level"
  rule {
    action        = "replace"
    source_labels = ["level"]
    regex         = "^$"
    replacement   = "tmp_level"
    target_label  = "level"
  }

  forward_to = [loki.process.windows_eventlog.receiver]

}


//
//=======================================================================================
//


loki.process "windows_eventlog" {

  stage.json {
      expressions = {
        source            = "",
        channel           = "",
        computer          = "",
        event_id          = "",
        levelText         = "",
        level             = "",
        opCodeText        = "",
        keywords          = "",
        timeCreated       = "",
        eventRecordID     = "",
        event_data        = "",
        user_data         = "",
        message           = "",
        task              = "",
        taskText          = "",
        version           = "",
        opCode            = "",
        execution         = "",
        processId         = "execution.\"processId\"",
        threadId          = "execution.\"threadId\"",
        processName       = "execution.\"processName\"",
        security          = "",
        userId            = "security.\"userId\"",
        userName          = "security.\"userName\"",

      }
  }



// Windows level values are sometimes numbers; convert them to strings.
// Some messages have no "level" at all. For those, loki.relabel.windows_event_level has already set the "level" label to "tmp_level".
// For "tmp_level" (or any other non-numeric value), fall back to "levelText". In my case it is German, so it is translated to the English names Loki knows.
  stage.template {
      source   = "level"
      template = `{{- $level := .Value -}}
                  {{- if eq $level "0" -}}debug
                  {{- else if eq $level "1" -}}critical
                  {{- else if eq $level "2" -}}error
                  {{- else if eq $level "3" -}}warn
                  {{- else if eq $level "4" -}}info
                  {{- else if eq $level "5" -}}trace
                  {{- else if eq .levelText "Information" -}}info
                  {{- else if eq .levelText "Informationen" -}}info
                  {{- else if eq .levelText "Warning" -}}warn
                  {{- else if eq .levelText "Warnung" -}}warn
                  {{- else if eq .levelText "Fehler" -}}error
                  {{- else if eq .levelText "Kritisch" -}}critical
                  {{- else if eq .levelText nil -}}unknown
                  {{- else -}}{{- .levelText -}}{{- end -}}`
  }

  stage.labels {
    values = {
      level       = "",
      channel     = "",
    }
  }



            // everything we do not need as a label goes into structured_metadata
  stage.structured_metadata {
    values = {
        source            = "",
        //channel         = "",
        computer          = "",
        event_id          = "",
        //level           = "",
        levelText         = "",
        opCodeText        = "",
        keywords          = "",
        timeCreated       = "",
        eventRecordID     = "",
        event_data        = "",
        user_data         = "",
        // message        = "",
        task              = "",
        taskText          = "",
        // execution      = "",
        // security       = "",
        processId         = "",
        threadId          = "",
        processName       = "",
        userId            = "",
        userName          = "",
        version           = "",
        opCode            = "",
    }
  }



            // drop all Alloy messages from the event log because they are too noisy (parsing error messages and so on)
  stage.drop {
      source = "source"
      value  = "Alloy"
      drop_counter_reason = "windows_eventlog_alloy"
  }


// to parse the original "message" field
  stage.eventlogmessage {
      source = "message"
      overwrite_existing = true
  }


            // only message field as output. rest is in structured_metadata
  stage.output {
      source = "message"
  }


// to parse the timestamp correctly.
  stage.timestamp {
      source      = "timeCreated"
      format      = "2006-01-02T15:04:05.0000000Z"
//      location    = "Europe/Berlin"                 // DO NOT SET if there is any time zone in the timestamp itself or it will not process any logs at all anymore.
  }


forward_to = [loki.relabel.hostname.receiver]

}


loki.relabel "hostname" {

            // use the hostname as "instance" because "instance" is used in prometheus metrics and so hostnames have equal labels
  rule {
    action        = "replace"
    replacement   = constants.hostname
    target_label  = "instance"
  }

            // all hostnames to lowercase
  rule {
    action        = "lowercase"
    source_labels = ["instance"]
    target_label  = "instance"
  }

            // only hostname, no domainname
  rule {
    action        = "replace"
    source_labels = ["instance"]
    regex         = "^([^.]+)\\..*$"
    replacement   = "$1"
    target_label  = "instance"
  }

            // label to identify if this is a windows client (enduser) or a windows server (datacenter)
  rule {
    action        = "replace"
    replacement   = "server"
    target_label  = "system_type"
  }

            // if earlier stages (Windows event log or other logs) did not set a "level", set it to "unknown", which matches the Loki Explore naming scheme
  rule {
    action        = "replace"
    source_labels = ["level"]
    regex         = "^$"
    replacement   = "unknown"
    target_label  = "level"
  }

            // we add a service_name to match Explore Logs app
  rule {
    action        = "replace"
    source_labels = ["service_name"]
    regex         = "^$"
    replacement   = "unknown"
    target_label  = "service_name"
  }


  forward_to = [loki.write.loki.receiver]
}
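
The format string in the stage.timestamp block above looks like a Go reference-time layout. Assuming it is handed to Go's `time.Parse` as such, the following sketch shows what that layout accepts. The helper name `parseTimeCreated` and the sample timestamps are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// layout mirrors the format string used in the stage.timestamp block.
// The trailing "Z" is a literal here, not a zone token such as "Z07:00".
const layout = "2006-01-02T15:04:05.0000000Z"

// parseTimeCreated is a hypothetical helper that parses a timeCreated value.
func parseTimeCreated(s string) (time.Time, error) {
	return time.Parse(layout, s)
}

func main() {
	// Seven fractional digits match the seven zeros in the layout.
	ts, err := parseTimeCreated("2025-01-27T15:21:00.1234567Z")
	fmt.Println(ts, err)

	// Fewer fractional digits do not match the layout and return an error.
	_, err = parseTimeCreated("2025-01-27T15:21:00.123Z")
	fmt.Println(err)
}
```

Because the layout carries no zone token, `time.Parse` returns the time in UTC.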

@wildum wildum merged commit 8a589ed into main Feb 4, 2025
32 checks passed
@wildum wildum deleted the add-windowsevent-stage-loki-process branch February 4, 2025 13:44
Successfully merging this pull request may close these issues.

Improve eventlogmessage stage in loki.process
4 participants