
AWS CloudWatch metric sink not cleaning up buffer files consistently #24655

@johannesfloriangeiger

Description


A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Preamble: To reproduce the behavior below more quickly, I suggest editing lib/vector-buffers/src/variants/disk_v2/common.rs and setting DEFAULT_MAX_DATA_FILE_SIZE to 10 KiB (10 * 1024) instead of the default 128 MiB, then rebuilding Vector. This shortens test times.

Steps to reproduce:

Obtain an AWS account and set up local AWS credentials that Vector can use.

Consider the following vector.toml:

data_dir = "."

[sources.static_metrics]
type = "static_metrics"
metrics = [
  { name = "test_metric_0", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_1", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_2", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_3", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_4", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_5", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_6", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_7", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_8", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_9", kind = "absolute", tags = { }, value.gauge.value = 42 }
]

[sinks.console]
type = "console"
inputs = ["static_metrics"]
encoding.codec = "json"
buffer.type = "disk"
buffer.max_size = 2097184

[sinks.aws_cloudwatch_metrics]
type = "aws_cloudwatch_metrics"
inputs = ["static_metrics"]
default_namespace = "default"
healthcheck.enabled = true
buffer.type = "disk"
buffer.max_size = 2097184

Run Vector with ./target/release/vector --config ./vector.toml and observe that the buffer data files (buffer-data-x.dat) in buffer/v2/aws_cloudwatch_metrics rotate every few seconds: once a data file is full and its contents have been flushed, the file is deleted.

Now consider the following, slightly different vector.toml, which also includes an aggregated histogram metric:

data_dir = "."

[sources.static_metrics]
type = "static_metrics"
metrics = [
  { name = "test_metric_0", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_1", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_2", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_3", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_4", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_5", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_6", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_7", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_8", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_9", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_10", kind = "absolute", tags = { }, value.aggregated_histogram.buckets = [{ le = 10, count = 1, upper_limit = 42.0 }], value.aggregated_histogram.count = 42, value.aggregated_histogram.sum = 42 },
]

[sinks.console]
type = "console"
inputs = ["static_metrics"]
encoding.codec = "json"
buffer.type = "disk"
buffer.max_size = 2097184

[sinks.aws_cloudwatch_metrics]
type = "aws_cloudwatch_metrics"
inputs = ["static_metrics"]
default_namespace = "default"
healthcheck.enabled = true
buffer.type = "disk"
buffer.max_size = 2097184

When running Vector with this configuration, you will notice that the buffer data files still rotate but are no longer cleaned up, so they accumulate on disk.
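Because the difference between the two runs is only visible on the filesystem, here is a minimal Python sketch (a hypothetical helper, not part of Vector) that snapshots the buffer-data-*.dat files every few seconds. It assumes Vector was started from the current directory with data_dir = "." so that the buffer lives in buffer/v2/aws_cloudwatch_metrics, as described above. The helper names, the interval, the number of rounds, and the accumulation heuristic in looks_like_leak are arbitrary choices for illustration.

```python
import time
from pathlib import Path

# Assumption: Vector runs from this directory with data_dir = "." and the sink id
# "aws_cloudwatch_metrics", so its disk buffer lives in buffer/v2/aws_cloudwatch_metrics.
BUFFER_DIR = Path("buffer/v2/aws_cloudwatch_metrics")


def list_data_files(buffer_dir: Path) -> list[str]:
    """Return the sorted names of the disk buffer data files (buffer-data-*.dat)."""
    return sorted(p.name for p in buffer_dir.glob("buffer-data-*.dat"))


def watch(buffer_dir: Path, interval_s: float = 5.0, rounds: int = 12) -> list[list[str]]:
    """Take `rounds` snapshots of the data files, `interval_s` seconds apart, printing each."""
    snapshots = []
    for i in range(rounds):
        files = list_data_files(buffer_dir)
        snapshots.append(files)
        print(f"[{i * interval_s:>5.0f}s] {len(files)} data file(s): {files}")
        if i < rounds - 1:
            time.sleep(interval_s)
    return snapshots


def looks_like_leak(snapshots: list[list[str]]) -> bool:
    """Hypothetical heuristic: files leak if the count never drops and ends above its start."""
    counts = [len(s) for s in snapshots]
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]


if __name__ == "__main__":
    if BUFFER_DIR.is_dir():
        snaps = watch(BUFFER_DIR)
        print("data files accumulating:", looks_like_leak(snaps))
    else:
        print(f"{BUFFER_DIR} not found; start Vector first")
```

Per the behavior described above, the first configuration should show a small file count with changing file names, while the second should show the count growing from snapshot to snapshot.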

A workaround is to filter the input of the AWS CloudWatch metrics sink down to metric types that the sink supports, as in the following vector.toml:

data_dir = "."

[sources.static_metrics]
type = "static_metrics"
metrics = [
  { name = "test_metric_0", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_1", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_2", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_3", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_4", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_5", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_6", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_7", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_8", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_9", kind = "absolute", tags = { }, value.gauge.value = 42 },
  { name = "test_metric_10", kind = "absolute", tags = { }, value.aggregated_histogram.buckets = [{ le = 10, count = 1, upper_limit = 42.0 }], value.aggregated_histogram.count = 42, value.aggregated_histogram.sum = 42 },
]

[sinks.console]
type = "console"
inputs = ["static_metrics"]
encoding.codec = "json"
buffer.type = "disk"
buffer.max_size = 2097184

[transforms.metrics_filter]
inputs = ["static_metrics"]
type = "filter"
condition = '.type == "gauge"'

[sinks.aws_cloudwatch_metrics]
type = "aws_cloudwatch_metrics"
inputs = ["metrics_filter"]
default_namespace = "default"
healthcheck.enabled = true
buffer.type = "disk"
buffer.max_size = 2097184

Configuration


Version

vector 0.54.0 (aarch64-apple-darwin)

Debug Output


Example Data

No response

Additional Context

No response

References

No response
