Description
Problem
The vector_component_allocated_bytes metric reports negative allocated memory for some series.
It does not happen immediately, but I have been test-running Vector on several servers for a couple of months and have seen it on many of them.
Example:
This is a problem because, without accurate memory metrics, it is difficult to establish a baseline for how much memory Vector needs to perform well in my environment. It also makes capacity planning harder, since it is unclear whether Vector is safe to run alongside other apps with high memory requirements. There are other ways to measure memory, but since this is supposed to be a Vector feature, it would be great if it worked.
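For comparison, one of those other ways on Linux is reading the resident set size (VmRSS) of the Vector process from /proc/<pid>/status. This is a minimal sketch, assuming a Linux host. The function names parse_vm_rss_kib and read_rss_kib and the sample text are hypothetical, not part of Vector.

```python
def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status content, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"; the second field is the number.
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the current RSS of a process by PID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())


# Hypothetical sample content, not copied from a real host.
SAMPLE_STATUS = "Name:\tvector\nVmPeak:\t  900000 kB\nVmRSS:\t  123456 kB\n"

if __name__ == "__main__":
    print(parse_vm_rss_kib(SAMPLE_STATUS))
```

This only gives a process-wide number, so it cannot replace the per-component breakdown that allocation tracing is meant to provide.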
This is Vector 0.48.0 on AlmaLinux 9, installed from the publicly available RPM. I will try the latest release, but it may be a month before I can confirm whether the issue still exists. In the meantime, it would be great if you could check whether there is some simple counter bug to fix.
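In case it helps with reproducing, here is a minimal sketch of how the negative samples could be spotted in the prometheus_exporter output (Prometheus text format). It assumes the text has already been fetched from the exporter address in the configuration below. The function name find_negative_samples, the component_id label, and the sample values are hypothetical illustrations, not real output from my hosts.

```python
import re

# One Prometheus text-format sample line: metric name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?P<labels>\{.*\})?\s+(?P<value>\S+)'
)


def find_negative_samples(exposition_text, metric="vector_component_allocated_bytes"):
    """Return (labels, value) pairs for every sample of `metric` below zero."""
    negatives = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        value = float(match.group("value"))
        if value < 0:
            negatives.append((match.group("labels") or "", value))
    return negatives


# Hypothetical exporter output, for illustration only.
SAMPLE = """\
# TYPE vector_component_allocated_bytes gauge
vector_component_allocated_bytes{component_id="maven_logs"} 1048576 1700000000000
vector_component_allocated_bytes{component_id="kafka_maven"} -524288 1700000000000
"""

if __name__ == "__main__":
    for labels, value in find_negative_samples(SAMPLE):
        print(labels, value)
```

Running a check like this periodically would show when a series first goes negative, which might help narrow down the trigger.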
Configuration
acknowledgements:
  enabled: true
api:
  enabled: true
sources:
  maven_logs:
    type: file
    fingerprint:
      strategy: device_and_inode
    multiline:
      # Start collecting a multiline when the line starts with a date
      # and stop when you see another line starting with a date
      start_pattern: '^\d{4}-\d{2}-\d{2}'
      mode: "halt_before"
      condition_pattern: '^\d{4}-\d{2}-\d{2}'
      timeout_ms: 1000
    max_line_bytes: 1024000
    include:
      - "/opt/maven/logs/*.log"
      - "/opt/maven/logs/*.log_[0-9]*"
    exclude:
      - "/opt/maven/logs/*.bz2"
      - "/opt/maven/logs/*.gz"
      - "/opt/maven/logs/*.zip"
      - "/opt/maven/logs/*.tar"
  journal:
    type: journald
  metrics:
    type: internal_metrics
    scrape_interval_secs: 10
transforms:
  process_maven_logs:
    type: remap
    inputs:
      - enrich_maven_logs
    source: |
      path_tokens, err = split(.file, "/")
      file_tokens, err = split(path_tokens[-1], ".")
      .app_name = file_tokens[0] || "UNKNOWN"
      .app_instance = file_tokens[1] || "UNKNOWN"
  process_journal:
    type: remap
    inputs:
      - journal
    source: |
      .app_name = .source_type # "journald"
  enrich_maven_logs:
    type: lua
    inputs:
      - maven_logs
    version: "2"
    # This will only run the expensive system commands the first time we see a log file, then we'll use the cached results
    source: |
      function get_file_owner(filepath)
        local handle = io.popen("stat -c '%U' '" .. filepath .. "' 2>/dev/null")
        local user = handle and handle:read("*a"):gsub("%s+", "") or "unknown"
        if handle then handle:close() end
        return user
      end
      function get_process_name(filepath)
        local handle = io.popen("fuser '" .. filepath .. "' 2>/dev/null | xargs -r ps -o comm= -p | grep -v vector | head -1")
        local result = handle and handle:read("*a"):gsub("%s+", "") or ""
        if handle then handle:close() end
        return result
      end
      function init(emit)
        file_metadata_cache = {}
      end
      function process(event, emit)
        local filepath = event.log.file
        if filepath and not file_metadata_cache[filepath] then
          local user = get_file_owner(filepath)
          local process_name = get_process_name(filepath)
          file_metadata_cache[filepath] = {
            process_name = process_name,
            owner_user = user
          }
        end
        if filepath and file_metadata_cache[filepath] then
          local metadata = file_metadata_cache[filepath]
          -- NOTE: setting event.log.<field> in Lua ACTUALLY sets event.<field>
          -- so, event.log.process_name becomes .process_name in VRL later
          event.log.process_name = metadata.process_name
          event.log.owner_user = metadata.owner_user
        end
        emit(event)
      end
    hooks:
      init: "init"
      process: "process"
sinks:
  kafka_maven:
    type: kafka
    inputs:
      - process_maven_logs
    encoding:
      codec: native
    compression: zstd
    librdkafka_options:
      queue.buffering.max.kbytes: "40960" # 40 MB
      socket.send.buffer.bytes: "41943040" # 40 MB
    bootstrap_servers: "<REDACTED>"
    # Vector spends too much CPU on context switching and has high memory usage
    # if we have too many topic names
    topic: vector.logs.maven.apps
    rate_limit_duration_secs: 1
    rate_limit_num: 50000
  kafka_system:
    type: kafka
    inputs:
      - process_journal
    encoding:
      codec: native
    compression: zstd
    librdkafka_options:
      queue.buffering.max.kbytes: "40960"
      socket.send.buffer.bytes: "41943040"
      message.max.bytes: "2048000" # twice the file source max_line_bytes
    bootstrap_servers: "<REDACTED>"
    topic: "vector.logs.system.{{app_name}}"
    rate_limit_duration_secs: 1
    rate_limit_num: 50000
  prometheus_sink:
    type: prometheus_exporter
    inputs:
      - metrics
    address: 0.0.0.0:9598
Version
0.48.0
Debug Output
Because the bug does not show up immediately, I don't think a backtrace from me would be helpful.
Example Data
I have tons of different log lines. They probably aren't relevant, and if they are, it would be hard for me to pin down which ones are causing the issue.
Additional Context
It's running on AlmaLinux 9 as a systemd unit
[Unit]
Description=Vector
Documentation=https://vector.dev
After=network-online.target
Requires=network-online.target
[Service]
# Vector must run as root because we will run system commands to infer metadata from log files
User=root
Group=root
ExecStartPre=/usr/bin/vector validate
ExecStart=/usr/bin/vector --watch-config --allocation-tracing
ExecReload=/usr/bin/vector validate --no-environment
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
AmbientCapabilities=CAP_NET_BIND_SERVICE
EnvironmentFile=-/etc/default/vector
StartLimitInterval=10
StartLimitBurst=5
# Lower CPU priority to prevent interference but allow using any idle CPU
CPUWeight=20
Nice=10
LimitNOFILE=4096
[Install]
WantedBy=multi-user.target
References
I didn't see anyone else reporting this.