219 changes: 114 additions & 105 deletions docs/zh/05-features/04-continuous-profiling/02-configuration.md
@@ -3,134 +3,143 @@ title: Configuration
permalink: /features/continuous-profiling/configuration
---

# eBPF On-CPU Profiling
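By default, continuous profiling is enabled only for specific processes. Follow this document to modify the agent group configuration and enable or adjust continuous profiling. In the Enterprise Edition, edit the agent group configuration on the `系统管理-采集器-配置` (System Management > Agent > Configuration) page.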

# Process Matcher

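eBPF On-CPU Profiling is enabled by default, but you need to modify `static_config.ebpf.on-cpu-profile.regex` to specify the list of processes to profile. By default it is enabled only for processes whose name starts with `deepflow-`. The agent supports the following configuration parameters:
The agent uses the `inputs.proc.process_matcher` configuration to match processes and enable continuous profiling for them. The default configuration is as follows: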

```yaml
static_config:
  ebpf:
    ## Java compliant update latency time
    ## Default: 600s. Range: [5, 3600]s
    ## Note:
    ## When deepflow-agent finds that an unresolved function name appears in the function call stack
    ## of a Java process, it will trigger the regeneration of the symbol file of the process.
    ## Because Java utilizes the Just-In-Time (JIT) compilation mechanism, to obtain more symbols for
    ## Java processes, the regeneration will be deferred for a period of time.
    #java-symbol-file-refresh-defer-interval: 600s

    ## Maximum size limit for Java symbol file.
    ## Default: 10. Range: [2, 100]
    ## Note:
    ## Which means it falls within the interval of 2Mi to 100Mi. If the configuration value is outside
    ## this range, the default value of 10(10Mi), will be used.
    ## All Java symbol files are stored in the '/tmp' directory mounted by the deepflow-agent. To prevent
    ## excessive occupation of host node space due to large Java symbol files, a maximum size limit is set
    ## for each generated Java symbol file.
    #java-symbol-file-max-space-limit: 10

    ## on-cpu profile configuration
    on-cpu-profile:
      ## eBPF on-cpu Profile Switch
      ## Default: false
      disabled: false

      ## Sampling frequency
      ## Default: 99
      frequency: 99

      ## Whether to obtain the value of CPUID and decide whether to participate in aggregation.
      ## Set to 1:
      ## Obtain the value of CPUID and will be included in the aggregation of stack trace data.
      ## Set to 0:
      ## It will not be included in the aggregation. Any other value is considered invalid,
      ## the CPU value for stack trace data reporting is a special value (CPU_INVALID:0xfff)
      ## used to indicate that it is an invalid value.
      ## Default: 0
      cpu: 0

      ## Sampling process name
      ## Default: ^deepflow-.*
      regex: ^deepflow-.*
inputs:
  proc:
    process_matcher:
      - match_regex: \bjava( +\S+)* +-jar +(\S*/)*([^ /]+\.jar)
        match_type: cmdline_with_args
        only_in_container: false
        rewrite_name: $3
        enabled_features: [ebpf.profile.on_cpu, proc.gprocess_info]
      - match_regex: \bpython(\S)*( +-\S+)* +(\S*/)*([^ /]+)
        match_type: cmdline_with_args
        only_in_container: false
        rewrite_name: $4
        enabled_features: [ebpf.profile.on_cpu, proc.gprocess_info]
      - match_regex: ^deepflow-
        only_in_container: false
        enabled_features: [ebpf.profile.on_cpu, proc.gprocess_info]
      - match_regex: .*
        enabled_features: [proc.gprocess_info]
```

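The settings above have the following meanings:

- **disabled**: defaults to False, meaning the feature is enabled.
- **frequency**: sampling frequency; the default of 99 corresponds to a sampling period of roughly 10 ms. Multiples of 10 are not recommended, to avoid sampling in lockstep with the clocks that drive program execution or scheduling.
- **cpu**: defaults to 0, meaning data collected on a host is not distinguished by CPU; when set to 1, data is aggregated by CPU ID.
- **regex**: regular expression for the names of processes with On-CPU Profiling enabled.
- **java-symbol-file-refresh-default-interval**: refresh interval of the Java symbol table, which avoids high-frequency refreshes
- **java-symbol-file-max-space-limit**: keeps Java symbol tables from taking up too much space under `/tmp`
- **match_regex**: regular expression used to match processes. The default rules work as follows:
  - The first rule matches Java processes, for example `java -jar app.jar`, and rewrites the process name to the jar file name
  - The second rule matches Python processes, for example `python app.py`, and rewrites the process name to the Python script name
  - The third rule matches processes whose name starts with `deepflow-`
  - The last rule matches all processes
- **match_type**: what the regular expression is matched against. Possible values:
  - `cmdline_with_args`: matches the full command line (including arguments)
  - `cmdline`: matches the command only (without arguments)
  - `process_name`: matches the process name
- **only_in_container**: whether to match only processes running inside containers
- **rewrite_name**: rule for rewriting the process name; supports references to regex capture groups
- **enabled_features**: list of features enabled for the matched processes:
  - `ebpf.profile.on_cpu`: enables On-CPU profiling; also requires `inputs.ebpf.profile.on_cpu.disabled: false`
  - `ebpf.profile.off_cpu`: enables Off-CPU profiling; also requires `inputs.ebpf.profile.off_cpu.disabled: false`
  - `ebpf.profile.memory`: enables memory profiling; also requires `inputs.ebpf.profile.memory.disabled: false`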
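
As a minimal sketch of how these fields combine, the fragment below adds one entry that enables On-CPU profiling for a hypothetical process named `my-service`. The process name and its regex are assumptions for illustration; every key is taken from the default configuration shown earlier. The entry is placed ahead of the catch-all `.*` rule on the assumption that entries are evaluated in order, and whether a custom `process_matcher` replaces or extends the defaults should be verified against your agent version.

```yaml
inputs:
  proc:
    process_matcher:
      # Hypothetical entry: match on the process name only, no rewrite.
      - match_regex: ^my-service
        match_type: process_name
        only_in_container: false
        enabled_features: [ebpf.profile.on_cpu, proc.gprocess_info]
      # Catch-all entry copied from the default configuration.
      - match_regex: .*
        enabled_features: [proc.gprocess_info]
```

Using `process_name` here keeps the regex short; `cmdline_with_args` is the better fit when the interesting part is an argument, as in the default Java and Python rules.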

You can also use `inputs.proc.process_blacklist` to ignore certain processes; it takes precedence over `process_matcher`.

# eBPF Off-CPU Profiling
```yaml
inputs:
  proc:
    process_blacklist: [sleep, sh, bash, pause, runc, grep, awk, sed, curl]
```

# Symbol Table

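eBPF Off-CPU Profiling (Enterprise Edition only) is enabled by default, but you need to modify `static_config.ebpf.off-cpu-profile.regex` to specify the list of processes to profile. By default it is enabled only for processes whose name starts with `deepflow-`. The agent supports the following configuration parameters:
Symbol table settings can be configured for specific languages. They apply to every type of continuous profiling; the defaults usually work and need no changes.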

```yaml
static_config:
inputs:
  ebpf:
    symbol_table:
      golang_specific:
        enabled: false
      java:
        refresh_defer_duration: 60s
        max_symbol_file_size: 10
```

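The settings above have the following meanings:
- **golang_specific.enabled**: whether to enable parsing of Golang-specific symbol tables.
- **refresh_defer_duration**: how long a Java symbol table refresh is deferred, which avoids high-frequency refreshes.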
- **max_symbol_file_size**: maximum size of a Java symbol file, in MiB, which keeps symbol files from taking up too much space under `/tmp`.

# eBPF On-CPU Profiling

eBPF On-CPU Profiling is enabled by default, but you need to modify `inputs.proc.process_matcher` to specify the process list. The agent supports the following configuration parameters:

    ## Off-cpu profile configuration, Enterprise Edition Only.
    #off-cpu-profile:
      ## eBPF off-cpu Profile Switch
      ## Default: false
      #disabled: false

      ## Off-cpu trace process name
      ## Default: ^deepflow-.*
      #regex: ^deepflow-.*

      ## Whether to obtain the value of CPUID and decide whether to participate in aggregation.
      ## Set to 1:
      ## Obtain the value of CPUID and will be included in the aggregation of stack trace data.
      ## Set to 0:
      ## It will not be included in the aggregation. Any other value is considered invalid,
      ## the CPU value for stack trace data reporting is a special value (CPU_INVALID:0xfff)
      ## used to indicate that it is an invalid value.
      ## Default: 0
      #cpu: 0

      ## Configure the minimum blocking event time
      ## Default: 50us. Range: [0, 2^32-1)us
      ## Note:
      ## If set to 0, there will be no minimum value limitation.
      ## Scheduler events are still high-frequency events, as their rate may exceed 1 million events
      ## per second, so caution should still be exercised.
      ## If overhead remains an issue, you can configure the 'minblock' tunable parameter here.
      ## If the off-CPU time is less than the value configured in this item, the data will be discarded.
      ## If your goal is to trace longer blocking events, increasing this parameter can filter out shorter
      ## blocking events, further reducing overhead. Additionally, we will not collect events with a block
      ## time exceeding 1 hour.
      #minblock: 50us
```yaml
inputs:
  ebpf:
    profile:
      on_cpu:
        disabled: false
        sampling_frequency: 99
        aggregate_by_cpu: false
```

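The settings above have the following meanings:
- **disabled**: defaults to false, meaning the feature is enabled.
- **sampling_frequency**: sampling frequency; the default of 99 corresponds to a sampling period of roughly 10 ms. Multiples of 10 are not recommended, to avoid sampling in lockstep with the clocks that drive program execution or scheduling.
- **aggregate_by_cpu**: defaults to false, meaning data collected on a host is not distinguished by CPU; when set to true, data is aggregated by CPU ID.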

# eBPF Off-CPU Profiling

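eBPF Off-CPU Profiling (Enterprise Edition only) is disabled by default. To use it, enable it and also modify `inputs.proc.process_matcher` to specify the process list. The agent supports the following configuration parameters: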

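- **disabled**: defaults to False, meaning the feature is enabled.
- **regex**: regular expression for the names of processes with Off-CPU Profiling enabled.
- **cpu**: defaults to 0, meaning data collected on a host is not distinguished by CPU; when set to 1, data is aggregated by CPU ID.
- **minblock**: minimum blocking duration; Off-CPU events shorter than this are discarded, which keeps the event volume from driving host load too high.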
```yaml
inputs:
  ebpf:
    profile:
      off_cpu:
        disabled: true
        aggregate_by_cpu: false
        min_blocking_time: 50us
```

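In addition, the following two On-CPU settings also apply to Off-CPU
The settings above have the following meanings:

- **java-symbol-file-refresh-default-interval**
- **java-symbol-file-max-space-limit**
- **disabled**: defaults to true, meaning the feature is disabled.
- **aggregate_by_cpu**: defaults to false, meaning data collected on a host is not distinguished by CPU; when set to true, data is aggregated by CPU ID.
- **min_blocking_time**: minimum blocking duration used to limit which Off-CPU events are collected, which keeps the event volume from driving host load too high.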
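
Since Off-CPU profiling needs both the global switch and a matching `process_matcher` entry, a combined fragment may make the relationship clearer. This is only a sketch: the `my-service` process name is hypothetical, and the remaining values are the defaults listed in this document.

```yaml
inputs:
  proc:
    process_matcher:
      # Hypothetical entry: request Off-CPU profiling for the matched processes.
      - match_regex: ^my-service
        match_type: process_name
        enabled_features: [ebpf.profile.on_cpu, ebpf.profile.off_cpu, proc.gprocess_info]
  ebpf:
    profile:
      off_cpu:
        # Must be false, otherwise the ebpf.profile.off_cpu feature has no effect.
        disabled: false
        aggregate_by_cpu: false
        min_blocking_time: 50us
```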

# eBPF Memory Profiling

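eBPF Memory Profiling (Enterprise Edition only) is disabled by default; you need to modify `static_config.ebpf.memory-profile.regex` to specify the list of processes to profile. The agent supports the following configuration parameters:
eBPF Memory Profiling (Enterprise Edition only) is disabled by default. To use it, enable it and also modify `inputs.proc.process_matcher` to specify the process list. The agent supports the following configuration parameters: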

```yaml
static_config:
inputs:
  ebpf:
    # Memory profile configuration, Enterprise Edition Only.
    memory-profile:
      # eBPF memory Profile Switch
      # Default: true
      disabled: true

      # Memory trace process name
      # Default: ^java
      regex: ^java
    profile:
      memory:
        disabled: true
        report_interval: 10s
        allocated_addresses_lru_len: 131072
        sort_length: 16384
        sort_interval: 1500ms
        queue_size: 32768
```

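The settings above have the following meanings:

- **disabled**: defaults to true, meaning the feature is disabled.
- **report_interval**: interval at which the agent aggregates and reports memory profiling data.
- **allocated_addresses_lru_len**: the agent uses an LRU cache to record the addresses allocated by processes, which keeps memory usage from growing out of control. Each LRU entry takes about 32 B of memory, so the default length of 131072 uses about 4 MiB.
- **sort_length**: length of the queue in which memory profiling data is sorted by timestamp before processing.
  - When tuning this option, first adjust `sort_interval` as described below, then check the `dequeued_by_length` and `dequeued_by_interval` metrics in the agent performance statistics `deepflow_agent_ebpf_memory_profiler`. Reduce this parameter as appropriate while keeping the former several times smaller than the latter.
- **sort_interval**: maximum time interval over which memory profiling data is sorted by timestamp before processing. It caps the time difference between the first and last elements of the sort array.
  - When tuning this option, check the `time_backtracked` metric in the agent performance statistics `deepflow_agent_ebpf_memory_profiler` and increase this parameter until the metric is 0. Note that `sort_length` may need to be increased accordingly.
- **queue_size**: size of the internal queue of the memory profiling component.
  - When tuning this option, check the `overwritten` and `pending` metrics in the agent performance statistics `deepflow_agent_ebpf_memory_profiler`. Increase this setting until the former is 0 and the latter stays no higher than the setting.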
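
For illustration, memory profiling of Java processes could be switched on by reusing the default Java matcher and adding the `ebpf.profile.memory` feature. This is a sketch under assumptions rather than a recommended setup: it assumes the default Java rule is the one you want to extend, and it leaves the tuning parameters at the defaults shown in this section.

```yaml
inputs:
  proc:
    process_matcher:
      # Default Java rule with ebpf.profile.memory added to the feature list.
      - match_regex: \bjava( +\S+)* +-jar +(\S*/)*([^ /]+\.jar)
        match_type: cmdline_with_args
        only_in_container: false
        rewrite_name: $3
        enabled_features: [ebpf.profile.on_cpu, ebpf.profile.memory, proc.gprocess_info]
  ebpf:
    profile:
      memory:
        # Global switch for memory profiling (Enterprise Edition only).
        disabled: false
        report_interval: 10s
```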