Skip to content

treydock/cgroup_exporter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

db27169 · Oct 14, 2024

History

93 Commits
May 18, 2024
Jul 19, 2024
Jun 24, 2024
Feb 12, 2020
May 18, 2024
Mar 8, 2022
Oct 2, 2020
May 18, 2024
May 18, 2024
Oct 14, 2024
May 18, 2024
Feb 12, 2020
May 18, 2024
May 18, 2024
May 17, 2024
Jul 19, 2024
Jul 19, 2024
Mar 18, 2020
Jul 19, 2024
Jul 19, 2024
Jul 19, 2024
Oct 2, 2020

Repository files navigation

cgroup Prometheus exporter

Build Status GitHub release GitHub All Releases codecov

cgroup Prometheus exporter

The cgroup_exporter produces metrics from cgroups.

This exporter by default listens on port 9306 and all metrics are exposed via the /metrics endpoint.

Usage

The --config.paths flag is required and must point to paths of cgroups to monitor. If there is /sys/fs/cgroup/cpuacct/user.slice then the value for --config.paths would be /user.slice.

The path /slurm will work for both cgroupv1 and cgroupv2. For cgroupv2 the /slurm path is turned into /system.slice/slurmstepd.scope.

If Slurm is compiled ot support multiple slurmd instances and you have paths that are /sys/fs/cgroup/system.slice/<nodename>_slurmstepd.scope then you must pass --config.paths=/system.slice/<nodename>_slurmstepd.scope and replace <nodename> with the host's slurmd NodeName.

Docker

Example of running the Docker container

docker run -d -p 9306:9306 -v "/:/host:ro,rslave" treydock/cgroup_exporter --path.cgroup.root=/host/sys/fs/cgroup

Install

Download the latest release

Build from source

To produce the cgroup_exporter binaries:

make build

Or

go get github.com/treydock/cgroup_exporter

Process metrics

If you wish to collect process information for a cgroup pass the --collect.proc flag. If this exporter is not running as root then it's required to set capabilities to ensure the user running this exporter can read everything under procfs:

setcap cap_sys_ptrace=eip /usr/bin/cgroup_exporter

Metrics

Example of metrics exposed by this exporter when looking at /user.slice paths:

cgroup_cpu_system_seconds{cgroup="/user.slice/user-20821.slice"} 1.96
cgroup_cpu_total_seconds{cgroup="/user.slice/user-20821.slice"} 3.817500568
cgroup_cpu_user_seconds{cgroup="/user.slice/user-20821.slice"} 1.61
cgroup_cpus{cgroup="/user.slice/user-20821.slice"} 0
cgroup_cpu_info{cgroup="/user.slice/user-20821.slice",cpus=""} 1
cgroup_info{cgroup="/user.slice/user-20821.slice",uid="20821",username="tdockendorf",jobid=""} 1
cgroup_memory_cache_bytes{cgroup="/user.slice/user-20821.slice"} 2.322432e+06
cgroup_memory_fail_count{cgroup="/user.slice/user-20821.slice"} 0
cgroup_memory_rss_bytes{cgroup="/user.slice/user-20821.slice"} 5.378048e+06
cgroup_memory_total_bytes{cgroup="/user.slice/user-20821.slice"} 6.8719476736e+10
cgroup_memory_used_bytes{cgroup="/user.slice/user-20821.slice"} 6.90176e+06
cgroup_memsw_fail_count{cgroup="/user.slice/user-20821.slice"} 0
cgroup_memsw_total_bytes{cgroup="/user.slice/user-20821.slice"} 9.223371968135295e+18
cgroup_memsw_used_bytes{cgroup="/user.slice/user-20821.slice"} 0

Example of metrics exposed by this exporter when looking at /slurm paths:

cgroup_cpu_system_seconds{cgroup="/slurm/uid_20821/job_12"} 0
cgroup_cpu_total_seconds{cgroup="/slurm/uid_20821/job_12"} 0.007840451
cgroup_cpu_user_seconds{cgroup="/slurm/uid_20821/job_12"} 0
cgroup_cpus{cgroup="/slurm/uid_20821/job_12"} 2
cgroup_cpu_info{cgroup="/slurm/uid_20821/job_12",cpus="0,1"} 1
cgroup_info{cgroup="/slurm/uid_20821/job_12",jobid="12",uid="20821",username="tdockendorf"} 1
cgroup_memory_cache_bytes{cgroup="/slurm/uid_20821/job_12"} 4.096e+03
cgroup_memory_fail_count{cgroup="/slurm/uid_20821/job_12"} 0
cgroup_memory_rss_bytes{cgroup="/slurm/uid_20821/job_12"} 3.11296e+05
cgroup_memory_total_bytes{cgroup="/slurm/uid_20821/job_12"} 2.147483648e+09
cgroup_memory_used_bytes{cgroup="/slurm/uid_20821/job_12"} 315392
cgroup_memsw_fail_count{cgroup="/slurm/uid_20821/job_12"} 0
cgroup_memsw_total_bytes{cgroup="/slurm/uid_20821/job_12"} 2.147483648e+09
cgroup_memsw_used_bytes{cgroup="/slurm/uid_20821/job_12"} 315392

Example of metrics exposed by this exporter when looking at /torque paths:

cgroup_cpu_system_seconds{cgroup="/torque/1182958.batch.example.com"} 26.35
cgroup_cpu_total_seconds{cgroup="/torque/1182958.batch.example.com"} 939.568245515
cgroup_cpu_user_seconds{cgroup="/torque/1182958.batch.example.com"} 915.61
cgroup_cpus{cgroup="/torque/1182958.batch.example.com"} 8
cgroup_cpu_info{cgroup="/torque/1182958.batch.example.com",cpus="0,1,2,3,4,5,6,7,8"} 1
cgroup_info{cgroup="/torque/1182958.batch.example.com",jobid="1182958",uid="",username=""} 1
cgroup_memory_cache_bytes{cgroup="/torque/1182958.batch.example.com"} 1.09678592e+08
cgroup_memory_fail_count{cgroup="/torque/1182958.batch.example.com"} 0
cgroup_memory_rss_bytes{cgroup="/torque/1182958.batch.example.com"} 8.2444320768e+10
cgroup_memory_total_bytes{cgroup="/torque/1182958.batch.example.com"} 1.96755132416e+11
cgroup_memory_used_bytes{cgroup="/torque/1182958.batch.example.com"} 5.3434466304e+10
cgroup_memsw_fail_count{cgroup="/torque/1182958.batch.example.com"} 0
cgroup_memsw_total_bytes{cgroup="/torque/1182958.batch.example.com"} 1.96755132416e+11
cgroup_memsw_used_bytes{cgroup="/torque/1182958.batch.example.com"} 5.3434466304e+10