Add Linux process cgroup attribute #1364

rogercoll · 2024-08-23T18:39:02Z

Fixes #1357

Changes

Please provide a brief description of the changes here.

Note: if the PR is touching an area that is not listed in the existing areas, or the area does not have sufficient domain experts coverage, the PR might be tagged as experts needed and move slowly until experts are identified.

Merge requirement checklist

- CONTRIBUTING.md guidelines followed.
- Change log entry added, according to the guidelines in When to add a changelog entry.
  - If your PR does not need a change log, start the PR title with [chore]
- schema-next.yaml updated with changes to existing conventions.

florianl · 2024-08-26T06:30:57Z

model/registry/linux.yaml

+    type: attribute_group
+    brief: "Describes Linux Process attributes"
+    attributes:
+      - id: linux.process.cgroup


When thinking about this attribute, two questions come to my mind:

Is the idea to connect this attribute in some way with container.id?

How should cgroupv1 vs cgroupv2 be represented with this attribute?

> Is the idea to connect this attribute in some way with container.id?

This would be an additional feature that could be built on top of this attribute, but I would say it should not be strictly connected, as cgroups can be used outside containerization environments too (e.g. systemd).

But I agree that this attribute would be very useful to extract container and k8s attributes without additional resource detection. Here is an example Collector transform configuration built by @ChrsMark to extract them:

    transform/cgroup:
      error_mode: ignore
      metric_statements:
        - context: metric
          conditions:
            - resource.attributes["process.cgroup"] != nil
          statements:
            - merge_maps(cache,ExtractPatterns(resource.attributes["process.cgroup"],"/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice/cri-containerd-(?P<container_id>.*).scope$"), "upsert")
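For anyone who wants to try the pattern outside the Collector, here is a rough Python equivalent of the ExtractPatterns call above. It is only a sketch: the function name `extract_k8s_ids` and the sample cgroup path are hypothetical, and the regex is copied as-is from the statement, so it only matches the besteffort QoS class with a containerd runtime.

```python
import re
from typing import Dict

# Pattern copied verbatim from the transform statement above. The dots are
# left unescaped, exactly as in the original, so they match any character.
PATTERN = re.compile(
    r"/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/"
    r"kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice/"
    r"cri-containerd-(?P<container_id>.*).scope$"
)


def extract_k8s_ids(cgroup: str) -> Dict[str, str]:
    """Return the named groups found in the cgroup string, or {} if no match."""
    match = PATTERN.search(cgroup)
    return match.groupdict() if match else {}


# Hypothetical v2 cgroup content for a besteffort pod.
sample = (
    "0::/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/"
    "kubelet-kubepods-besteffort-pod1234abcd.slice/cri-containerd-deadbeef.scope"
)
print(extract_k8s_ids(sample))
```

Pods in other QoS classes or with other runtimes would need their own patterns, which is part of why this stays a processing step rather than part of the attribute itself.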

> How should cgroupv1 vs cgroupv2 be represented with this attribute?

The initial idea is to just provide whatever is in /proc/PID/cgroup (without differentiating between v1 and v2), as the hostmetrics receiver does at the moment: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f2cd3587a6d6d76e0f3515295c6bde3b38ac3eb2/receiver/hostmetricsreceiver/internal/scraper/processscraper/process.go#L125
Additional processing of this attribute should be performed to extract specific values. Do you think it would be interesting to unwrap the /proc/PID/cgroup file into more fine-grained attributes (e.g. linux.process.cgroup.memory.slice.name)?

> Do you think it would be interesting to unwrap the /proc/PID/cgroup file into more fine-grained attributes?

The differences between v1 and v2 are significant.
Here is a v1 example for /proc/<PID>/cgroup:

    11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    10:freezer:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    9:cpuset:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    8:devices:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    7:blkio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    6:perf_event:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    5:net_cls,net_prio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    4:memory:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    3:hugetlb:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    2:cpu,cpuacct:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    1:name=systemd:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e

While in this example all controllers (pids, blkio, and others) are within the same scope, this is not guaranteed in general.

Compare this with a v2 example for /proc/<PID>/cgroup:

0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
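To make the format difference concrete, here is a minimal sketch that groups controllers by cgroup path. It assumes the `hierarchy-ID:controller-list:cgroup-path` line layout described in cgroups(7); the helper name `controllers_by_path` and the shortened sample paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def controllers_by_path(content: str) -> Dict[str, List[str]]:
    """Group controller names by the cgroup path they are attached to.

    Each line is expected to look like hierarchy-ID:controller-list:cgroup-path.
    A v2 entry (0::/path) has an empty controller list, so its path maps to [].
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in content.splitlines():
        if not line:
            continue
        # Split at most twice so that any ':' inside the path is preserved.
        _hierarchy_id, controllers, path = line.split(":", 2)
        groups[path].extend(c for c in controllers.split(",") if c)
    return dict(groups)


# Hypothetical v1 content where one hierarchy lives in a different cgroup.
v1 = "4:memory:/docker/abc\n2:cpu,cpuacct:/docker/abc\n1:name=systemd:/user.slice\n"
print(controllers_by_path(v1))
```

More than one key in the result means the controllers are not all within the same scope, which is the v1 case that a single derived path would hide.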

With the linked processor the following will be returned:

| cgroup | result from processor |
| --- | --- |
| v1 | `11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e` |
| v2 | `0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope` |

Doesn't the linked processor provide all of the file's content? I think it just trims the trailing newline.

> The differences between v1 and v2 are significant.

Maybe it would not be very useful without further processing, but linux.process.cgroup would not differentiate between versions; it would just provide the content of the corresponding cgroup file. To extend the cgroup attributes, we could try to define some additional fields common to both versions:

- linux.process.cgroup.path:
  - v1: /docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
  - v2: /system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- linux.process.cgroup.scope:
  - v1: ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
  - v2: docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- linux.process.cgroup.version: enum -> v1, v2
- linux.process.cgroup.slice: Innermost subslice?
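As a rough illustration of how the proposed fields could be derived from the raw file content, here is a sketch. None of these attribute names exist in the conventions yet; the function name `cgroup_attributes` and both heuristics (a lone `0::` line means v2, and the first line's path is representative) are assumptions made for the example.

```python
from typing import Dict


def cgroup_attributes(content: str) -> Dict[str, str]:
    """Derive the proposed path/scope/version fields from /proc/<PID>/cgroup content."""
    lines = [line for line in content.splitlines() if line]
    if not lines:
        return {}
    # Assumption: a single "0::" entry means pure cgroup v2. Anything else is
    # reported as v1; hybrid setups are not distinguished here.
    is_v2 = len(lines) == 1 and lines[0].startswith("0::")
    # Assumption: the first line's path is representative. In v1, different
    # controllers may in fact point at different paths.
    path = lines[0].split(":", 2)[2]
    # The scope is taken to be the last path component.
    scope = path.rstrip("/").rsplit("/", 1)[-1]
    return {
        "linux.process.cgroup.version": "v2" if is_v2 else "v1",
        "linux.process.cgroup.path": path,
        "linux.process.cgroup.scope": scope,
    }


# Shortened, hypothetical container IDs for readability.
print(cgroup_attributes("11:pids:/docker/ffdd\n10:freezer:/docker/ffdd\n"))
print(cgroup_attributes("0::/system.slice/docker-8ae5.scope\n"))
```

The v1 branch is where a real definition would have to decide which controller's path wins, so that choice is left explicit in the comments rather than settled here.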

We discussed this topic in today's @open-telemetry/semconv-system-approvers SIG, and we agreed that this attribute is quite generic and might not give additional detail about which cgroup version is being used. Nonetheless, we still think it might be useful to provide the raw content of the cgroup file using this attribute. Similar examples in the semantic conventions repository are the process.cmd_line and os.description attributes (they are set to the content of /proc/<PID>/cmdline and /etc/os-release without further processing).

The idea would be to mark it as opt-in and continue adding more fine-grained cgroup-related attributes (e.g. linux.process.cgroup.version or linux.process.cgroup.scope). What do you think about this approach, @florianl?

> Doesn't the linked processor provide all of the file's content? I think it just trims the trailing newline.

Ah - you are right! I mixed TrimSuffix with Split 🙈
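For the record, the difference between the two operations can be sketched as follows. This is an illustration only: `trim_suffix` is a hypothetical Python stand-in for Go's `strings.TrimSuffix`, and the file content is made up.

```python
def trim_suffix(text: str, suffix: str) -> str:
    """Remove one trailing occurrence of suffix, like Go's strings.TrimSuffix."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


# Hypothetical /proc/<PID>/cgroup content with two v1 entries.
raw = "4:memory:/docker/abc\n2:cpu,cpuacct:/docker/abc\n"

whole_file = trim_suffix(raw, "\n")  # every entry, minus the final newline
first_line = raw.split("\n")[0]      # only the first entry

print(repr(whole_file))
print(repr(first_line))
```

Trimming keeps every entry, so the attribute carries the full v1 listing, whereas taking the first element of a split would have kept only the `4:memory` line.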

> [..] we agreed that this attribute is quite generic and might not give additional detail [..]

My concern is that keeping the attribute as generic as possible makes it harder to implement and process on the receiving side. Consequently, more fine-grained cgroup-related attributes should be discussed and proposed first, before a generic one is introduced. As backward and forward compatibility is important to SemConv, I think just introducing generic attributes might lead to conflicts.

> As backward and forward compatibility is important to SemConv, I think just introducing generic attributes might lead to conflicts.

I think the metrics covered by system semantic conventions are a bit of a special case in semconv. This cgroup attribute, for example, is a direct reflection of cat /proc/PID/cgroup, and there are numerous other examples of us providing metrics/attributes that map directly to what procfs, the kernel, or system APIs provide. We want to make sure that is covered in semconv, as users often want metrics/attributes that exactly match what they would look at when manually investigating a system. So I think this attribute and other specific attributes like those mentioned above can coexist, and we can even say we recommend the specific attributes while still ensuring we have guidance in place for people who want to instrument a more direct mapping of what the system provides.

(I don't have any direct arguments about the usefulness of this specific attribute, I don't have direct experience using it, but it does match with the general way we try and define system semconv. It is also what currently exists in hostmetricsreceiver in the Collector, and there are users of it already)

@braydonk @florianl @rogercoll Is there a consensus here? If so, let's resolve it so that we can merge this PR.

github-actions · 2024-09-15T03:21:11Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2024-10-08T03:21:46Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

rogercoll · 2024-10-29T09:36:41Z

@open-telemetry/semconv-system-approvers Could you take a look at this PR when you have a moment? Thanks!

jsuereth · 2024-11-12T16:15:44Z

Would you mind taking a look at the changelog error and fixing that?

joaopgrassi · 2024-11-13T13:29:14Z

> Would you mind taking a look at the changelog error and fixing that?

This was a problem on main, updating the PR has now fixed it 👍

braydonk · 2024-11-21T16:25:56Z

model/linux/registry.yaml

+    type: attribute_group
+    brief: "Describes Linux Process attributes"
+    attributes:
+      - id: linux.process.cgroup


I forget if we decided on this being process.linux.cgroup or linux.process.cgroup. I thought we were planning to do the former to keep the process namespace consistent even for OS-exclusive naming.

Aah good catch 🤔 Should we apply the same concept as system.{os} metrics/attributes but for process.{os}? I am wondering if it would be similar to the current linux.memory.slab.state attribute or the dotnet.process.cpu.count metric. I will give it a thought.

My instinct is that we should. I'm not sure if @open-telemetry/specs-semconv-maintainers and @open-telemetry/semconv-system-approvers generally agree with my thought process here, but since our metrics are very closely related to the resource they are intended to belong to, this is the way I'm thinking of it:

root namespace = the overall category the name belongs to

end namespace = what this name is about (with the exception of .count, in which case I mean the second to last namespace + .count)

middle namespaces = either representing subcategories, or informational

In this case, the cgroup logically belongs to the process it is being reported for, and not to the linux concept. The linux in the name is meant to be informational, rather than a distinct category. So I think it's good for consistency's sake to still keep the process as the root namespace.

For the two examples you called out:

dotnet = belongs to a dotnet runtime, process = it is information about the process, cpu.count = the subject of the name. I think the formation of this name is okay.

The memory namespace is anything related to memory that may be shared across multiple namespaces. Similar to the attribute this PR is about, the important category here is memory, and linux is meant to be informational that this is a Linux exclusive attribute. So I think this would also be renamed to memory.linux.slab.state.

The potential counterpoint here for both linux.memory.slab.state and linux.process.cgroup is that the linux name in the middle of the namespace is less ergonomic and clear than as the root namespace. I would agree with that counterpoint, but I still think the order of namespaces relating to the general category hierarchy might trump that.

Looking for opinions!

I'm with @braydonk here on this. I think it's more important to associate cgroup with process than with linux.

I'm all for the maximum consistency we can achieve. I also like the suggestion that each sub-namespace should narrow down the area.

In the case of runtimes, we use the runtime name (dotnet, java, etc.) as the root namespace. The process there is kind of optional, but nice to have to be precise. I'd expect everything about this runtime to belong under that namespace, whether it is CPU time or a runtime-specific metric/property.

We also can't change the .NET metric name; it has been part of the stable runtime as of last week. https://github.com/dotnet/runtime/blob/v9.0.0/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs#L152 (note to myself to mark the conventions as stable now).

In the case of cgroup, it's a property of the process. Between linux and process it's hard to say which one qualifies which, but when you're looking for everything about the process (metrics and attributes), you'd start at the process.

If we needed to describe some OS-wide property, we'd do os.linux.this_property.

So it feels like the consistency here is not in process or system being the root, but in the area something belongs to coming first. That leads to dotnet.process.cpu.time or jvm.system.cpu.load_1m along with process.linux.cgroup.

rogercoll requested review from a team August 23, 2024 18:39

github-actions bot assigned jsuereth Aug 23, 2024

florianl reviewed Aug 26, 2024

github-actions bot added Stale and removed Stale labels Sep 15, 2024

lmolkova reviewed Sep 22, 2024

github-actions bot added the Stale label Oct 8, 2024

rogercoll added 2 commits October 9, 2024 17:42

feat: add linux.process.cgroup attribute

c46b681

docs: add changelog entry

67c412f

rogercoll force-pushed the add_process_cgroup branch from 4607982 to 67c412f Compare October 9, 2024 15:42

rogercoll requested review from a team as code owners October 9, 2024 15:42

github-actions bot removed the Stale label Oct 10, 2024

Merge branch 'main' into add_process_cgroup

ec70436

rogercoll requested a review from a team as a code owner October 11, 2024 12:47

rogercoll force-pushed the add_process_cgroup branch from 4a0cc30 to ec70436 Compare October 11, 2024 12:53

rogercoll added 4 commits October 15, 2024 08:01

Merge branch 'main' into add_process_cgroup

8916b38

Merge branch 'main' into add_process_cgroup

365c932

fix: attribute table generation

b4a055a

Merge branch 'main' into add_process_cgroup

d150914

mx-psi reviewed Oct 29, 2024

rogercoll added 2 commits October 29, 2024 14:51

docs: add specific proc cgroup file

c75fa54

fix: yaml max line length

46f8509

mx-psi approved these changes Oct 29, 2024

Merge branch 'main' into add_process_cgroup

7c68d02

jsuereth approved these changes Nov 12, 2024

Merge branch 'main' into add_process_cgroup

318b673

rogercoll added 2 commits November 13, 2024 16:49

Merge branch 'main' into add_process_cgroup

ed5861b

Merge branch 'main' into add_process_cgroup

ea4c8af

lmolkova reviewed Nov 21, 2024

lmolkova approved these changes Nov 21, 2024

add reference to process resource registry

cadd06f

braydonk reviewed Nov 21, 2024

lmolkova mentioned this pull request Nov 21, 2024

Mark .NET runtime metrics as stable #1602

Open
