-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Linux process cgroup attribute #1364
base: main
Are you sure you want to change the base?
Conversation
model/registry/linux.yaml
Outdated
type: attribute_group | ||
brief: "Describes Linux Process attributes" | ||
attributes: | ||
- id: linux.process.cgroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When thinking about this attribute, two questions come to my mind:
- Is the idea to connect this attribute in some way with
container.id
? - How should cgroupv1 vs cgroupv2 represented with this attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea to connect this attribute in some way with container.id?
This would be an additional feature that could be build over this attribute, but I would say it should not be strictly connected as cgroups can be used outside containerization environments too (e.g systemd).
But I agree that this attribute would be very useful to extract container and k8s attributes without additional resource detection. This is a collector's transformation example build by @ChrsMark to extract them:
transform/cgroup:
error_mode: ignore
metric_statements:
- context: metric
conditions:
- resource.attributes["process.cgroup"] != nil
statements:
- merge_maps(cache,ExtractPatterns(resource.attributes["process.cgroup"],"/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice/cri-containerd-(?P<container_id>.*).scope$"), "upsert")
How should cgroupv1 vs cgroupv2 represented with this attribute?
The initial idea is to just provide the output of whatever is in /proc/PID/cgroup
(does not differentiate between v1 and v2), as the hostmetrics receiver is doing at the moment: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f2cd3587a6d6d76e0f3515295c6bde3b38ac3eb2/receiver/hostmetricsreceiver/internal/scraper/processscraper/process.go#L125
Additional processing over this attribute should be perform to extract specific values. Do you think it would be interesting unwraping the /proc/PID/cgroup
file into more fine-grained attributes (e.g linux.process.cgroup.memory.slice.name
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be interesting unwraping the /proc/PID/cgroup file into more fine-grained attributes?
The differences between v1 and v2 are significant.
Here is a v1 example for /proc/<PID>/cgroup
:
11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
10:freezer:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
9:cpuset:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
8:devices:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
7:blkio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
6:perf_event:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
5:net_cls,net_prio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
4:memory:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
3:hugetlb:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
2:cpu,cpuacct:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
1:name=systemd:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
While in this example, all parts, pids
, blkio
& others, are within the same scope, this assumption is not guaranteed every time.
Compared with a v2 example for /proc/<PID>/cgroup
:
0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
With the linked processor the following will be returned:
cgroup | result from processor |
---|---|
v1 | 11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e |
v2 | 0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the linked processor provide all the file's content? I think it just trims the latest new line.
The differences between v1 and v2 are significant.
Maybe it would not be very useful without further processing, but the linux.process.cgroup
would not make any differentiation, it would just provide the corresponding cgroup's file content. To extend the cgroup attributes we could try to define some additional common fields between versions:
- linux.process.cgroup.path:
- v1:
/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
- v2:
/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- v1:
- linux.process.cgroup.scope:
- v1:
ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
- v2:
docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- v1:
- linux.process.cgroup.version: enum -> v1, v2
- linux.process.cgroup.slice: Most inner subslice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this topic in today's @open-telemetry/semconv-system-approvers SIG, and we agreed that this attribute is quite generic and might not give additional detail of which cgroup version is being used. Nonetheless, we still think it might be useful to provide the raw content of the cgroup file using this attribute. A similar example in the semantic convention repository, are the process.cmd_line
or the os.description
attributes (they are set to the content of /proc/<PID/cmdline
and /etc/os-release
without further processing).
The idea would be to mark it as opt-in and continue adding more fine-grained cgroup related attributes (e.g linux.process.cgroup.version
or linux.process.cgroup.scope
). What do you think about the approach @florianl ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the linked processor provide all the file's content? I think it just trims the latest new line.
Ah - you are right! I mixed TrimSuffix
with Split
🙈
[..] we agreed that this attribute is quite generic and might not give additional detail [..]
My concern is, that keeping the attribute as generic as possible makes it harder to implement and process on a receiving side. Consequently, more fine grade cgroup related attributes should be discussed and proposed first, before a generic one is introduced. As back- and forward compatibility is important to SemConv, I think, just introducing generic attributes might lead to conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As back- and forward compatibility is important to SemConv, I think, just introducing generic attributes might lead to conflicts.
I think the metrics covered by system semantic conventions are a bit of a special case in semconv. This cgroup
attribute for example is a direct reflection of cat /proc/PID/cgroup
, and there are numerous other examples of us providing metrics/attributes that directly map to what procfs/the kernel/system APIs would provide. We want to make sure that is covered in semconv, as users often want metrics/attributes that are exact matches to what they would consider looking at when manually investigating a system. So I think this attribute and other specific attributes like mentioned above can coexist, and we can even say we recommend the specific attributes but still ensure we have guidance in place for people who want to instrument a more direct mapping of what the system provides.
(I don't have any direct arguments about the usefulness of this specific attribute, I don't have direct experience using it, but it does match with the general way we try and define system semconv. It is also what currently exists in hostmetricsreceiver in the Collector, and there are users of it already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@braydonk @florianl @rogercoll Is there a consensus here? If so, let's resolve it so that we can merge this PR.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
4607982
to
67c412f
Compare
4a0cc30
to
ec70436
Compare
@open-telemetry/semconv-system-approvers Could you take a look at this PR when you have a moment? Thanks! |
Would you mind taking a look at the changelog error and fixing that? |
This was a problem on main, updating the PR has now fixed it 👍 |
type: attribute_group | ||
brief: "Describes Linux Process attributes" | ||
attributes: | ||
- id: linux.process.cgroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget if we decided on this being process.linux.cgroup
or linux.process.cgroup
. I thought we were planning to do the former to keep the process
namespace consistent even for OS-exclusive naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah good catch 🤔 should we apply the same concept as system.{os} metrics/attributes but for process.{os}? I am wondering if it would be similar to the current linux.memory.slab.state attribute or the dotnet.process.cpu.count metric. I will give it a though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct is that we should. I'm not sure if @open-telemetry/specs-semconv-maintainers and @open-telemetry/semconv-system-approvers generally agree with my thought process here, but since our metrics are very closely related to the resource they are intended to belong to, this is the way I'm thinking of it:
- root namespace = the overall category the name belongs to
- end namespace = what this name is about (with the exception of
.count
, in which case I mean the second to last namespace +.count
) - middle namespaces = either representing subcategories, or informational
In this case, the cgroup
logically belongs to the process
it is being reported for, and not to the linux
concept. The linux
in the name is meant to be informational, rather than a distinct category. So I think it's good for consistency's sake to still keep the process
as the root namespace.
For the two examples you provided called out:
dotnet
= belongs to a dotnet runtime,process
= it is information about the process,cpu.count
= the subject of the name. I think the formation of this name is okay.- The
memory
namespace is anything related tomemory
that may be shared across multiple namespaces. Similar to the attribute this PR is about, the important category here ismemory
, andlinux
is meant to be informational that this is a Linux exclusive attribute. So I think this would also be renamed tomemory.linux.slab.state
.
The potential counterpoint here for both linux.memory.slab.state
and linux.process.cgroup
is that the linux
name in the middle of the namespace is less ergonomic and clear than as the root namespace. I would agree with that counterpoint, but I still think the order of namespaces relating to the general category hierarchy might trump that.
Looking for opinions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @braydonk here on this. I think it's more important to associate cgroup with process
than with linux
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in for the max consistency we can achieve. I also like the suggestion that each sub-namespace should narrow down the area.
In the case of runtimes, we use runtime name dotnet
, java
, etc as the root namespace. The process
there is kind of optional, but nice to have to be precise. I'd expect everything about this runtime to belong under that namespace - CPU time or runtime-specific metric/property.
We also can't change the .NET metric name - it's part of the stable runtime as of the last week. https://github.com/dotnet/runtime/blob/v9.0.0/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs#L152 - note to myself to mark conventions as stable now.
In the case of cgroup
, it's a property of the process. Between linux
and process
it's hard to say which one qualifies which, but when you're looking for everything about the process (metrics and attributes), you'd start at the process
.
If we needed to describe some OS-wide property, we'd do os.linux.this_property
.
So it feels like that consistency here is not in the process
or system
being the root, but having the area something belongs to being the first. And it leads to dotnet.process.cpu.time
or jvm.system.cpu.load_1m
along with process.linux.cgroup
.
Fixes # #1357
Changes
Please provide a brief description of the changes here.
Note: if the PR is touching an area that is not listed in the existing areas, or the area does not have sufficient domain experts coverage, the PR might be tagged as experts needed and move slowly until experts are identified.
Merge requirement checklist
[chore]