Description
When using the new ExecCPUAffinity feature, we hit a problem when containers use exclusive CPUs (which I believe was much of the motivation for this feature). sched_setaffinity fails with EINVAL if the cpuset we attempt to set the affinity to is outside of the process's exclusive set. If we use the Initial field of the ExecCPUAffinity structure, that will almost always be the case, because the OCI runtime process rarely runs in the same cgroup as the container before the cgroup move. Using Final in that structure doesn't work either: after the runc exec process is moved into the container's cgroup, it risks slipping onto one of the actually exclusive CPUs, rather than the CPU in the set that is reserved for exec processes (idiomatically: the First).
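To make the failure condition concrete, here is a minimal sketch of a pre-check a runtime could do before calling sched_setaffinity. It relies on the documented behavior that sched_setaffinity(2) returns EINVAL when the mask contains no CPU the thread is permitted to use. The function names `parse_cpu_list` and `affinity_would_be_rejected` are hypothetical and come from neither runc nor crun.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "0-3,7" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        first, _, last = part.partition("-")
        # A single id like "7" has no "-", so the range is just that CPU.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def affinity_would_be_rejected(requested: set[int], allowed: set[int]) -> bool:
    """True when no requested CPU is permitted for the process.

    This is the case in which sched_setaffinity(2) is documented to
    fail with EINVAL.
    """
    return not (requested & allowed)
```

The `allowed` set could, for example, be parsed from the `cpuset.cpus.effective` file of the cgroup the runtime process is in before the move, and `requested` from ExecCPUAffinity.Initial.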
The only thing I can think to do is to create a subcgroup of the container's cgroup with a cpuset that only includes the ExecCPUAffinity.Initial CPUs, have runc first move itself into that subcgroup, then set its own affinity, then move itself into the actual container cgroup.
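The sequence proposed above could look roughly like the following. This is only a sketch of the ordering of the steps, not how runc or crun implement anything: the function name, the `exec-affinity` subcgroup name, and the injectable `set_affinity` parameter are all assumptions made for illustration. It assumes a cgroup v2 layout and skips details such as enabling the cpuset controller through `cgroup.subtree_control`, error handling, and cleanup of the subcgroup.

```python
import os
from pathlib import Path


def cpu_list_to_set(text: str) -> set[int]:
    """Parse a CPU list such as "2-3,6" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if part:
            first, _, last = part.partition("-")
            cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def exec_with_initial_affinity(container_cgroup, initial_cpus, pid,
                               set_affinity=None, sub_name="exec-affinity"):
    """Hypothetical helper: pin `pid` via a temporary subcgroup.

    container_cgroup: path to the container's cgroup directory
    initial_cpus:     CPU list string, e.g. ExecCPUAffinity.Initial as "2-3"
    """
    container_cgroup = Path(container_cgroup)
    sub = container_cgroup / sub_name

    # 1. Create a subcgroup whose cpuset holds only the Initial CPUs.
    sub.mkdir(exist_ok=True)
    (sub / "cpuset.cpus").write_text(initial_cpus)

    # 2. Move the process into the subcgroup first.
    (sub / "cgroup.procs").write_text(str(pid))

    # 3. Set the affinity while the requested CPUs are permitted.
    (set_affinity or os.sched_setaffinity)(pid, cpu_list_to_set(initial_cpus))

    # 4. Finally move the process into the actual container cgroup.
    (container_cgroup / "cgroup.procs").write_text(str(pid))
```

Passing `set_affinity` explicitly makes it possible to exercise the ordering against a plain temporary directory, without touching a real cgroup hierarchy.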
If we take that approach, I think we may want to codify it in the runtime spec, or at least replicate it in runc. I also recognize the looming rewrite of this feature in opencontainers/runtime-spec#1296, but I am pretty sure it suffers from the same issue.
Side note: I'm not sure this is the best place to file this, so this is a duplicate of containers/crun#1915. runtime-spec felt like an option, but this is more of an implementation detail right now, though eventually it may not be.