multi-tenant cannot detect pid for monitor.script #6851

@yangw-dev

Description

The inductor-A100-perf-nightly and h100 jobs use cgroups to isolate performance, and they also run as specific users.

This is the main code: https://github.com/pytorch-labs/pytorch-gha-infra/tree/main/multi-tenant

https://github.com/pytorch/pytorch/blob/977abe786d907c1ff76528a550e3d53c9f3b1044/.github/workflows/_linux-test.yml#L177

Some h100 and a100 tests appear to run inside a cgroup and do not expose their processes to the host, so the monitor script cannot see their PID names or process info.

One possible fix: run the monitor from inside test.sh, alongside the tests.
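
To check which processes are actually visible from wherever the monitor runs, a small diagnostic along these lines might help. This is only a sketch under stated assumptions: it is not the existing monitor script, the function names `describe_pid` and `scan_processes` are hypothetical, and it assumes a Linux-style `/proc` layout. Run once from the host and once from inside test.sh, it would show whether the test PIDs and their command lines are readable in each place.

```python
import os
from pathlib import Path


def describe_pid(pid, proc_root="/proc"):
    """Collect what is readable about one PID (hypothetical helper).

    Fields that cannot be read (permission denied, process already gone,
    file missing) are left as None instead of raising.
    """
    base = Path(proc_root) / str(pid)
    info = {"pid": pid, "cmdline": None, "cgroups": None}
    try:
        # /proc/<pid>/cmdline holds NUL-separated arguments.
        raw = (base / "cmdline").read_bytes()
        info["cmdline"] = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    except OSError:
        pass
    try:
        # Each line of /proc/<pid>/cgroup looks like "<id>:<controllers>:<path>".
        lines = (base / "cgroup").read_text().splitlines()
        info["cgroups"] = [ln.split(":", 2)[2] for ln in lines if ln.count(":") >= 2]
    except OSError:
        pass
    return info


def scan_processes(proc_root="/proc"):
    """Describe every numeric entry under proc_root, sorted by PID."""
    pids = sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())
    return [describe_pid(pid, proc_root) for pid in pids]


if __name__ == "__main__":
    for entry in scan_processes():
        print(entry["pid"], entry["cgroups"], entry["cmdline"])
```

The `proc_root` parameter exists only so the helper can be pointed at a fake directory tree for testing; in normal use the default `/proc` applies.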
