dcgmi nvlink --link-status
+----------------------+
| NvLink Link Status |
+----------------------+
GPUs:
gpuId 0:
D D D D D D D D D D D D D D D D D D
gpuId 1:
D D D D D D D D D D D D D D D D D D
NvSwitches:
No NvSwitches found.
Key: Up=U, Down=D, Disabled=X, Not Supported=_
Yet when I run the dcgmi diagnostic at level 2, every test passes. Shouldn't they fail if the NVLinks aren't working? The dcgmi docs say that the PCIe test covers both PCIe and NVLink.
dcgmi diag -r 2
Successfully ran diagnostic for group.
+---------------------------+------------------------------------------------+
| Diagnostic | Result |
+===========================+================================================+
|----- Metadata ----------+------------------------------------------------|
| DCGM Version | 3.3.7 |
| Driver Version Detected | 560.35.03 |
| GPU Device IDs Detected | 2321,2321 |
|----- Deployment --------+------------------------------------------------|
| Denylist | Pass |
| NVML Library | Pass |
| CUDA Main Library | Pass |
| Permissions and OS Blocks | Pass |
| Persistence Mode | Pass |
| Info | Persistence mode for GPU 0 is disabled. Enabl |
| | e persistence mode by running "nvidia-smi -i |
| | <gpuId> -pm 1 " as root.,Persistence mode for |
| | GPU 1 is disabled. Enable persistence mode b |
| | y running "nvidia-smi -i <gpuId> -pm 1 " as r |
| | oot. |
| Environment Variables | Pass |
| Page Retirement/Row Remap | Pass |
| Graphics Processes | Pass |
| Inforom | Pass |
+----- Integration -------+------------------------------------------------+
| PCIe | Pass - All |
+----- Hardware ----------+------------------------------------------------+
| GPU Memory | Pass - All |
+----- Stress ------------+------------------------------------------------+
+---------------------------+------------------------------------------------+
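Since the level 2 run passes here even though every link reports D, one stopgap is to check the link-status text directly instead of relying on the diagnostic. This is a minimal sketch, assuming the plain-text layout shown above (a `gpuId N:` line followed by a line of state letters). The names `parse_link_status` and `down_links` are hypothetical helpers, not part of DCGM, and the sample text is copied from the output in this issue.

```python
import re

# One row of 18 link states, matching the output shown in this issue.
ROW = " ".join(["D"] * 18)

SAMPLE = "\n".join([
    "GPUs:",
    "gpuId 0:",
    ROW,
    "gpuId 1:",
    ROW,
    "NvSwitches:",
    "No NvSwitches found.",
    "Key: Up=U, Down=D, Disabled=X, Not Supported=_",
])


def parse_link_status(text):
    """Map each GPU id to its list of link state letters (U, D, X, _).

    Assumes the text layout of `dcgmi nvlink --link-status` shown above;
    other DCGM versions may format this differently.
    """
    links = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"gpuId (\d+):", line)
        if header:
            current = int(header.group(1))
            links[current] = []
            continue
        if line.startswith("NvSwitches"):
            # Stop attributing rows to GPUs once the switch section starts.
            current = None
            continue
        if current is not None and re.fullmatch(r"(?:[UDX_] ?)+", line):
            links[current].extend(line.split())
    return links


def down_links(links):
    """Return {gpu_id: number of links in state D} for GPUs with any down link."""
    return {gpu: states.count("D") for gpu, states in links.items() if "D" in states}


if __name__ == "__main__":
    print(down_links(parse_link_status(SAMPLE)))
```

With the sample above, both GPUs are reported with 18 down links each. In practice the text would come from running the command (for example through `subprocess.run`), and a non-empty result from `down_links` could be treated as a failure.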
Hello,
I have a machine where all NVLinks are down, as the dcgmi nvlink --link-status output above shows.