---
title: ROCm/HIP
summary: A quick guide to getting set up for ROCm/HIP development on Solus
---

# ROCm/HIP

ROCm is AMD's open-source software stack for GPU computation.

Note that ROCm is not required in order for, say, your display or browser to
use GPU-accelerated rendering. These are more on the driver side of things and
are handled by the kernel and/or Mesa. ROCm is mainly focused on GPU-accelerated
computing, such as GPU rendering in Blender or GPU-accelerated machine learning
in PyTorch.

## Install ROCm/HIP

```bash
sudo eopkg it rocm-hip rocm-opencl
```

If you are also developing with ROCm/HIP, install the
development files and the `hipcc` compiler driver as well:
```bash
sudo eopkg it rocm-hip-devel
```
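
To quickly confirm that the compiler driver is installed and on your `PATH`,
you can ask it for its version (a simple sanity check, nothing more):
```bash
# Prints the clang/HIP version information bundled with the compiler driver.
hipcc --version
```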

## Necessary Environment Variables

It is recommended and safe to put these environment variables in your
`~/.bashrc`:
```bash
export ROCM_PATH=/usr
export HIP_PATH=/usr
```
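
These take effect in new shells; to apply them to your current session, reload
your shell configuration (or simply open a new terminal):
```bash
source ~/.bashrc
```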

If you're developing with ROCm/HIP, the following environment variables will
save you a lot of hassle:
```bash
export HIP_DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
export DEVICE_LIB_PATH=$HIP_DEVICE_LIB_PATH
export HIP_PLATFORM=amd
export HIP_RUNTIME=amd
export HIP_ROCCLR_HOME=$ROCM_PATH
```
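
With the development files installed and the variables above set, a HIP program
can be compiled directly with `hipcc`. The source file name and target
architecture below are only placeholders; substitute your own:
```bash
# vector_add.cpp is a hypothetical HIP source file; --offload-arch selects the
# GPU architecture to compile device code for (see the next section).
hipcc --offload-arch=gfx1030 vector_add.cpp -o vector_add
./vector_add
```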

## Supported Hardware and GPU Architectures


ROCm is designed such that in order for a library to support N different GPU
architectures, that library must be compiled N times, once for each
architecture. For example, if we want PyTorch to support running on 5 different
GPU architectures, we essentially need to compile PyTorch 5 times. This quickly
becomes a maintenance burden, as the compile time of a package grows linearly
with the number of GPU models we want to support.

Therefore, we have carefully picked the following baseline architectures so
that we support as much reasonably recent hardware as possible while not
causing compilation times to skyrocket. Any GPU architecture in the list below
should work out of the box.

- `gfx803`
- `gfx900`
- `gfx906`
- `gfx908`
- `gfx90a`
- `gfx1010`
- `gfx1011`
- `gfx1012`
- `gfx1030`; for `gfx103*` GPUs such as `gfx1031` and `gfx1032`, see the
  [Emulating as a Supported Architecture](#emulating-as-a-supported-architecture)
  section.

:::tip

Run `rocminfo`, provided by the `rocminfo` package, to see which architecture
your GPU(s) use.

:::
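
The output of `rocminfo` is fairly long; one way to pull out just the
architecture names is to filter it, for example:
```bash
# The gfx* strings printed here are the architecture names of your GPU agents.
rocminfo | grep gfx
```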

:::note

This list contains only the minimum set of supported architectures. Some
packages, like [Blender](#blender), are built with support for even more
architectures.

:::

If your GPU model is not on the list, please open an issue in
our [Issue Tracker] with your GPU model and the year the model was released.

### Emulating as a Supported Architecture

Several GPU architectures, such as those in the Navi 1 family, have almost
identical (if not exactly identical) ISAs, which allows a program compiled for
one architecture to run seamlessly on the others.
For example, any program compiled for the `gfx1030` architecture can also run on
the `gfx1031` and `gfx1032` architectures. Such architectures are noted in the
list in the previous section.

To emulate your GPU as a supported architecture, the environment variable
`HSA_OVERRIDE_GFX_VERSION` must be specified. Examples:

Emulating as `gfx1030`:
```bash
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```

Emulating as `gfx1010`:
```bash
export HSA_OVERRIDE_GFX_VERSION=10.1.0
```

Emulating as `gfx900`:
```bash
export HSA_OVERRIDE_GFX_VERSION=9.0.0
```
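
If you prefer not to export the override for your whole session, you can also
set it for a single command; the program name here is just a placeholder:
```bash
# Apply the override only to this one invocation.
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./my-rocm-program
```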

## Specifying which GPU to use

Sometimes it may be hard or impossible to tell your program to use the GPU
that you want. This not only happens on systems with multiple discrete GPUs; it
can also happen when your CPU is also made by AMD and has an integrated GPU.
You can check whether your CPU has usable integrated graphics by running
`linux-driver-management status`. If your CPU has integrated graphics and you
have enabled switchable/hybrid graphics in your BIOS, you may see something
like the following:
```
Hybrid Graphics
╒ Primary GPU (iGPU)
╞ Device Name : Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
╞ Manufacturer : Advanced Micro Devices, Inc. [AMD/ATI]
╞ Product ID : 0x1669
╞ Vendor ID : 0x1002
╞ X.Org PCI ID : PCI:7:0:0
╘ Boot VGA : yes
╒ Secondary GPU (dGPU)
╞ Device Name : Navi 23 [Radeon RX 6600/6600 XT/6600M]
╞ Manufacturer : Advanced Micro Devices, Inc. [AMD/ATI]
╞ Product ID : 0x73ab
╞ Vendor ID : 0x1002
╞ X.Org PCI ID : PCI:2:0:0
╘ Boot VGA : no
```

ROCm/HIP offers the environment variable `HIP_VISIBLE_DEVICES` to control which
GPUs are visible to a process through the ROCm/HIP API. Only devices whose index
is present in the sequence are visible to HIP. For example, `export
HIP_VISIBLE_DEVICES=0` makes only the GPU with device index 0 visible, and
`export HIP_VISIBLE_DEVICES=0,2` makes only the GPUs with device indices 0 and 2
visible.

:::caution

The device index is **NOT** its agent number in the output of `rocminfo`! You
can find your device's corresponding index through the output of `rocm-smi`,
provided by the `rocm-smi` package.

:::
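
As a concrete (hypothetical) example, after looking up the index with
`rocm-smi`, you could restrict a single run to one card like this; the script
name is a placeholder for your own workload:
```bash
# Expose only the GPU with index 1 to this one invocation.
HIP_VISIBLE_DEVICES=1 python my-training-script.py
```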

:::note

As suggested by its name, `HIP_VISIBLE_DEVICES` only hides the GPU from the
ROCm/HIP side. A program can still access GPUs hidden by `HIP_VISIBLE_DEVICES`
by calling other graphics APIs such as OpenGL.

:::

## Software-Specific Instructions

### Blender

### PyTorch


## Reporting an Issue

