
Performance issue with mindspore.ops.normal #192

Open
fr30 opened this issue Aug 5, 2022 · 0 comments · May be fixed by #193


fr30 commented Aug 5, 2022

Environment

Hardware Environment(Ascend/GPU/CPU):

/device gpu

Software Environment:

  • MindSpore version (source or binary): 1.8.0
  • Python version (e.g., Python 3.7.5): 3.7.10
  • OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • GCC/Compiler version (if compiled from source): 7.5.0

Describe the current behavior

mindspore.ops.normal and mindspore.ops.StandardNormal perform very poorly: generating a single random tensor of size 100x100x100 takes around 6 seconds. The same problem likely affects other random ops as well.

Describe the expected behavior

Random ops should be much faster.

Steps to reproduce the issue

Simply run the code:

from mindspore import Tensor, dtype
from typing import List
from mindspore.ops import normal
import time
import mindspore

mindspore.set_seed(2137)

def run_random_calculation(iters: List[int], shapes: List[tuple], prnt: bool = False):
    assert len(iters) == len(shapes)

    mean = Tensor(0.0, dtype.float32)
    std = Tensor(1.0, dtype.float32)

    for i in range(len(iters)):
        iter_no = iters[i]
        shape = shapes[i]

        for j in range(iter_no):
            x = normal(shape, mean, std)

            if prnt:
                print(x[:2, :2, :1])

warmup_iters = [
    1,
    1,
    0
]
benchmark_iters = [
    0,
    1,
    0
]
shapes = [
    (10, 10, 10),
    (100, 100, 100),
    (500, 500, 100)
]

run_random_calculation(warmup_iters, shapes)

start = time.time()

run_random_calculation(benchmark_iters, shapes)

end = time.time()

print(f'Result \nshapes: {shapes}\niters: {benchmark_iters}\ntime {end - start}')

Related log / screenshot

Result
shapes: [(10, 10, 10), (100, 100, 100), (500, 500, 100)]
iters: [0, 1, 0]
time 5.9976

Special notes for this issue

The problem lies in the file mindspore/ccsrc/plugin/device/gpu/kernel/cuda_impl/cuda_ops/random_op_impl.cu. The kernels for random generation run curand_init() for each iteration, which is an expensive operation. Instead, they could exploit the fact that curand_normal() advances the state passed to it as an argument, so each state only needs to be initialized once.
The problem is described in https://docs.nvidia.com/cuda/curand/device-api-overview.html#performance-notes, along with a snippet that shows how to solve it.
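A minimal sketch of the pattern the cuRAND performance notes recommend (kernel and variable names here are illustrative, not the actual MindSpore kernels): initialize one state per thread once in a setup kernel, then reuse and advance that state across generation launches instead of calling curand_init() again.

```cuda
#include <curand_kernel.h>

// Run once: one expensive curand_init() per thread, stored in global memory.
__global__ void SetupStates(curandState *states, unsigned long long seed) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  curand_init(seed, id, 0, &states[id]);
}

// Run per request: reuse the saved states; curand_normal() advances them.
__global__ void FillNormal(curandState *states, float *out, size_t n,
                           float mean, float stddev) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  curandState local = states[id];  // copy state to registers for speed
  for (size_t i = id; i < n; i += stride) {
    out[i] = mean + stddev * curand_normal(&local);  // advances `local`
  }
  states[id] = local;  // write the advanced state back for the next launch
}
```

With this structure, the per-element cost is just curand_normal(), and curand_init() is paid only once per thread at setup.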

@fr30 fr30 linked a pull request Aug 5, 2022 that will close this issue