OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
GCC/Compiler version (if compiled from source): 7.5.0
Describe the current behavior
mindspore.ops.normal and mindspore.ops.StandardNormal perform very poorly on GPU. Generating a single random tensor of shape 100x100x100 takes around 6 seconds, which is unacceptable. The same problem probably affects other random ops as well.
Describe the expected behavior
Random ops should be much faster.
Steps to reproduce the issue
Simply run the code:
from typing import List, Tuple
import time

import mindspore
from mindspore import Tensor, dtype
from mindspore.ops import normal

mindspore.set_seed(2137)


def run_random_calculation(iters: List[int], shapes: List[Tuple[int, ...]], prnt: bool = False):
    # iters[i] is the number of tensors of shape shapes[i] to generate.
    assert len(iters) == len(shapes)
    mean = Tensor(0.0, dtype.float32)
    std = Tensor(1.0, dtype.float32)
    for iter_no, shape in zip(iters, shapes):
        for _ in range(iter_no):
            x = normal(shape, mean, std)
            if prnt:
                print(x[:2][:2][:1])


warmup_iters = [1, 1, 0]
benchmark_iters = [0, 1, 0]
shapes = [
    (10, 10, 10),
    (100, 100, 100),
    (500, 500, 100),
]

# Warm up first so one-off startup cost is not included in the measurement.
run_random_calculation(warmup_iters, shapes)

# Time a single generation of the 100x100x100 tensor.
start = time.time()
run_random_calculation(benchmark_iters, shapes)
end = time.time()
print(f'Result \nshapes: {shapes}\niters: {benchmark_iters}\ntime {end - start}')
Related log / screenshot
Result
shapes: [(10, 10, 10), (100, 100, 100), (500, 500, 100)]
iters: [0, 1, 0]
time 5.9976
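The figure above comes from one timed iteration, so it can be sensitive to noise. A minimal sketch of a more repeatable measurement, using only the Python standard library, might look like the following. The helper names `time_workload` and `summarize` are hypothetical and not part of MindSpore; `workload` stands for any zero-argument callable, for example `lambda: run_random_calculation(benchmark_iters, shapes)`.

```python
import statistics
import time
from typing import Callable, List


def time_workload(workload: Callable[[], object], warmup: int = 1, repeats: int = 5) -> List[float]:
    """Return the wall-clock duration in seconds of each timed call to `workload`."""
    for _ in range(warmup):
        workload()  # untimed calls absorb one-off startup cost
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Assumption: `workload` returns only once its result is ready.
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Format the median and minimum of the measured durations."""
    return (f"median {statistics.median(durations):.4f} s, "
            f"min {min(durations):.4f} s over {len(durations)} runs")
```

Reporting the median over several runs makes it easier to compare timings before and after a fix than a single sample does.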
Special notes for this issue
The problem lies in the file mindspore/ccsrc/plugin/device/gpu/kernel/cuda_impl/cuda_ops/random_op_impl.cu. The random-generation kernels call curand_init() on every iteration, which is an expensive operation. Instead, they could initialize the state once and keep reusing it, since curand_normal() updates the state passed as its argument.
The problem is described in https://docs.nvidia.com/cuda/curand/device-api-overview.html#performance-notes, which also includes a snippet that helps solve it.
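The cost pattern can be illustrated off the GPU with a pure-Python analogy. This is only a sketch of the idea, not the CUDA kernel and not the proposed fix itself. `GeneratorState` is a hypothetical stand-in for a per-thread cuRAND state, seeding a `random.Random` plays the role of the expensive initialization, and the nested loops simulate threads walking over the output with a stride.

```python
import random


class GeneratorState:
    """Hypothetical stand-in for one generator state (not a cuRAND type)."""

    init_calls = 0  # how many times a state has been initialized

    def __init__(self, seed: int, sequence: int):
        GeneratorState.init_calls += 1
        # Seeding stands in for the expensive initialization step.
        self._rng = random.Random(seed * 1_000_003 + sequence)

    def normal(self) -> float:
        # Each draw advances the internal state, so the next draw differs.
        return self._rng.gauss(0.0, 1.0)


def fill_init_per_element(seed: int, count: int, n_threads: int) -> list:
    """Initialize a fresh state for every output element."""
    out = [0.0] * count
    for tid in range(n_threads):                # each simulated thread
        for i in range(tid, count, n_threads):  # strided loop over elements
            state = GeneratorState(seed, i)     # re-initialized every iteration
            out[i] = state.normal()
    return out


def fill_init_per_thread(seed: int, count: int, n_threads: int) -> list:
    """Initialize one state per simulated thread and reuse it."""
    out = [0.0] * count
    for tid in range(n_threads):
        state = GeneratorState(seed, tid)       # initialized once per thread
        for i in range(tid, count, n_threads):
            out[i] = state.normal()             # reuses the advancing state
    return out
```

For 1000 elements and 8 simulated threads, the first variant performs 1000 initializations and the second performs 8, while both fill every element. The values differ between the two variants because the elements are assigned to random sequences differently.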
fr30 linked a pull request Aug 5, 2022 that will close this issue
Environment
Hardware Environment (Ascend/GPU/CPU): /device gpu
Software Environment: