Improve numpy-scipy runtime performance #30

pthomadakis · 2023-09-21T23:06:38Z

In some cases there is a significant difference in runtime performance between mlir-cpu-runner and code executed through the numpy-scipy interface.
The same difference appears when the COMET LLVM dialect IR is compiled and run manually through the following steps:

1. mlir-translate --mlir-to-llvmir to generate the LLVM IR code.
2. opt --O3 -S to optimize the LLVM IR code from the previous step.
3. clang/gcc -O3 to generate the executable.
4. Run the executable.
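
For anyone trying to reproduce this, the manual steps above can be sketched as a small Python helper that builds the command lines. This is only an illustration: the function names `build_pipeline` and `run_pipeline`, the file naming scheme, and the choice of clang over gcc are assumptions, not part of COMET or cometpy.

```python
import subprocess
from pathlib import Path


def build_pipeline(mlir_file, out_dir="."):
    """Return the command lines for the manual steps, in order.

    Hypothetical helper: the intermediate file names are made up here.
    """
    src = Path(mlir_file)
    out = Path(out_dir)
    ll = out / (src.stem + ".ll")          # LLVM IR from mlir-translate
    opt_ll = out / (src.stem + ".opt.ll")  # LLVM IR after opt
    exe = out / src.stem                   # final executable
    return [
        ["mlir-translate", "--mlir-to-llvmir", str(src), "-o", str(ll)],
        ["opt", "-O3", "-S", str(ll), "-o", str(opt_ll)],
        ["clang", "-O3", str(opt_ll), "-o", str(exe)],
        [str(exe.resolve())],  # absolute path so the binary is found
    ]


def run_pipeline(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline("kernel.mlir", "build"))` would need the LLVM/MLIR tools on `PATH`, an existing `build` directory, and an input that defines `main`; `kernel.mlir` is a placeholder name.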


…-cpu-runner (#30). By passing the target triple returned by ``llvm-config --host-target`` to llvm opt. Also fixed a bug where calling the kernel function in a loop would cause wrong code generation. Finally, made integration test use multiprocessing

pthomadakis · 2023-09-29T23:01:05Z

Solved the issue by passing --mtriple=<cpu-triple> to opt.
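
A rough sketch of this fix, assuming the triple is obtained from `llvm-config --host-target` as the referencing commit message describes. The helper names `host_triple` and `opt_command` are hypothetical and do not reflect how cometpy actually structures this code.

```python
import shutil
import subprocess


def host_triple(llvm_config="llvm-config"):
    """Return the host target triple, or None if llvm-config is not found."""
    if shutil.which(llvm_config) is None:
        return None
    result = subprocess.run(
        [llvm_config, "--host-target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() or None


def opt_command(ll_in, ll_out, triple=None):
    """Build an opt invocation, adding --mtriple only when a triple is known."""
    cmd = ["opt", "-O3", "-S"]
    if triple:
        cmd.append(f"--mtriple={triple}")
    cmd += [ll_in, "-o", ll_out]
    return cmd
```

For example, `opt_command("kernel.ll", "kernel.opt.ll", host_triple())` falls back to a plain `opt -O3 -S` call when `llvm-config` is unavailable, which is the configuration the issue reports as slower.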

…alls to a single one at the slight cost of readability (#30). We now need only create 1 process to lower scf->llvm, translate, opt, and generate the library, instead of doing it in separate steps.

… for some CPUs (#30).

pthomadakis self-assigned this Sep 21, 2023

pthomadakis closed this as completed Sep 29, 2023

pthomadakis reopened this Oct 12, 2023

pthomadakis added a commit that referenced this issue Oct 13, 2023

In cometpy, compiling with -march=native further improves performance…

9583804

… for some CPUs (#30).

pthomadakis added a commit that referenced this issue Oct 13, 2023

In cometpy, compiling with -march=native further improves performance…

16a6de6

… for some CPUs (#30).

pthomadakis added a commit that referenced this issue Oct 22, 2023

In cometpy, compiling with -march=native further improves performance…

24c5c71

… for some CPUs (#30).

pthomadakis added a commit that referenced this issue Oct 22, 2023

In cometpy, compiling with -march=native further improves performance…

3bda2a5

… for some CPUs (#30).
