
IACA analysis incompatible with clang (3.8 and 4.0) #58

Open
rrzeschorscherl opened this issue Nov 27, 2017 · 5 comments

Comments

@rrzeschorscherl
Contributor

rrzeschorscherl commented Nov 27, 2017

I have tried kerncraft (current checkout, 0.5.7) with the himeno.c code and clang:

(python) gh@einstein:~/programming/python/kerncraft/examples$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel ECM -D M 50 -D N 50 -D L 500 --cache-predictor LC --compiler clang-4.0 kernels/himeno.c
[...]
IACA analysis failed: pointer_increment could not be detected automatically

This happens with clang 3.8 and 4.0.

@rrzeschorscherl
Contributor Author

Update: This seems to occur when the index increment is exactly 1, which happens when the compiler neither vectorizes nor otherwise unrolls the loop:

        vmovss  %xmm0, (%rsi,%rdx,4)
        incq    %rdx
        cmpl    %edx, %ebx
        jne     .LBB0_127

In this case the index register (here rdx) is not increased by an addq instruction but by a simple incq. It's not exclusive to clang; if I prevent vectorization with the Intel compiler, the same error occurs.
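A minimal sketch of how a detector could treat incq/decq the same way as addq/subq with an immediate, and then turn the index step into the byte offset that --pointer-increment expects (base step + index step × scale). The regexes, the function names and the restriction to q-suffixed instructions are assumptions for illustration only; this is not kerncraft's actual code.

```python
import re

# Hypothetical helper, NOT kerncraft's implementation: it only shows how
# "incq %reg" can be counted like "addq $1, %reg".
INC_RE = re.compile(r"^\s*(inc|dec)q\s+(%\w+)\s*$")
ADD_RE = re.compile(r"^\s*(add|sub)q\s+\$(-?\d+),\s*(%\w+)\s*$")
# AT&T syntax: a store has the memory operand as its last (destination) operand,
# e.g. "(%rsi,%rdx,4)" -> offset "", base %rsi, index %rdx, scale 4.
STORE_RE = re.compile(r"(-?\d*)\((%\w+)?(?:,(%\w+))?(?:,(\d))?\)\s*$")


def register_increments(lines):
    """Sum the constant change of every register over one loop iteration."""
    incs = {}
    for line in lines:
        if m := INC_RE.match(line):
            op, reg = m.groups()
            incs[reg] = incs.get(reg, 0) + (1 if op == "inc" else -1)
        elif m := ADD_RE.match(line):
            op, imm, reg = m.groups()
            incs[reg] = incs.get(reg, 0) + (int(imm) if op == "add" else -int(imm))
    return incs


def store_pointer_increments(lines):
    """Byte offset by which each store address advances per iteration."""
    incs = register_increments(lines)
    result = []
    for line in lines:
        m = STORE_RE.search(line)
        if not m:
            continue  # not a store to memory
        _offset, base, index, scale = m.groups()
        step = incs.get(base, 0) if base else 0
        if index:
            step += incs.get(index, 0) * int(scale or 1)
        result.append((line.strip(), step))
    return result


# The unvectorized loop from the comment above.
loop = """\
    vmovss  %xmm0, (%rsi,%rdx,4)
    incq    %rdx
    cmpl    %edx, %ebx
    jne     .LBB0_127""".splitlines()

for store, step in store_pointer_increments(loop):
    print(store, "->", step, "bytes per iteration")
# vmovss  %xmm0, (%rsi,%rdx,4) -> 4 bytes per iteration
```

Here incq %rdx gives an index step of 1, and the scale of 4 (one float) turns that into 4 bytes per iteration, which is the value one would otherwise pass by hand.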

@rrzeschorscherl
Contributor Author

Still does not work reliably in 0.6.0:

Executing (compile):  clang-5.0 -Ofast -mavx -D_POSIX_C_SOURCE=200112L -std=c99 himeno.c_compilable.c -S -I/home/gh/programming/python/lib/python3.6/site-packages/kerncraft/headers/
IACA analysis failed: pointer_increment could not be detected automatically. Use --pointer-increment to set manually to byte offset of store pointer address between consecutive assembly block iterations

The loop mechanics look like this:

        incq    %r10
        cmpq    %r10, %r14
        jne     .LBB0_126

@cod3monk
Member

cod3monk commented Dec 11, 2017 via email

@rrzeschorscherl
Contributor Author

Here we are:

himeno.c_compilable.txt

Had to rename it; GitHub does not allow .s files as attachments :-/
This was generated with clang 4.0 and the -O3 -mavx options. The loop mechanics differ from those above, though (which were produced with clang 5.0).

@cod3monk
Member

cod3monk commented Jan 9, 2018

Difficult one. There are two stores in the loop, one of which goes onto the stack. That confuses the increment detector: the address of the stack store does not change from one iteration to the next, so for that store the detected increment would be 0. One workaround would be to ignore anything related to the stack pointer register, but who knows whether another compiler will decide to use it in a different way?
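A minimal sketch of the suggested workaround, assuming stores have already been parsed into (base, index, scale) tuples and the per-iteration register changes are known: stores based on %rsp are skipped, and a result is returned only if all remaining stores agree on one step, otherwise None (in which case --pointer-increment would still have to be given manually). The function name, the tuple model and the example loop body are hypothetical and not taken from kerncraft or from the attached himeno assembly.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a store operand such as 16(%rsp) or (%rdi,%rax,4):
# (base register, index register, scale). NOT kerncraft's data structure.
Store = Tuple[Optional[str], Optional[str], int]


def pointer_increment(stores: List[Store], reg_incs: Dict[str, int],
                      ignore_stack: bool = True) -> Optional[int]:
    """Return the common per-iteration byte step of all stores, or None if ambiguous."""
    steps = set()
    for base, index, scale in stores:
        if ignore_stack and base == "%rsp":
            continue  # spill slot: same address every iteration, its step is 0
        step = reg_incs.get(base, 0) if base else 0
        if index:
            step += reg_incs.get(index, 0) * scale
        steps.add(step)
    # Exactly one distinct step means the increment is unambiguous.
    return steps.pop() if len(steps) == 1 else None


# Made-up loop body: one spill to the stack plus one array store, with the
# index register advanced by 8 elements ("addq $8, %rax") per iteration.
stores = [("%rsp", None, 1), ("%rdi", "%rax", 4)]
reg_incs = {"%rax": 8}

print(pointer_increment(stores, reg_incs, ignore_stack=False))  # None (steps 0 and 32 disagree)
print(pointer_increment(stores, reg_incs))                       # 32
```

Returning None on disagreement, rather than picking one of the steps, keeps the failure explicit, which matters given the concern above that another compiler might use %rsp-relative addresses for real data.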
