-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MachinePipeliner] Improve loop carried dependence analysis #94185
[MachinePipeliner] Improve loop carried dependence analysis #94185
Conversation
Here is an example of incorrect scheduling in the current implementation. The increment per iteration of the address is -4, but it is erroneously recognized as 4.
Therefore, loop carried dependence from store to load is ignored.
This results in an incorrect schedule in which the next iteration's load precedes the store.
If |
This modification may prevent some codes to be pipelined that are currently pipelined. Increasing the number of patterns to be analyzed might mitigate it. Hexagon enables MachinePipeliner by default so I would like to hear comments from its developers. |
ping |
Hi @androm3da @quic-akaryaki @quic-areg, we are attempting to fix incorrect scheduling by MachinePipeliner pass. Hexagon has enabled this pass by default and we would appreciate if you could review it. |
86c1c59
to
dfc329e
Compare
@iajbar and @SundeepKushwaha can you review this pull req? |
hi @ytmukai, you said the changes inhibit some loops to be pipelined? This patch fixes a missing/incorrect dependency, so I was curious why this should impact some earlier loops which were being pipelined? |
Hi @quic-santdas, thanks for your comment.
Currently, the incremental value per iteration of the address of a memory access is recognized only by the instruction that defines the base register. The code is as follows. llvm-project/llvm/lib/CodeGen/MachinePipeliner.cpp Lines 2583 to 2599 in 30e5d71
This method does not always give the correct value. Therefore, in order to obtain the correct value, this patch fixes it to recognize only the patterns that make up a cycle, as shown below. llvm-project/llvm/lib/CodeGen/MachinePipeliner.cpp Lines 2593 to 2600 in dfc329e
Because of the simplicity of the analysis, I have a concern that there may not be sufficient patterns to be supported. (Failure to analyze incremental values does not necessarily disable the pipeline, but it could result in poor scheduling.) |
if (Def->getParent() != LoopBB) | ||
return false; | ||
|
||
if (Def->isCopy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does REG_SEQUENCE also need to be supported to arrive at the definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice explanation @ytmukai!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @quic-santdas!
I checked in llvm-test-suite for Hexagon with the following code to see if REG_SEQUENCE appears, but could not find it.
} else if (Def->isRegSequence()) {
LLVM_DEBUG({
dbgs() << "RegSequence Found\n";
});
Could you tell me if there is a typical pattern in which REG_SEQUENCE appears in the calculation of the inductive variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned REG_SEQUENCE since you are handling COPY instructions. I thought this might need handling too since it is used to combine subregisters. There is no instance/example in the test suite though.
The code looks good from my side. If any corner case gets missed, that can be added later I guess.
ping |
dfc329e
to
517e1bf
Compare
@quic-santdas Sorry for the late reply. Thank you for your review! I counted the number of loops pipelined in llvm-test-suite for Hexagon. The results are as follows.
There are cases where this patch enables pipelining as a result of unnecessary dependencies being removed. The number of loops affected is small, so applying this patch does not seem to cause major problems. |
517e1bf
to
e6ab32b
Compare
ping |
@quic-santdas @androm3da Could you approve this fix? |
Sorry: I can't provide adequate review of this change, I'm not familiar with these elements. Santanu or @iajbar should be able to review this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks fine to me.
Hello Yuta Mukai, could you please investigate why the 9 loops are not getting pipelined with your patch? Thank you. |
Hello @iajbar, as a result of the correct detection of dependencies, the order of scheduling etc. may be changed and good schedules may not be found. For example, the following loop cannot be pipelined with this patch. For this loop unrolled, the current code determines that there are dependencies between accesseses of I will look into further cases. |
The previous implementation had false positive/negative cases in analysis of loop carried dependence. A case of missed dependencies is caused by incorrect analysis of address increments. It is fixed by a strict analysis of recursive definitions. See added test swp-carried-dep4.mir. Excessive detection of dependence is corrected by improving the formula for determining overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.
e6ab32b
to
4969a8c
Compare
@iajbar The number of pipelined loops has been recounted to exclude duplicates due to inlining, etc. The optimization flag is -O3.
The loops that cannot be pipelined with this patch are as follows:
For these, I confirmed that there were no problems in detecting the dependences as in the previous post. The following loops are incorrectly pipelined without this patch. The schedule result is as follows:
The store for |
Hi @ytmukai , I am curious to know if there is any performance data available with your patch on any benchmarks? |
@quic-santdas I do not have performance data as I do not have a measurement environment. There are few loops that are no longer pipelined, as shown in the table above, so I consider it more important to fix missed optimizations. |
@iajbar @quic-santdas Are there any other concerns? If not, I will merge this patch. |
LGTM, thanks @ytmukai. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/4488 Here is the relevant piece of the build log for the reference
|
The previous implementation had false positive/negative cases in the analysis of the loop carried dependency. A missed dependency case is caused by incorrect analysis of address increments. This is fixed by strict analysis of recursive definitions. See added test swp-carried-dep4.mir. Excessive dependency detection is fixed by improving the formula for determining the overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.
Hi @ytmukai, The same
|
The previous implementation had false positive/negative cases in analysis of loop carried dependence.
A case of missed dependencies is caused by incorrect analysis of address increments. It is fixed by a strict analysis of recursive definitions. See added test swp-carried-dep4.mir.
Excessive detection of dependence is corrected by improving the formula for determining overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.