[MachinePipeliner] Add an abstract layer to manipulate Data Dependenc… #109918
Conversation
ping
In general, this is a mostly mechanical fix which looks fine, except that it ignores the fact that anti-dependences in memory and on explicit physical registers also create loop-carried dependences without having any phi nodes. The existing algorithm sometimes takes these into account and sometimes not, which makes it truly confusing. As only the phi-node anti-dependences are turned into data nodes with distance 1 in this patch, at least some of the places where the code is marked "To preserve previous behavior and prevent regression" are likely necessary and can't be removed "if this doesn't have significant impact on performance".
I'm particularly concerned about the fact that these antidependences are not reversed before calling createAdjacencyStructure, so the circuits they imply won't be seen at all, and that's going to do ... something, so I think we need to leave in the reversal for now. It might be possible to merge this behavior into createAdjacencyStructure.
I also note that I would expect some performance changes as a result of this change simply because nodes do get added to queues in a different order; previously most queues would process things in "within iteration, then loop-dependent" order.
The existing implementation doesn't express loop-carried anti-dependencies in memory and physical registers directly by anti-edges. For memory, such dependencies are detected by calling
Now
Yes, the iteration order of edges does change. However, I believe that for most functions their behavior is independent of the order of the edges. I forgot to mention it in the description, but I used llvm-test-suite to verify that there is no change in the compiled result before and after this change. Of course, I didn't try every combination of compilation options, but I believe that this patch doesn't change the existing behavior. However, as you said, the existing algorithm is confusing, so there is a real possibility that I have missed or misunderstood something.
Hi, @dpenry. Thank you for your previous comment. On second thought, I am beginning to think that the code marked "To preserve previous behavior and prevent regression" should be left in place. My guess remains the same as before: the handling of these anti-dependencies looks unintentional to me. If loop-carried physical register dependencies are to be handled correctly, at least the following must also be considered (but currently are not).
However, it is possible that the current implementation just happens to handle loop-carried dependencies correctly and works fine. So, now I agree with you that we should not remove the code marked as "To preserve previous behavior and prevent regression". But these are just workarounds, and other loop-carried dependencies should be treated correctly in the future. What do you think?
Physical register dependency support is not unintentional, just not widely used and somewhat under-tested, because software pipelining is enabled for only a handful of architectures and the first ones for which it was used supported decrement-and-branch instructions. When I added support for SMS for ARM and had to support using normal conditional branches using the condition codes, I had to insert some special cases for physical registers because some dependences were not being respected. See the comments in dcb7764.

I'm not sure what the godbolt link you've supplied is supposed to show, as there isn't a loop, so you can't say anything about whether loop-carried dependences are handled correctly. The proper output and anti-dependencies for the MIR are present in the data graph. One might argue that an anti-dependence between SU(1) and SU(4) is missing, but the anti-dependence from 1->2 combined with the output dependence from 2->4 make it unnecessary, and I think the DDG builder might notice that.
Sorry, something was wrong with my previous godbolt link. The following is the correct one: https://godbolt.org/z/3v5rnPG3o I think the loop-carried dependence SU(5) -> SU(2) is not handled correctly in the current implementation. If we're just talking about this case, it doesn't matter, because instructions that read/write
That godbolt makes more sense! I do note that in this particular instance, an SU(5)->SU(2) dependence carried across the loop is not seen because SU(7) kills the liveness of SU(5)'s CPSR write in the output direction and SU(1) kills the liveness of whatever entered the basic block in the input direction. Is that a problem in general? I'd have to think more on it. Note that CPSR isn't called out specially in the code; all physical registers are treated alike. Are all instructions annotated properly with implicit physical register def/use/kill information? That's up to the instruction set definitions to get right.

SVE's FFR is an interesting beast because it's a cumulative fault register, so there isn't really a sensible notion of an output dependence, kind of like FPSR flags. Getting the order of writes to it wrong generally doesn't matter until it's read. However, RDFFR/RDFFRS/SETFFR/WRFFR probably should be marked as unpipelineable instructions so that we don't try to pipeline some odd code that actually does care about the order.

I do think we've gotten a bit far afield here. We don't (yet) have an example of a failure case due to loop-carried physical registers or memory dependencies with the old code, and we're leaving in most of the bits of logic that purport to handle it. The only question is whether the removal of the reversal code around createAdjacencyStructure is going to create a new failure case in the presence of loop-carried physical register or memory dependencies. That removal results in fewer computed circuits. Circuits are used for MII calculation and for checkValidNodeOrder. MII calculation is not a correctness issue. As for validity, being in a circuit makes it legal to be after both successors and predecessors, so fewer circuits means fewer legal orders, so I don't think we can introduce new failure cases, just reduced opportunities for pipelining. So, I think we're good for now. I would like to see what others have to say.
So, after all that discussion, I'm good with this patch but I'd like to see what other people have to say about the new abstraction itself.
Thanks for your detailed comments!
Unless we can determine that it's not going to be a problem, I think we should be conservative with dependency analysis. However, as you said, it's also true that I don't have an example that triggers invalid code generation. It also complicates things that missing dependencies do not necessarily cause invalid code to be generated. In any case, the FIXME comments I added in this patch need to be changed.
I believe that the circuits to be detected do not change with or without this patch, because
Also,
```cpp
// To preserve previous behavior and prevent regression
// FIXME: Remove if this doesn't have significant impact on performance
for (const auto &OE : DDG->getOutEdges(SU))
  if (OE.getDistance() == 1)
    Worklist.push_back(OE.getDst());
```
@dpenry
Let me make sure: in terms of correctness, we can remove the for statement here, right? I added this for statement to preserve the behavior of the original implementation. It appends all predecessors of the SU, which may contain anti-dependencies from a PHI. There is a similar part in normalizeNonPipelinedInstructions, which I believe we can remove as well.
Because the original code is adding the loop-carried antidependences for just the PHI nodes and I think those edges are actually in the new DDG, the dependent nodes should already be in the worklist. If this is true, then the second for statement should be removable.
As for the code in normalizeNonPipelinedInstructions, I think the FIXME section maybe wasn't needed to start with because the original SU.Preds didn't contain the loop-carried antidependences, so adding them in wasn't needed.
"I think those edges are actually in the new DDG, the dependent nodes should already be in the worklist"

This is true.
Then it looks like it's safe to remove both of them. I'd like to remove them in the future to simplify the code, if there is no performance degradation. Thank you for your review!
@madhur13490 @androm3da
Gentle ping
Minor nits
Thanks for your review!
Hi. Thank you to everyone who reviewed it. It seems to me that no one is currently in a position to be a code owner of MachinePipeliner. I believe that this patch won't cause any performance degradation. Also, this is necessary to fix correctness issues related to dependency analysis.
Thanks for adding this change. I'm in favor of adding the abstraction as it is a big improvement over the existing way of using the DAG. I'm sure there will be changes or other improvements, but it's a step in the right direction. I've only superficially gone through the changes, so I defer to @dpenry, who seems good with the patch.
@bcahoon Thanks for your response. Yes, this is the first step of the improvements; I would like to submit more patches after this is merged. @dpenry Please let me check: your review status is now "Request changes". What kind of changes are you requesting? Or did you simply misstate the status? Your last comment looks to me like you approve of this patch.
…e Graph

In MachinePipeliner, a DAG class is used to represent the Data Dependence Graph. A Data Dependence Graph generally contains cycles, so it's not appropriate to use DAG classes. In fact, some "hacks" are used to express back-edges in the current implementation. This patch adds a new class to provide a better interface for manipulating dependencies. Our approach is as follows:

- To build the graph, we use the ScheduleDAGInstrs class as it is, because it has powerful functions and the current implementation depends heavily on it.
- After the graph construction is finished (i.e., during scheduling), we use the new class DataDependenceGraph to manipulate the dependencies.

Since we don't change the dependencies during scheduling, the new class only provides functions to read them. Also, this patch is only a refactoring, i.e., scheduling results should not change with or without this patch.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/13677 Here is the relevant piece of the build log for reference.

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/18383 Here is the relevant piece of the build log for reference.

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/10572 Here is the relevant piece of the build log for reference.
This patch is a part of the improvement of MachinePipeliner discussed on https://discourse.llvm.org/t/machinepipeliner-replace-swingschedulerdag-with-directed-graph-that-allows-cycles/76465 . There are other patches that depend on this and will be submitted later. Specifically, I am planning to submit the following:
I've written several FIXME comments in this patch, and they are resolved by subsequent patches listed above.