Conversation

Review updated until commit 80a701c

Relevant files: Enhancement, Tests, Configuration changes
PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

**C++20 Features Usage**: the code uses `<ranges>` and `std::views::reverse`. This may cause compilation issues on systems with older C++ standard support. Consider adding appropriate compile-time checks or providing fallback implementations.
…located when not inside for loop

!test
```cpp
/*pre_fn=*/
[&](const DominatorTree::Node* node) {
  Expr* e = node->getExpr();
  if (auto* alloc = dynamic_cast<kir::Allocate*>(e)) {
    // ...
  }
  return true;
}
```

Temporary `if`. With `hir::allocate`, this can be captured with `inputs()`.
This is not complete. Ops like view do not always allocate new tensors. A few things make that analysis tricky:

- Aliasing information is not available in `hic`, as it copies over from `completeFusion`. Question: Should we run an aliasing pass in host IR as well for expr-evaluated segments?
- `HostIrEvaluator` for `LoadStoreOp` checks if the `out_tv` is known and either copies over the data or binds it to a view of the input. `HostIrJit` always creates a new tensor for ops like permute (`Fuser/csrc/host_ir/jit_external.cpp`, line 175 in bf8e1f6). Question: Is this just for simplicity for the first integration?

One solution can be to explicitly allocate expr-eval outputs where needed, like we do for matmul/linear. Then we only deallocate tvs that are allocated.
The previous version did not make any distinction for view-like ops, so the functionality does not regress.
What do you think @wujingyue
Sorry, I'm missing some context. Can you remind me why this PR needs to change how we decide what needs deallocation? I understood the motivation of looking into loops but I'm missing some connections otherwise.
> Can you remind me why this PR needs to change how we decide what needs deallocation?

This PR does not necessarily need to change this. But we do need to decide what needs deallocation, since not all ops allocate new tensors.
I initially started with deallocating only explicitly "allocated" TensorViews. However, that breaks the HostIrJit tests, where outputs of view/permute are also newly allocated tensors.
If I deallocate everything, that includes outputs of `ShardByStream`, which are not new tensors but rather slices (we could amend aliasing so that we mark these as aliases of their inputs). Hence, I am placing some minimum conditions on what needs deallocation for the current use cases.
> If I deallocate everything, that includes outputs of ShardByStream which are not new tensors

Got it. This is actually the old behavior. It didn't trigger this problem because `ShardByStream` is never top-level.
`HostIrEvaluator` handles deallocation by removing the tensor from the underlying hash table. It doesn't always free the memory. What problems did you run into with `ShardByStream` exactly?
I can try it myself tomorrow; not on a computer right now.
> HostIrEvaluator handles deallocation by removing the tensor from the underlying hash table. It doesn't always free the memory.

Correct. I did not run into any errors with the existing tests, since `handle(Deallocate*)` only invalidates. But looking at the HostIrJit behavior, it actually deletes the tensor; hence, I avoided adding deallocation statements for those.
For simplicity, if you prefer, I can remove the additional conditions from this PR and we can discuss them in a separate PR.
!test

!test
Greptile Summary

Refactored …

Key Changes:

Issues Found:

Confidence Score: 3/5

Important Files Changed

Last reviewed commit: 80a701c
```cpp
if (!needsDeallocation(allocated_tv)) {
  continue;
}
const DominatorTree::Node* last_use_node = last_use.at(allocated_tv);
```

`.at()` will throw if `allocated_tv` is not in the `last_use` map. This can happen if a tensor is allocated (via `kir::Allocate`) but never used as an input to any expression. In `post_fn` (lines 240-248), `last_use` is only populated for allocated tensors and tensors used as inputs, so if an allocated tensor is never used as an input, it won't be in `last_use`.
```cpp
if (tv->isFusionOutput()) {
  return false;
}
if (tv->definition()->isA<ShardByStream>()) {
  // ...
}
return true;
```

Add a null check before dereferencing `tv->definition()`. While fusion inputs are filtered at line 198, there may be other cases where `definition()` is nullptr (e.g., intermediate values without definitions):

```cpp
if (tv->definition() != nullptr && tv->definition()->isA<ShardByStream>()) {
```
```cpp
if (in->isFusionOutput()) {
  continue;
}
DominatorTree dom_tree(hic);
```

(I haven't yet read the code carefully, so I could well be missing some important details.)
In compilers, there's a thing called a post dominator tree, which could lead to a more general solution. A post dominator tree is similar to a dominator tree but with the control flow reversed. IIUC, the position to deallocate a tv (if it needs to be deallocated) is the nearest post dominator of the tv and all its uses. This can be computed as their lowest common ancestor (LCA) in the post dominator tree.
I WFH today but I'm happy to explain this in person tomorrow or Friday.
Can you take a look at the latest commit for the post-dominator tree version?
```cpp
namespace {

class DominatorTree {
  class Node {
```

TODO: create different nodes in each tree to avoid overloading too much.
Additional Comments (1)
No description provided.