Conversation

Review updated until commit 2078dbc

Description

Relevant files: Enhancement, Tests, Configuration changes
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests

⚡ Recommended focus areas for review
Post-dominator tree build iteration issue
In `PostDominatorTree::build` (lines 154-176), the reverse iteration uses `auto it = exprs.end(); it != exprs.begin(); --it`. This pattern is problematic because `exprs.end()` is a valid one-past-the-end iterator, and dereferencing it would be undefined behavior. While the loop body doesn't dereference it directly (it decrements first), the pattern is fragile and could cause issues if the container is empty. Consider adding an explicit empty check or using `std::ranges::reverse_view` for safer reverse iteration.
…located when not inside for loop
!test

!test

!test
Greptile Summary

Refactors …

The refactoring addresses previous concerns about …

Confidence Score: 4/5

Important Files Changed

Last reviewed commit: 2078dbc
Additional Comments (2)
The diff shows this was introduced by this PR, changing the constructor to

```cpp
Node(Scope* scope, Scope::Iterator iterator, const Node* parent)
```

where `Scope::Iterator` is a forward iterator, while the build loop iterates in reverse. Beyond the type error, even if the iterator were stored, the later insertion relies on it being a forward iterator:

```cpp
lca_node->scope()->insert(std::next(lca_node->iterator()), deallocate);
```

The fix is to convert to a forward iterator before storing. The element pointed to by reverse iterator `it` corresponds to the forward iterator `std::prev(it.base())`:

```cpp
for (auto it = exprs.rbegin(); it != exprs.rend(); ++it) {
  Expr* e = *it;
  Scope::Iterator fwd_it = std::prev(it.base());
  auto [node_it, inserted] = nodes_.try_emplace(e, &scope, fwd_it, parent);
}
```
!test

!test
wujingyue
left a comment
Almost there. I'm reviewing the LCA part...
```cpp
// For each TensorView that is allocated or used as an input, find its
// least common ancestor in the Post-dominator Tree — the latest point at which
// it can be deallocated.
std::unordered_map<TensorView*, const Node*> computeLeastCommonAncestor(
```

Suggested change:

```diff
-std::unordered_map<TensorView*, const Node*> computeLeastCommonAncestor(
+std::unordered_map<TensorView*, const Node*> computeLowestCommonAncestor(
```
```cpp
}
PostDominatorTree post_dominator_tree(hic);
const std::unordered_map<TensorView*, const Node*>& lca_map =
    computeLeastCommonAncestor(post_dominator_tree);
```
Consider wrapping this in a class so we don't have to expose `std::unordered_map` to the user.

```cpp
lcas = LowestCommonAncestors(post_dominator_tree);
lcas.getLca(tv);
```
What is the downside of exposing the lca_map?
```cpp
std::unordered_map<const Node*, int64_t> depth;

auto findLCA = [&](const Node* a, const Node* b) -> const Node* {
  if (a == nullptr) {
    return b;
  }
  if (b == nullptr) {
    return a;
  }
  int64_t depth_a = depth.at(a);
  int64_t depth_b = depth.at(b);
  while (depth_a > depth_b) {
    a = a->parent();
    depth_a--;
  }
  while (depth_b > depth_a) {
    b = b->parent();
    depth_b--;
  }
  while (a != b) {
    a = a->parent();
    b = b->parent();
  }
  return a;
};
```
Consider making it a private method of class `LowestCommonAncestors`, and making `depth` a member `LowestCommonAncestors::depth_`.
```cpp
// Traverse the IR and collect all allocated TensorViews, removing them when
// a Deallocate is encountered.
void collectPersistentTensorViews(
    const Scope& scope,
    std::unordered_set<TensorView*>& allocated) {
  for (Expr* e : scope.exprs()) {
    if (auto* dealloc = dynamic_cast<hir::Deallocate*>(e)) {
      allocated.erase(dealloc->buffer());
      continue;
    }
    if (auto* alloc = dynamic_cast<kir::Allocate*>(e)) {
      allocated.insert(alloc->buffer()->as<TensorView>());
      continue;
    }
    for (auto* tv : ir_utils::filterByType<TensorView>(e->inputs())) {
      allocated.insert(tv);
    }
    for (auto* tv : ir_utils::filterByType<TensorView>(e->outputs())) {
      allocated.insert(tv);
    }
    if (auto* loop = dynamic_cast<hir::ForLoop*>(e)) {
      collectPersistentTensorViews(loop->body(), allocated);
    }
  }
}
```
```cpp
void checkMemoryLeak(const hir::HostIrContainer& hic) {
  std::unordered_set<TensorView*> allocated;
  collectPersistentTensorViews(hic.topLevel(), allocated);
  EXPECT_TRUE(std::all_of(
      allocated.begin(),
      allocated.end(),
      [](TensorView* tv) {
        return tv->isFusionInput() || tv->isFusionOutput();
      }))
      << "Some TensorViews allocated in IR are not deallocated and not fusion "
         "inputs/outputs.";
}
```
I understood your intention for better coverage, but these helpers make the test less DAMP: https://testing.googleblog.com/2019/12/testing-on-toilet-tests-too-dry-make.html. Someone debugging a test also has to understand the logic of these two functions.
It may work better if we simply test the number of deallocations in the host IR container.
I see your point.

What do you think about testing the deallocation for the intermediate output -- that is, the number of deallocations in top_level?

When testing directly, I don't find it obvious upfront how many intermediates will get a Deallocate -- hence the traversal-based approach, so I don't have to figure that out when reading the test.
Made-with: Cursor
No description provided.