[IR Container] Phase 2.6 Concurrency & Thread Safety#5971
[IR Container] Phase 2.6 Concurrency & Thread Safety#5971mdavis36 wants to merge 2 commits intomd/phase2-copy-movefrom
Conversation
|
!test |
Description
|
| Relevant files | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review |
Lock ordering consistency
|
Test failures
-
(Low, 1)
Tensor numerical mismatches in nvFuser HopperMatmulTest suiteTest Name H100 Source HopperMatmulTest.PingPongPersistent ❌ Link
2cac45c to
8fb976b
Compare
192fd55 to
35b7405
Compare
|
!test |
Greptile SummaryThis PR adds Key Changes:
Thread Safety:
Risk Assessment: Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Fusion::registerVal/Expr/removeVal/Expr] --> B[Acquire unique_lock on mutex_]
B --> C[Call ContainerMutator static method]
C --> D[Direct field access - lock-free]
D --> E[May call nested ContainerMutator methods]
E --> D
D --> F[Release lock when function returns]
G[IrContainer public accessors] --> H{Read or Write?}
H -->|Read| I[Acquire shared_lock]
H -->|Write| J[Acquire unique_lock]
I --> K[Access internal data]
J --> K
K --> L[Return/Release lock]
M[Nested Call Example:<br/>removeVal → removeExpr] --> N[Lock acquired once in removeVal]
N --> O[ContainerMutator::removeVal<br/>calls ContainerMutator::removeExpr]
O --> P[Both execute under same lock<br/>No deadlock!]
style B fill:#90EE90
style C fill:#87CEEB
style I fill:#FFD700
style J fill:#FF6B6B
style P fill:#90EE90
Last reviewed commit: 31bccb9 |
| const std::unordered_set<Expr*>& IrContainer::unordered_exprs() const noexcept { | ||
| std::shared_lock lock(mutex_); | ||
| return exprs_; | ||
| } | ||
|
|
||
| const std::unordered_set<Val*>& IrContainer::vals() const noexcept { | ||
| std::shared_lock lock(mutex_); | ||
| return vals_; | ||
| } |
There was a problem hiding this comment.
Reference outlives lock scope
unordered_exprs(), vals(), sharingFusions(), valsOwnedBy(), and exprsOwnedBy() all return const& to internal data, but the shared_lock is released when the function returns. This means callers hold an unprotected reference to the data structure, and any concurrent writer (under unique_lock) could mutate or invalidate it.
This is safe in Phase 2 (containers are not shared across threads), but in Phase 3 where multiple threads write to the same IrContainer, this pattern will produce data races. Consider documenting this as a known limitation to address in Phase 3, or returning by value for these accessors (at the cost of a copy).
Additional Comments (1)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
|
!test |
Add std::shared_mutex to IrContainer to protect shared mutable state during concurrent access from parallel compilation threads. - IrContainer public methods self-lock (shared_lock for reads, unique_lock for writes) - Fusion mutation methods (registerVal/Expr, removeVal/Expr, removeStatementsCreatedAfter) acquire unique_lock then delegate to lock-free ContainerMutator static methods, avoiding self-deadlock on nested calls (e.g., removeVal → removeExpr) - ContainerMutator is a PIMPL struct defined only in fusion.cpp, keeping lock-free impl methods out of the header - Remove kPhase2DisableParallelCompile guard, re-enabling parallel compilation now that the mutex is in place - Delete dead IrContainer::copy() and IrContainer::swap() methods
8fb976b to
31bccb9
Compare
35b7405 to
88b2e60
Compare
…tainers Update Fusion::removeStatementsCreatedAfter to compare per-Fusion counts (from exprsOwnedBy(this) and numValsExcludingShortcuts()) instead of global deque sizes. This correctly handles shared containers where other Fusions' statements would inflate the global counts. Add NVF_ERROR assertions to verify the LIFO invariant: the tail element of the global deque must belong to this Fusion. If violated, another Fusion appended concurrently (should be prevented by PR #5971 locking). Remove now-unnecessary deque size validation checks.
Summary
Add
std::shared_mutexto IrContainer for concurrent read access during parallel compilation, remove thekPhase2DisableParallelCompileserialization guard introduced in PR 1, and validate that the full test suite passes with parallel compilation re-enabled.This is a future-proofing and defensive correctness change. Phase 2's
makeFusionpath does NOT share containers (each segment gets its own container via the default constructor), so parallel compilation is technically safe without the mutex. However, Phase 3 will changemakeFusionto use the copy constructor (shared container), at which point multiple threads will write to the sameIrContainerconcurrently. The mutex must be in place before Phase 3 can enable that.The Nested Call Problem and ContainerMutator
Five Fusion methods directly access IrContainer's internal fields because statement registration was moved from IrContainer to Fusion previously:
removeVal()callsremoveExpr(), andregisterExpr()also callsremoveExpr(). Sincestd::shared_mutexis not recursive, acquiringunique_lockin both the outer and inner methods would deadlock.The solution is a two-layer locking architecture:
ContainerMutatoris forward-declared infusion.h(2 lines) and fully defined infusion.cpp. This keeps the header clean and makes the locking architecture self-documenting: everything insideContainerMutatorassumes the lock is already held.Thread Safety Analysis
Dead Code Removal
Investigation revealed that
IrContainer::copy()andIrContainer::swap()have zero call sites — all copy/move/swap semantics are handled at the Fusion level after previous work. Removing them eliminates ~45 lines of dead code and avoids complex dual-locking patterns.Relationship to Phase 2
This PR completes the Phase 2 architectural work. With thread safety in place, the full shared scalar infrastructure is ready for Phase 3:
CI Risk
Low-medium. This is the first CI run with parallel compilation re-enabled since PR #5961 serialized it. Any latent concurrency issues would surface here. The parallel compilation path doesn't share containers in Phase 2, so the mutex is defensive — but re-enabling parallelism exercises the full concurrent codegen pipeline.