Explicitly use cudaStream_t for cub::DeviceTransform and use non-blocking stream #771
Conversation
📝 Walkthrough

Replaced many uses of …

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)

📜 Recent review details
Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)
388-393: Inconsistent stream parameter: this Transform call should also use `stream_view_.value()`.

This `cub::DeviceTransform::Transform` call still passes `stream_view_` directly (line 393), while the earlier Transform call at line 361 was updated to use `stream_view_.value()`. For consistency, and to fully address the CCCL 3.2 API change, this call should be updated as well.

🔧 Proposed fix

```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(reduced_cost_.data(), problem_ptr->variable_bounds.data()),
   bound_value_.data(),
   primal_size_h_,
   bound_value_reduced_cost_product<f_t, f_t2>(),
-  stream_view_);
+  stream_view_.value());
```

cpp/src/dual_simplex/barrier.cu (1)

3138-3150: Update `cub::DeviceTransform::Transform` calls for consistency with other stream parameter usage in the file.

Lines 3143 and 3150 use `stream_view_` directly, while most other `cub::DeviceTransform::Transform` calls in the file use `stream_view_.value()`. Additionally, `compute_final_direction` (lines 3213, 3223, 3230) and `compute_next_iterate` (lines 3292, 3302, 3309) have the same inconsistency. For consistency with the predominant pattern across the file, either all `cub::DeviceTransform::Transform` calls should use `.value()`, or all should use the bare `stream_view_` with implicit conversion. Choose one approach and apply it uniformly.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
cpp/src/linear_programming/pdhg.cu
cpp/src/linear_programming/pdlp.cu
cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
cpp/src/linear_programming/termination_strategy/convergence_information.cu
cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
cpp/src/linear_programming/utilities/cython_solve.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
cpp/src/linear_programming/utilities/cython_solve.cu
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/pdlp.cu
cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
cpp/src/linear_programming/pdhg.cu
cpp/src/linear_programming/termination_strategy/convergence_information.cu
cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
(the same 10 files listed above)
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
(the same 10 files listed above)
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
(the same 10 files listed above)
🧠 Learnings (18)

📓 Learnings from .github/.coderabbit_review_guide.md (repo: NVIDIA/cuopt), each applied to the files in this PR; scope is **/*.{cu,cuh,cpp,hpp,h} unless noted:

- For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
- Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks the GPU pipeline; use streams and events for async execution
- Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
- Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
- Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
- Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
- Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
- Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations (**/*.cu)
- Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
- Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code (**/*.{cu,cpp,hpp,h})
- Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment (**/*test*.{cpp,cu,py})
- Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations (**/*.{cu,cuh})
- Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
- Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths (**/*.cu)
- Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
- Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

📚 Learning from chris-maes (NVIDIA/cuopt PR 602, cpp/src/linear_programming/solve.cu, 2025-12-04): in cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
🧬 Code graph analysis (1)
cpp/src/dual_simplex/barrier.cu (1)
cpp/src/linear_programming/pdlp.cu (1)
`d_z` (1486-1486)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: wheel-build-cuopt-sh-client / 13.1.0, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, amd64, rockylinux8
- GitHub Check: checks / check-style
🔇 Additional comments (31)
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)
82-89: LGTM: Stream parameter fix for CCCL 3.2 compatibility.

The change from `stream_view_` to `stream_view_.value()` correctly extracts the raw `cudaStream_t` for `cub::DeviceTransform::Transform`, addressing the API change in CCCL 3.2. Both Transform calls are updated consistently.

cpp/src/linear_programming/pdhg.cu (4)
129-138: LGTM: Consistent stream parameter update for dual projection.

The `stream_view_.value()` change correctly provides the raw `cudaStream_t` to `cub::DeviceTransform::Transform` for CCCL 3.2 compatibility.

187-197: LGTM: Stream parameter correctly updated for primal projection.

337-364: LGTM: Both reflected projection Transform calls updated consistently.

The `should_major` branch correctly uses `stream_view_.value()` for both primal and dual reflected projections in `compute_next_primal_dual_solution_reflected`.

381-405: LGTM: Non-major iteration branch also updated consistently.

Both Transform calls in the else branch are correctly updated to use `stream_view_.value()`.

cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu (2)
1680-1685: LGTM: Stream parameter updated for bounds extraction transform.

The change correctly uses `stream_view_.value()` for the `cub::DeviceTransform::Transform` call that extracts variable bounds.

1863-1869: LGTM: Stream parameter updated for clamping transform.

The `stream_view_.value()` change for the clamp transform is consistent with the fix pattern.

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu (1)
301-307: LGTM: Stream parameter fix for interaction computation.

The `stream_view_.value()` change for the subtraction transform in `compute_interaction_and_movement` correctly addresses CCCL 3.2 compatibility.

cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)

356-361: LGTM: Stream parameter updated for bound value gradient transform.

The change to `stream_view_.value()` correctly addresses CCCL 3.2 compatibility.

cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu (1)
433-489: LGTM! Correct fix for CCCL 3.2 API compatibility.

The changes correctly extract the underlying `cudaStream_t` via `.value()` for the `cub::DeviceTransform::Transform` calls. This aligns with the CCCL 3.2 API change, which now explicitly requires `cudaStream_t` rather than accepting stream wrapper objects. The other CUDA operations (kernel launches, `cudaMemsetAsync`, thrust, raft operations) appropriately continue using `stream_view_` directly, since they accept `rmm::cuda_stream_view`.

cpp/src/linear_programming/utilities/cython_solve.cu (1)

232-233: LGTM! Proper non-blocking stream initialization.

The change correctly creates a non-blocking CUDA stream and passes it to the raft handle. This is important for:

- Avoiding implicit synchronization with the default stream in concurrent scenarios (batch solve at line 307)
- Proper stream lifecycle management: the `rmm::cuda_stream` owns the underlying `cudaStream_t` and will destroy it when going out of scope, after the handle no longer uses it

As per coding guidelines, explicitly creating and managing dedicated streams for concurrent operations is the preferred approach.
cpp/src/linear_programming/pdlp.cu (4)
1032-1043: LGTM! Delta computation stream parameters fixed.

The `cub::DeviceTransform::Transform` calls for computing delta primal and delta dual correctly use `stream_view_.value()` to extract the underlying `cudaStream_t`, addressing the CCCL 3.2 API change.

1137-1150: LGTM! Initial primal projection stream parameters fixed.

The clamp operations for projecting initial primal solutions correctly pass the underlying stream handle.

1393-1420: LGTM! Halpern update stream parameters fixed.

Both primal and dual update transforms in the Halpern iteration correctly use `stream_view_.value()`.

1525-1574: LGTM! Initial step size computation stream parameters fixed.

The normalization and residual computation transforms correctly extract the underlying stream handle.
cpp/src/linear_programming/termination_strategy/convergence_information.cu (4)
268-281: LGTM! Primal residual computation stream parameter fixed.

The `cub::DeviceTransform::Transform` call for computing primal residual and primal slack correctly uses `stream_view_.value()`.

359-364: LGTM! Dual residual computation stream parameter fixed.

The transform for computing dual residual correctly passes the underlying stream handle.

457-462: LGTM! Reduced cost computation stream parameter fixed.

The bound value gradient transform correctly uses `stream_view_.value()`.

490-503: Both `DeviceTransform::Transform` and `DeviceReduce::Sum` correctly accept `stream_view_` due to implicit conversion from `rmm::cuda_stream_view` to `cudaStream_t`. The inconsistency in style (one using `.value()` explicitly, the other passing the wrapper directly) is not a functional issue; both patterns are correct and work interchangeably.

cpp/src/dual_simplex/barrier.cu (12)
48-60: LGTM!

The `pairwise_multiply` helper functions correctly use `stream.value()` to extract the raw `cudaStream_t` for `cub::DeviceTransform::Transform`, addressing the CCCL 3.2 API change.

62-87: LGTM!

The `axpy` helper functions correctly use `stream.value()` for the `cub::DeviceTransform::Transform` calls.

1369-1377: LGTM!

The `gpu_adat_multiply` method correctly uses `stream_view_.value()` for the element-wise multiply transform.

2050-2103: LGTM!

All `cub::DeviceTransform::Transform` calls in `gpu_compute_residuals` are consistently updated to use `stream_view_.value()`.

2343-2379: LGTM!

The diagonal, inverse diagonal, and sqrt inverse diagonal formation correctly uses `stream_view_.value()` for all `cub::DeviceTransform::Transform` calls.

2430-2457: LGTM!

The GPU compute H block correctly uses `stream_view_.value()` for the transform operations.

2786-2804: LGTM!

The dz formation correctly uses `stream_view_.value()`.

2828-2847: LGTM!

The dv formation correctly uses `stream_view_.value()`.

2914-2950: LGTM!

The dw formation and residual computations correctly use `stream_view_.value()`.

2998-3021: LGTM!

The `compute_affine_rhs` method correctly uses `stream_view_.value()` for negating the complementarity residuals.
3205-3230: Inconsistent stream parameter usage for `cub::DeviceTransform::Transform`.

These three `cub::DeviceTransform::Transform` calls at lines 3213, 3223, and 3230 use `stream_view_` without `.value()`, which is inconsistent with the other updated calls in this PR.

🔧 Suggested fix for consistency

```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_dw_aff_.data(), data.d_dv_aff_.data(), data.d_dw_.data(), data.d_dv_.data()),
   thrust::make_zip_iterator(data.d_dw_.data(), data.d_dv_.data()),
   data.d_dw_.size(),
   [] HD(f_t dw_aff, f_t dv_aff, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
     return {dw + dw_aff, dv + dv_aff};
   },
-  stream_view_);
+  stream_view_.value());

 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_dx_aff_.data(), data.d_dz_aff_.data(), data.d_dx_.data(), data.d_dz_.data()),
   thrust::make_zip_iterator(data.d_dx_.data(), data.d_dz_.data()),
   data.d_dx_.size(),
   [] HD(f_t dx_aff, f_t dz_aff, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
     return {dx + dx_aff, dz + dz_aff};
   },
-  stream_view_);
+  stream_view_.value());

 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_dy_aff_.data(), data.d_dy_.data()),
   data.d_dy_.data(),
   data.d_dy_.size(),
   [] HD(f_t dy_aff, f_t dy) { return dy + dy_aff; },
-  stream_view_);
+  stream_view_.value());
```
3284-3309: Inconsistent stream parameter usage for `cub::DeviceTransform::Transform`.

These three `cub::DeviceTransform::Transform` calls at lines 3292, 3302, and 3309 use `stream_view_` without `.value()`, which is inconsistent with the other updated calls in this PR.

🔧 Suggested fix for consistency

```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_w_.data(), data.d_v_.data(), data.d_dw_.data(), data.d_dv_.data()),
   thrust::make_zip_iterator(data.d_w_.data(), data.d_v_.data()),
   data.d_dw_.size(),
   [step_primal, step_dual] HD(f_t w, f_t v, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
     return {w + step_primal * dw, v + step_dual * dv};
   },
-  stream_view_);
+  stream_view_.value());

 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_x_.data(), data.d_z_.data(), data.d_dx_.data(), data.d_dz_.data()),
   thrust::make_zip_iterator(data.d_x_.data(), data.d_z_.data()),
   data.d_dx_.size(),
   [step_primal, step_dual] HD(f_t x, f_t z, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
     return {x + step_primal * dx, z + step_dual * dz};
   },
-  stream_view_);
+  stream_view_.value());

 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_y_.data(), data.d_dy_.data()),
   data.d_y_.data(),
   data.d_y_.size(),
   [step_dual] HD(f_t y, f_t dy) { return y + step_dual * dy; },
-  stream_view_);
+  stream_view_.value());
```
bff475c to 2f88100
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
232-247: CUDA stream resource leak: stream created but never destroyed. The `cudaStream_t` created at line 233 is never destroyed with `cudaStreamDestroy()`. When `raft::handle_t` is constructed with an external stream, it does not take ownership; the caller remains responsible for cleanup. This leaks a stream on every call to `call_solve`, which is especially problematic in batch mode, where multiple parallel calls accumulate leaked streams.

Based on learnings, ensure cleanup of streams/events.
🔧 Proposed fix: Add stream synchronization and cleanup
```diff
 std::unique_ptr<solver_ret_t> call_solve(
   cuopt::mps_parser::data_model_view_t<int, double>* data_model,
   cuopt::linear_programming::solver_settings_t<int, double>* solver_settings,
   unsigned int flags,
   bool is_batch_mode)
 {
   raft::common::nvtx::range fun_scope("Call Solve");
   cudaStream_t stream;
   RAFT_CUDA_TRY(cudaStreamCreateWithFlags(&stream, flags));
   const raft::handle_t handle_{stream};
   auto op_problem = data_model_to_optimization_problem(data_model, solver_settings, &handle_);
   solver_ret_t response;
   if (op_problem.get_problem_category() == linear_programming::problem_category_t::LP) {
     response.lp_ret =
       call_solve_lp(op_problem, solver_settings->get_pdlp_settings(), is_batch_mode);
     response.problem_type = linear_programming::problem_category_t::LP;
   } else {
     response.mip_ret = call_solve_mip(op_problem, solver_settings->get_mip_settings());
     response.problem_type = linear_programming::problem_category_t::MIP;
   }
+  RAFT_CUDA_TRY(cudaStreamSynchronize(stream));
+  RAFT_CUDA_TRY(cudaStreamDestroy(stream));
+
   return std::make_unique<solver_ret_t>(std::move(response));
 }
```
Alternatively, consider using a RAII wrapper for automatic cleanup to ensure exception safety.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cpp/src/linear_programming/utilities/cython_solve.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/src/linear_programming/utilities/cython_solve.cu
**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
cpp/src/linear_programming/utilities/cython_solve.cu
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/linear_programming/utilities/cython_solve.cu
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/src/linear_programming/utilities/cython_solve.cu
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2026-01-14T00:38:33.700Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:33.700Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
🧬 Code graph analysis (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/tests/distance_engine/waypoint_matrix_test.cpp (4)
`stream` (49-87), `stream` (89-97), `stream` (136-161), `stream` (197-214)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
efa2ac0 to 9293aed
```cpp
response.lp_ret.primal_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.dual_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.reduced_cost_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_primal_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_dual_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.initial_primal_average_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.initial_dual_average_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_ATY_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.sum_primal_solutions_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.sum_dual_solutions_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.last_restart_duality_gap_primal_solution_->set_stream(
  rmm::cuda_stream_per_thread);
response.lp_ret.last_restart_duality_gap_dual_solution_->set_stream(
  rmm::cuda_stream_per_thread);
```
I think we'll have to be careful not to forget to set the streams here in cython_solve, if we made any additions to the PDLP structures. However I cannot think of a way to do this programmatically and cleanly.... Maybe coderabbit can remember this as a rule somehow and help us there
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cpp/src/dual_simplex/barrier.cu (3)
3140-3152: Missing `.value()` calls in `compute_cc_rhs`, inconsistent with the PR objective. These `cub::DeviceTransform::Transform` calls still use `stream_view_` directly instead of `stream_view_.value()`, which is inconsistent with all other changes in this PR. If CCCL 3.2 requires the raw `cudaStream_t`, these calls will exhibit the same regression.

🐛 Proposed fix
```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_dx_aff_.data(), data.d_dz_aff_.data()),
   data.d_complementarity_xz_rhs_.data(),
   data.d_complementarity_xz_rhs_.size(),
   [new_mu] HD(f_t dx_aff, f_t dz_aff) { return -(dx_aff * dz_aff) + new_mu; },
-  stream_view_);
+  stream_view_.value());
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_dw_aff_.data(), data.d_dv_aff_.data()),
   data.d_complementarity_wv_rhs_.data(),
   data.d_complementarity_wv_rhs_.size(),
   [new_mu] HD(f_t dw_aff, f_t dv_aff) { return -(dw_aff * dv_aff) + new_mu; },
-  stream_view_);
+  stream_view_.value());
```
3207-3232: Missing `.value()` calls in `compute_final_direction`, inconsistent with the PR objective. These three `cub::DeviceTransform::Transform` calls still pass `stream_view_` directly instead of `stream_view_.value()`.

🐛 Proposed fix
```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_dw_aff_.data(), data.d_dv_aff_.data(), data.d_dw_.data(), data.d_dv_.data()),
   thrust::make_zip_iterator(data.d_dw_.data(), data.d_dv_.data()),
   data.d_dw_.size(),
   [] HD(f_t dw_aff, f_t dv_aff, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
     return {dw + dw_aff, dv + dv_aff};
   },
-  stream_view_);
+  stream_view_.value());
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_dx_aff_.data(), data.d_dz_aff_.data(), data.d_dx_.data(), data.d_dz_.data()),
   thrust::make_zip_iterator(data.d_dx_.data(), data.d_dz_.data()),
   data.d_dx_.size(),
   [] HD(f_t dx_aff, f_t dz_aff, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
     return {dx + dx_aff, dz + dz_aff};
   },
-  stream_view_);
+  stream_view_.value());
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_dy_aff_.data(), data.d_dy_.data()),
   data.d_dy_.data(),
   data.d_dy_.size(),
   [] HD(f_t dy_aff, f_t dy) { return dy + dy_aff; },
-  stream_view_);
+  stream_view_.value());
```
3286-3311: Missing `.value()` calls in `compute_next_iterate`, inconsistent with the PR objective. These three `cub::DeviceTransform::Transform` calls still pass `stream_view_` directly instead of `stream_view_.value()`.

🐛 Proposed fix
```diff
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_w_.data(), data.d_v_.data(), data.d_dw_.data(), data.d_dv_.data()),
   thrust::make_zip_iterator(data.d_w_.data(), data.d_v_.data()),
   data.d_dw_.size(),
   [step_primal, step_dual] HD(f_t w, f_t v, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
     return {w + step_primal * dw, v + step_dual * dv};
   },
-  stream_view_);
+  stream_view_.value());
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(
     data.d_x_.data(), data.d_z_.data(), data.d_dx_.data(), data.d_dz_.data()),
   thrust::make_zip_iterator(data.d_x_.data(), data.d_z_.data()),
   data.d_dx_.size(),
   [step_primal, step_dual] HD(f_t x, f_t z, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
     return {x + step_primal * dx, z + step_dual * dz};
   },
-  stream_view_);
+  stream_view_.value());
 cub::DeviceTransform::Transform(
   cuda::std::make_tuple(data.d_y_.data(), data.d_dy_.data()),
   data.d_y_.data(),
   data.d_y_.size(),
   [step_dual] HD(f_t y, f_t dy) { return y + step_dual * dy; },
-  stream_view_);
+  stream_view_.value());
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cpp/src/dual_simplex/barrier.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/src/dual_simplex/barrier.cu
**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
cpp/src/dual_simplex/barrier.cu
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/dual_simplex/barrier.cu
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/src/dual_simplex/barrier.cu
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:38.038Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2026-01-14T00:38:38.038Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:38.038Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/src/dual_simplex/barrier.cu
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, arm64, rockylinux8
🔇 Additional comments (5)
cpp/src/dual_simplex/barrier.cu (5)
48-87: LGTM: Consistent use of `stream.value()` for wrapper functions. These non-template wrappers correctly pass the raw `cudaStream_t` via `.value()` to `cub::DeviceTransform::Transform`, addressing the CCCL 3.2 API compatibility issue.
1371-1376: LGTM: Correctly updated stream argument.
2052-2106: LGTM: All transform calls consistently updated in `gpu_compute_residuals`.
2345-2382: LGTM: Diagonal computation transforms correctly updated.
3001-3015: LGTM: Affine RHS computation transforms correctly updated.
/merge
This PR reverts the change to a non-blocking stream and fixes a silent regression following an API change in CCCL 3.2
Closes: #741
Summary by CodeRabbit
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.