
Conversation

@hlinsen (Contributor) commented on Jan 13, 2026

This PR reverts the change to the non-blocking stream and fixes a silent regression following an API change in CCCL 3.2.

Closes: #741

Summary by CodeRabbit

  • Refactor

    • Standardized GPU stream handling across solvers by switching to raw stream handles for device operations.
    • Introduced per-call stream scoping, added targeted synchronizations, and reset buffer streams to ensure correct stream lifetime and execution ordering during solves.
  • Chores

    • Updated copyright years to 2026.


@hlinsen requested a review from a team as a code owner on January 13, 2026 at 23:55
@hlinsen added the bug label (Something isn't working) on Jan 13, 2026
@hlinsen added the non-breaking label (Introduces a non-breaking change) on Jan 13, 2026

coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Walkthrough

Replaced many uses of stream_view_ with stream_view_.value() when passing CUDA streams to device operations across LP and dual-simplex sources; updated several file header years to 2026; added per-call scoped rmm::cuda_stream / raft handle plus buffer stream resets and synchronizations in cpp/src/linear_programming/utilities/cython_solve.cu.
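
A minimal sketch of that fix pattern, for illustration only: the buffer names, the add_op functor, and the free function are hypothetical, and stream_view stands in for the solvers' stream_view_ member (an rmm::cuda_stream_view). Only the last argument changes: the raw cudaStream_t obtained via .value() is passed instead of the wrapper.

// Hypothetical example of the stream_view_ -> stream_view_.value() change.
#include <cub/device/device_transform.cuh>
#include <cuda/std/tuple>
#include <rmm/cuda_stream_view.hpp>

struct add_op {
  __host__ __device__ double operator()(double a, double b) const { return a + b; }
};

void add_vectors(const double* d_a,
                 const double* d_b,
                 double* d_out,
                 int n,
                 rmm::cuda_stream_view stream_view)
{
  cub::DeviceTransform::Transform(
    cuda::std::make_tuple(d_a, d_b),
    d_out,
    n,
    add_op{},
    stream_view.value());  // raw cudaStream_t, as CCCL 3.2 expects, instead of the wrapper
}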

Changes

  • Dual simplex core (cpp/src/dual_simplex/barrier.cu): Replaced stream_view_ with stream_view_.value() for CUDA device calls (kernels, cub/thrust transforms/selects).
  • Initial scaling (cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu): Switched the cub::DeviceTransform::Transform stream argument from stream_view_ to stream_view_.value().
  • PDHG & PDLP solvers (cpp/src/linear_programming/pdhg.cu, cpp/src/linear_programming/pdlp.cu): Replaced stream_view_ with stream_view_.value() across CUDA graph/transform invocations; minor header year updates.
  • Restart & weighted average (cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu, cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu): Updated execution stream arguments to use stream_view_.value(); header year bumps.
  • Step size & termination (cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu, cpp/src/linear_programming/termination_strategy/convergence_information.cu, cpp/src/linear_programming/termination_strategy/infeasibility_information.cu): Converted stream_view_ usages to stream_view_.value() for cub/thrust transforms/reduces and kernel launches; header year updates.
  • Python solver interface (cpp/src/linear_programming/utilities/cython_solve.cu): Added a per-call rmm::cuda_stream and scoped raft handle; moved solver object lifetime to respect stream scope; reset device-buffer streams to the per-thread default and added warm-start synchronization.
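
The cython_solve.cu item above describes a per-call stream scope; the following is a rough sketch of that lifetime pattern under the assumption that the returned buffers are rmm::device_uvectors, with a hypothetical solve_one() helper (the real cython_solve.cu differs in detail).

// Rough sketch of per-call stream scoping with a buffer-stream reset
// (hypothetical solve_one(); not the actual cython_solve.cu code).
#include <cstddef>
#include <raft/core/handle.hpp>
#include <rmm/cuda_stream.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

rmm::device_uvector<double> solve_one(std::size_t n)
{
  rmm::cuda_stream stream{};                   // RAII-owned stream, destroyed at scope exit
  const raft::handle_t handle{stream.view()};  // handle borrows the stream for this call

  rmm::device_uvector<double> result(n, stream.view());
  // ... run the solver on handle.get_stream() ...

  // Make sure all work on the short-lived stream is done, then move the buffer
  // to the long-lived per-thread default stream before the scoped stream dies.
  stream.synchronize();
  result.set_stream(rmm::cuda_stream_per_thread);
  return result;
}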

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 10.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main changes: explicitly using cudaStream_t for cub::DeviceTransform calls and implementing non-blocking stream usage, which directly addresses the regression and synchronization issues.
  • Linked Issues check: ✅ Passed. The PR successfully addresses issue #741 by fixing the CCCL 3.2 regression (stream argument type changes across all CUDA operations) and implementing proper non-blocking stream handling with correct resource lifetime management in cython_solve.cu.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to fixing the PDLP synchronization issue: updating stream argument types for compatibility and managing stream lifetimes. Minor copyright year updates are standard maintenance changes.




📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b859963 and e740191.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/barrier.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/dual_simplex/barrier.cu





@coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)

388-393: Inconsistent stream parameter: This Transform call should also use stream_view_.value().

This cub::DeviceTransform::Transform call still passes stream_view_ directly (line 393), while the earlier Transform call at line 361 was updated to use stream_view_.value(). For consistency and to fully address the CCCL 3.2 API change, this call should be updated as well.

🔧 Proposed fix
   cub::DeviceTransform::Transform(
     cuda::std::make_tuple(reduced_cost_.data(), problem_ptr->variable_bounds.data()),
     bound_value_.data(),
     primal_size_h_,
     bound_value_reduced_cost_product<f_t, f_t2>(),
-    stream_view_);
+    stream_view_.value());
cpp/src/dual_simplex/barrier.cu (1)

3138-3150: Update cub::DeviceTransform::Transform calls for consistency with other stream parameter usage in the file.

Lines 3143 and 3150 use stream_view_ directly, while most other cub::DeviceTransform::Transform calls in the file use stream_view_.value(). Additionally, compute_final_direction (lines 3213, 3223, 3230) and compute_next_iterate (lines 3292, 3302, 3309) have the same inconsistency. For consistency with the predominant pattern across the file, either all cub::DeviceTransform::Transform calls should use .value() or all should use the bare stream_view_ with implicit conversion. Choose one approach and apply it uniformly.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1e7703 and bff475c.

📒 Files selected for processing (10)
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment

Applied to files:

  • cpp/src/linear_programming/termination_strategy/infeasibility_information.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/linear_programming/pdhg.cu
  • cpp/src/linear_programming/termination_strategy/convergence_information.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu
🧬 Code graph analysis (1)
cpp/src/dual_simplex/barrier.cu (1)
cpp/src/linear_programming/pdlp.cu (1)
  • d_z (1486-1486)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: wheel-build-cuopt-sh-client / 13.1.0, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, amd64, rockylinux8
  • GitHub Check: checks / check-style
🔇 Additional comments (31)
cpp/src/linear_programming/restart_strategy/weighted_average_solution.cu (1)

82-89: LGTM: Stream parameter fix for CCCL 3.2 compatibility.

The change from stream_view_ to stream_view_.value() correctly extracts the raw cudaStream_t for cub::DeviceTransform::Transform, addressing the API change in CCCL 3.2. Both Transform calls are updated consistently.

cpp/src/linear_programming/pdhg.cu (4)

129-138: LGTM: Consistent stream parameter update for dual projection.

The stream_view_.value() change correctly provides the raw cudaStream_t to cub::DeviceTransform::Transform for CCCL 3.2 compatibility.


187-197: LGTM: Stream parameter correctly updated for primal projection.


337-364: LGTM: Both reflected projection Transform calls updated consistently.

The should_major branch correctly uses stream_view_.value() for both primal and dual reflected projections in compute_next_primal_dual_solution_reflected.


381-405: LGTM: Non-major iteration branch also updated consistently.

Both Transform calls in the else branch are correctly updated to use stream_view_.value().

cpp/src/linear_programming/restart_strategy/pdlp_restart_strategy.cu (2)

1680-1685: LGTM: Stream parameter updated for bounds extraction transform.

The change correctly uses stream_view_.value() for the cub::DeviceTransform::Transform call that extracts variable bounds.


1863-1869: LGTM: Stream parameter updated for clamping transform.

The stream_view_.value() change for the clamp transform is consistent with the fix pattern.

cpp/src/linear_programming/step_size_strategy/adaptive_step_size_strategy.cu (1)

301-307: LGTM: Stream parameter fix for interaction computation.

The stream_view_.value() change for the subtraction transform in compute_interaction_and_movement correctly addresses CCCL 3.2 compatibility.

cpp/src/linear_programming/termination_strategy/infeasibility_information.cu (1)

356-361: LGTM: Stream parameter updated for bound value gradient transform.

The change to stream_view_.value() correctly addresses CCCL 3.2 compatibility.

cpp/src/linear_programming/initial_scaling_strategy/initial_scaling.cu (1)

433-489: LGTM! Correct fix for CCCL 3.2 API compatibility.

The changes correctly extract the underlying cudaStream_t via .value() for the cub::DeviceTransform::Transform calls. This aligns with the CCCL 3.2 API change that now explicitly requires cudaStream_t rather than accepting stream wrapper objects. The other CUDA operations (kernel launches, cudaMemsetAsync, thrust, raft operations) appropriately continue using stream_view_ directly since they accept rmm::cuda_stream_view.
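
A small illustrative sketch of that distinction (hypothetical kernel, buffers, and function names, not taken from the PR). The wrapper converts implicitly to cudaStream_t, so kernel launches and thrust calls can keep taking stream_view_, while the cub::DeviceTransform call passes the raw handle explicitly:

// Hypothetical contrast between wrapper and raw-stream call sites.
#include <cub/device/device_transform.cuh>
#include <cuda/std/functional>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>
#include <thrust/fill.h>

__global__ void negate_kernel(double* x, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { x[i] = -x[i]; }
}

void example(double* d_x, double* d_y, int n, rmm::cuda_stream_view stream_view_)
{
  // Kernel launch and thrust accept the wrapper (implicit conversion / rmm policy overload).
  negate_kernel<<<(n + 255) / 256, 256, 0, stream_view_>>>(d_x, n);
  thrust::fill(rmm::exec_policy(stream_view_), d_y, d_y + n, 0.0);

  // cub::DeviceTransform::Transform gets the raw cudaStream_t explicitly.
  cub::DeviceTransform::Transform(
    d_x, d_y, n, cuda::std::negate<double>{}, stream_view_.value());
}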

cpp/src/linear_programming/utilities/cython_solve.cu (1)

232-233: LGTM! Proper non-blocking stream initialization.

The change correctly creates a non-blocking CUDA stream and passes it to the raft handle. This is important for:

  1. Avoiding implicit synchronization with the default stream in concurrent scenarios (batch solve at line 307)
  2. Proper stream lifecycle management - the rmm::cuda_stream owns the underlying cudaStream_t and will destroy it when going out of scope, after the handle

As per coding guidelines, explicitly creating and managing dedicated streams for concurrent operations is the preferred approach.

cpp/src/linear_programming/pdlp.cu (4)

1032-1043: LGTM! Delta computation stream parameters fixed.

The cub::DeviceTransform::Transform calls for computing delta primal and delta dual correctly use stream_view_.value() to extract the underlying cudaStream_t, addressing the CCCL 3.2 API change.


1137-1150: LGTM! Initial primal projection stream parameters fixed.

The clamp operations for projecting initial primal solutions correctly pass the underlying stream handle.


1393-1420: LGTM! Halpern update stream parameters fixed.

Both primal and dual update transforms in the Halpern iteration correctly use stream_view_.value().


1525-1574: LGTM! Initial step size computation stream parameters fixed.

The normalization and residual computation transforms correctly extract the underlying stream handle.

cpp/src/linear_programming/termination_strategy/convergence_information.cu (4)

268-281: LGTM! Primal residual computation stream parameter fixed.

The cub::DeviceTransform::Transform call for computing primal residual and primal slack correctly uses stream_view_.value().


359-364: LGTM! Dual residual computation stream parameter fixed.

The transform for computing dual residual correctly passes the underlying stream handle.


457-462: LGTM! Reduced cost computation stream parameter fixed.

The bound value gradient transform correctly uses stream_view_.value().


490-503: Both DeviceTransform::Transform and DeviceReduce::Sum correctly accept stream_view_ due to implicit conversion from rmm::cuda_stream_view to cudaStream_t. The inconsistency in style (one using .value() explicitly, the other passing the wrapper directly) is not a functional issue. Both patterns are correct and work interchangeably.

cpp/src/dual_simplex/barrier.cu (12)

48-60: LGTM!

The pairwise_multiply helper functions correctly use stream.value() to extract the raw cudaStream_t for cub::DeviceTransform::Transform, addressing the CCCL 3.2 API change.


62-87: LGTM!

The axpy helper functions correctly use stream.value() for the cub::DeviceTransform::Transform calls.


1369-1377: LGTM!

The gpu_adat_multiply method correctly uses stream_view_.value() for the element-wise multiply transform.


2050-2103: LGTM!

All cub::DeviceTransform::Transform calls in gpu_compute_residuals are consistently updated to use stream_view_.value().


2343-2379: LGTM!

The diagonal, inverse diagonal, and sqrt inverse diagonal formation correctly uses stream_view_.value() for all cub::DeviceTransform::Transform calls.


2430-2457: LGTM!

The GPU compute H block correctly uses stream_view_.value() for the transform operations.


2786-2804: LGTM!

The dz formation correctly uses stream_view_.value().


2828-2847: LGTM!

The dv formation correctly uses stream_view_.value().


2914-2950: LGTM!

The dw formation and residual computations correctly use stream_view_.value().


2998-3021: LGTM!

The compute_affine_rhs method correctly uses stream_view_.value() for negating the complementarity residuals.


3205-3230: Inconsistent stream parameter usage for cub::DeviceTransform::Transform.

These three cub::DeviceTransform::Transform calls at lines 3213, 3223, and 3230 use stream_view_ without .value(), which is inconsistent with the other updated calls in this PR.

🔧 Suggested fix for consistency
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_dw_aff_.data(), data.d_dv_aff_.data(), data.d_dw_.data(), data.d_dv_.data()),
       thrust::make_zip_iterator(data.d_dw_.data(), data.d_dv_.data()),
       data.d_dw_.size(),
       [] HD(f_t dw_aff, f_t dv_aff, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
         return {dw + dw_aff, dv + dv_aff};
       },
-      stream_view_);
+      stream_view_.value());
 
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_dx_aff_.data(), data.d_dz_aff_.data(), data.d_dx_.data(), data.d_dz_.data()),
       thrust::make_zip_iterator(data.d_dx_.data(), data.d_dz_.data()),
       data.d_dx_.size(),
       [] HD(f_t dx_aff, f_t dz_aff, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
         return {dx + dx_aff, dz + dz_aff};
       },
-      stream_view_);
+      stream_view_.value());
 
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_dy_aff_.data(), data.d_dy_.data()),
       data.d_dy_.data(),
       data.d_dy_.size(),
       [] HD(f_t dy_aff, f_t dy) { return dy + dy_aff; },
-      stream_view_);
+      stream_view_.value());

3284-3309: Inconsistent stream parameter usage for cub::DeviceTransform::Transform.

These three cub::DeviceTransform::Transform calls at lines 3292, 3302, and 3309 use stream_view_ without .value(), which is inconsistent with the other updated calls in this PR.

🔧 Suggested fix for consistency
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_w_.data(), data.d_v_.data(), data.d_dw_.data(), data.d_dv_.data()),
       thrust::make_zip_iterator(data.d_w_.data(), data.d_v_.data()),
       data.d_dw_.size(),
       [step_primal, step_dual] HD(f_t w, f_t v, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
         return {w + step_primal * dw, v + step_dual * dv};
       },
-      stream_view_);
+      stream_view_.value());
 
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_x_.data(), data.d_z_.data(), data.d_dx_.data(), data.d_dz_.data()),
       thrust::make_zip_iterator(data.d_x_.data(), data.d_z_.data()),
       data.d_dx_.size(),
       [step_primal, step_dual] HD(f_t x, f_t z, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
         return {x + step_primal * dx, z + step_dual * dz};
       },
-      stream_view_);
+      stream_view_.value());
 
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_y_.data(), data.d_dy_.data()),
       data.d_y_.data(),
       data.d_y_.size(),
       [step_dual] HD(f_t y, f_t dy) { return y + step_dual * dy; },
-      stream_view_);
+      stream_view_.value());

@hlinsen force-pushed the fix-device-transform branch from bff475c to 2f88100 on January 14, 2026 at 02:36

@coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)

232-247: CUDA stream resource leak: stream created but never destroyed.

The cudaStream_t created at line 233 is never destroyed with cudaStreamDestroy(). When raft::handle_t is constructed with an external stream, it does not take ownership—the caller remains responsible for cleanup. This leaks a stream on every call to call_solve, which is especially problematic in batch mode where multiple parallel calls accumulate leaked streams.

Based on learnings, ensure cleanup of streams/events.

🔧 Proposed fix: Add stream synchronization and cleanup
 std::unique_ptr<solver_ret_t> call_solve(
   cuopt::mps_parser::data_model_view_t<int, double>* data_model,
   cuopt::linear_programming::solver_settings_t<int, double>* solver_settings,
   unsigned int flags,
   bool is_batch_mode)
 {
   raft::common::nvtx::range fun_scope("Call Solve");
   cudaStream_t stream;
   RAFT_CUDA_TRY(cudaStreamCreateWithFlags(&stream, flags));
   const raft::handle_t handle_{stream};

   auto op_problem = data_model_to_optimization_problem(data_model, solver_settings, &handle_);
   solver_ret_t response;
   if (op_problem.get_problem_category() == linear_programming::problem_category_t::LP) {
     response.lp_ret =
       call_solve_lp(op_problem, solver_settings->get_pdlp_settings(), is_batch_mode);
     response.problem_type = linear_programming::problem_category_t::LP;
   } else {
     response.mip_ret      = call_solve_mip(op_problem, solver_settings->get_mip_settings());
     response.problem_type = linear_programming::problem_category_t::MIP;
   }

+  RAFT_CUDA_TRY(cudaStreamSynchronize(stream));
+  RAFT_CUDA_TRY(cudaStreamDestroy(stream));
+
   return std::make_unique<solver_ret_t>(std::move(response));
 }

Alternatively, consider using a RAII wrapper for automatic cleanup to ensure exception safety.
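
One way such a RAII wrapper could look, sketched under the assumption that the creation flags should be preserved (otherwise rmm::cuda_stream already provides an owning type). scoped_cuda_stream is hypothetical, not an existing repo or library class:

// Hypothetical RAII stream wrapper; destroys the stream even if call_solve throws.
#include <cuda_runtime_api.h>
#include <stdexcept>

class scoped_cuda_stream {
 public:
  explicit scoped_cuda_stream(unsigned int flags)
  {
    if (cudaStreamCreateWithFlags(&stream_, flags) != cudaSuccess) {
      throw std::runtime_error("cudaStreamCreateWithFlags failed");
    }
  }
  ~scoped_cuda_stream()
  {
    cudaStreamSynchronize(stream_);  // best effort; a destructor must not throw
    cudaStreamDestroy(stream_);
  }
  scoped_cuda_stream(const scoped_cuda_stream&)            = delete;
  scoped_cuda_stream& operator=(const scoped_cuda_stream&) = delete;
  cudaStream_t value() const noexcept { return stream_; }

 private:
  cudaStream_t stream_{};
};

With a wrapper along these lines, call_solve would hold scoped_cuda_stream stream{flags}; construct the handle from stream.value(), and need no explicit cleanup at the end of the function.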

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bff475c and 2f88100.

📒 Files selected for processing (1)
  • cpp/src/linear_programming/utilities/cython_solve.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2026-01-14T00:38:33.700Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:33.700Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
🧬 Code graph analysis (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/tests/distance_engine/waypoint_matrix_test.cpp (4)
  • stream (49-87)
  • stream (89-97)
  • stream (136-161)
  • stream (197-214)


@hlinsen added this to the 26.02 milestone on Jan 14, 2026
@hlinsen force-pushed the fix-device-transform branch from efa2ac0 to 9293aed on January 14, 2026 at 05:24
Comment on lines +244 to +257
response.lp_ret.primal_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.dual_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.reduced_cost_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_primal_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_dual_solution_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.initial_primal_average_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.initial_dual_average_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.current_ATY_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.sum_primal_solutions_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.sum_dual_solutions_->set_stream(rmm::cuda_stream_per_thread);
response.lp_ret.last_restart_duality_gap_primal_solution_->set_stream(
rmm::cuda_stream_per_thread);
response.lp_ret.last_restart_duality_gap_dual_solution_->set_stream(
rmm::cuda_stream_per_thread);
Contributor


I think we'll have to be careful not to forget to set the streams here in cython_solve if we make any additions to the PDLP structures. However, I cannot think of a way to do this programmatically and cleanly. Maybe CodeRabbit can remember this as a rule somehow and help us there.
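
A hypothetical helper sketching one way to keep that list in a single place. It does not remove the need to extend the list when new PDLP buffers are added, but it leaves exactly one spot to update; it assumes each member is a smart pointer to a buffer type exposing set_stream(rmm::cuda_stream_view):

// Hypothetical helper (not existing repo code) centralizing the set_stream calls above.
#include <rmm/cuda_stream_view.hpp>

template <typename lp_ret_t>
void reset_lp_buffer_streams(lp_ret_t& lp_ret, rmm::cuda_stream_view stream)
{
  auto reset = [&](auto& buf_ptr) { buf_ptr->set_stream(stream); };
  reset(lp_ret.primal_solution_);
  reset(lp_ret.dual_solution_);
  reset(lp_ret.reduced_cost_);
  reset(lp_ret.current_primal_solution_);
  reset(lp_ret.current_dual_solution_);
  reset(lp_ret.initial_primal_average_);
  reset(lp_ret.initial_dual_average_);
  reset(lp_ret.current_ATY_);
  reset(lp_ret.sum_primal_solutions_);
  reset(lp_ret.sum_dual_solutions_);
  reset(lp_ret.last_restart_duality_gap_primal_solution_);
  reset(lp_ret.last_restart_duality_gap_dual_solution_);
}

Under the same assumptions, the call site above would become: reset_lp_buffer_streams(response.lp_ret, rmm::cuda_stream_per_thread);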


@coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cpp/src/dual_simplex/barrier.cu (3)

3140-3152: Missing .value() calls in compute_cc_rhs - inconsistent with PR objective.

These cub::DeviceTransform::Transform calls still use stream_view_ directly instead of stream_view_.value(), which is inconsistent with all other changes in this PR. If CCCL 3.2 requires the raw cudaStream_t, these calls will exhibit the same regression.

🐛 Proposed fix
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_dx_aff_.data(), data.d_dz_aff_.data()),
       data.d_complementarity_xz_rhs_.data(),
       data.d_complementarity_xz_rhs_.size(),
       [new_mu] HD(f_t dx_aff, f_t dz_aff) { return -(dx_aff * dz_aff) + new_mu; },
-      stream_view_);
+      stream_view_.value());

     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_dw_aff_.data(), data.d_dv_aff_.data()),
       data.d_complementarity_wv_rhs_.data(),
       data.d_complementarity_wv_rhs_.size(),
       [new_mu] HD(f_t dw_aff, f_t dv_aff) { return -(dw_aff * dv_aff) + new_mu; },
-      stream_view_);
+      stream_view_.value());

3207-3232: Missing .value() calls in compute_final_direction - inconsistent with PR objective.

These three cub::DeviceTransform::Transform calls still pass stream_view_ directly instead of stream_view_.value().

🐛 Proposed fix
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_dw_aff_.data(), data.d_dv_aff_.data(), data.d_dw_.data(), data.d_dv_.data()),
       thrust::make_zip_iterator(data.d_dw_.data(), data.d_dv_.data()),
       data.d_dw_.size(),
       [] HD(f_t dw_aff, f_t dv_aff, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
         return {dw + dw_aff, dv + dv_aff};
       },
-      stream_view_);
+      stream_view_.value());

     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_dx_aff_.data(), data.d_dz_aff_.data(), data.d_dx_.data(), data.d_dz_.data()),
       thrust::make_zip_iterator(data.d_dx_.data(), data.d_dz_.data()),
       data.d_dx_.size(),
       [] HD(f_t dx_aff, f_t dz_aff, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
         return {dx + dx_aff, dz + dz_aff};
       },
-      stream_view_);
+      stream_view_.value());

     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_dy_aff_.data(), data.d_dy_.data()),
       data.d_dy_.data(),
       data.d_dy_.size(),
       [] HD(f_t dy_aff, f_t dy) { return dy + dy_aff; },
-      stream_view_);
+      stream_view_.value());

3286-3311: Missing .value() calls in compute_next_iterate - inconsistent with PR objective.

These three cub::DeviceTransform::Transform calls still pass stream_view_ directly instead of stream_view_.value().

🐛 Proposed fix
     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_w_.data(), data.d_v_.data(), data.d_dw_.data(), data.d_dv_.data()),
       thrust::make_zip_iterator(data.d_w_.data(), data.d_v_.data()),
       data.d_dw_.size(),
       [step_primal, step_dual] HD(f_t w, f_t v, f_t dw, f_t dv) -> thrust::tuple<f_t, f_t> {
         return {w + step_primal * dw, v + step_dual * dv};
       },
-      stream_view_);
+      stream_view_.value());

     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(
         data.d_x_.data(), data.d_z_.data(), data.d_dx_.data(), data.d_dz_.data()),
       thrust::make_zip_iterator(data.d_x_.data(), data.d_z_.data()),
       data.d_dx_.size(),
       [step_primal, step_dual] HD(f_t x, f_t z, f_t dx, f_t dz) -> thrust::tuple<f_t, f_t> {
         return {x + step_primal * dx, z + step_dual * dz};
       },
-      stream_view_);
+      stream_view_.value());

     cub::DeviceTransform::Transform(
       cuda::std::make_tuple(data.d_y_.data(), data.d_dy_.data()),
       data.d_y_.data(),
       data.d_y_.size(),
       [step_dual] HD(f_t y, f_t dy) { return y + step_dual * dy; },
-      stream_view_);
+      stream_view_.value());
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9293aed and b859963.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/barrier.cu
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/src/dual_simplex/barrier.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/src/dual_simplex/barrier.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/dual_simplex/barrier.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/dual_simplex/barrier.cu
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:38.038Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2026-01-14T00:38:38.038Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 746
File: cpp/src/dual_simplex/barrier.cu:549-552
Timestamp: 2026-01-14T00:38:38.038Z
Learning: In cpp/src/dual_simplex/barrier.cu's form_adat() method within the barrier solver, the raft::copy from d_original_A_values to device_AD.x is necessary and must not be removed. This copy restores the original matrix A values before they are scaled by the diagonal matrix D (via d_inv_diag_prime). Since D changes between iterations and the scaling mutates device_AD.x, copying the original values each iteration is required for correctness.

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.1.0, 3.10, arm64, rockylinux8
🔇 Additional comments (5)
cpp/src/dual_simplex/barrier.cu (5)

48-87: LGTM: Consistent use of stream.value() for wrapper functions.

These non-template wrappers correctly pass the raw cudaStream_t via .value() to cub::DeviceTransform::Transform, addressing the CCCL 3.2 API compatibility issue; a generic sketch of this wrapper shape appears after the comments below.


1371-1376: LGTM: Correctly updated stream argument.


2052-2106: LGTM: All transform calls consistently updated in gpu_compute_residuals.


2345-2382: LGTM: Diagonal computation transforms correctly updated.


3001-3015: LGTM: Affine RHS computation transforms correctly updated.
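
As a rough illustration of the wrapper shape noted in the 48-87 comment above, a non-template wrapper can keep rmm::cuda_stream_view in its signature and hand CUB the raw handle internally. This is a hedged sketch under assumed names (scale_on_stream, scale_op); it is not the actual barrier.cu code.

  // Hypothetical sketch of a non-template wrapper; not the barrier.cu wrappers.
  #include <cub/device/device_transform.cuh>
  #include <cuda/std/tuple>
  #include <rmm/cuda_stream_view.hpp>
  #include <cstddef>

  struct scale_op {
    double alpha;
    __host__ __device__ double operator()(double x) const { return alpha * x; }
  };

  // Callers keep passing rmm::cuda_stream_view; CUB receives the raw cudaStream_t
  // via .value(), so no stream-type conversion is left to overload resolution.
  inline cudaError_t scale_on_stream(double const* in,
                                     double* out,
                                     std::size_t n,
                                     double alpha,
                                     rmm::cuda_stream_view stream)
  {
    return cub::DeviceTransform::Transform(
      cuda::std::make_tuple(in), out, n, scale_op{alpha}, stream.value());
  }

A call site inside the solver would then look like scale_on_stream(d_x.data(), d_y.data(), d_x.size(), 2.0, stream_view_), with the device work still landing on the solver's non-blocking stream (d_x, d_y, and the scale factor are again just placeholders).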

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@hlinsen
Contributor Author

hlinsen commented Jan 14, 2026

/merge

@rapids-bot rapids-bot bot merged commit f0d1ef7 into NVIDIA:main Jan 14, 2026
195 of 197 checks passed

Successfully merging this pull request may close these issues:

  • [BUG] PDLP synchronization issue when triggering from python
