Add support for solving uniform batches with CUDSS #78
Conversation
Pull request overview
This PR adds support for solving uniform batches of quadratic programming problems using CUDSS (NVIDIA's cuDSS direct sparse solver library). The implementation enables solving multiple QP problems with identical sparsity patterns in parallel on the GPU, which can provide significant performance benefits for applications like Model Predictive Control (a usage sketch follows the list below).
Key changes:
- Refactored the solver initialization and main loop into modular functions to support both single and batch solving modes
- Implemented a new UniformBatch extension providing batch-specific KKT system handling and solver coordination
- Added custom broadcasting infrastructure for efficient iteration over active solvers in the batch
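For a flavor of the intended usage, here is a hedged sketch; the constructor call and the `make_qp` helper are assumptions based on the file list below, not the actual API:

```julia
using MadIPM, CUDA

# Hypothetical usage sketch: solve 64 QPs sharing one sparsity pattern with a
# single batched CUDSS factorization instead of 64 sequential GPU solves.
qps   = [make_qp(i) for i in 1:64]   # `make_qp` is a placeholder problem builder
batch = UniformBatchSolver(qps)      # shared batch KKT system + per-sample MPCSolvers
MadIPM.solve!(batch)                 # interior-point iterations run in lockstep
```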
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| test/test_gpu.jl | Added comprehensive test for CUDSS uniform batch solving, verifying results match individual solver runs |
| test/runtests.jl | Modified simple_lp helper to accept configurable Avals parameter and changed exact equality to approximate equality for robustness |
| src/solver.jl | Refactored initialization into pre_initialize!, init_starting_point_solve!, and post_initialize! for batch compatibility; extracted helper functions for key operations |
| src/linear_solver.jl | Added build_kkt! wrapper and extracted post_solve! logic from solve_system! for better modularity |
| ext/MadIPMCUDAExt/UniformBatch/structure.jl | Implemented UniformBatchSolver structure to manage multiple solver instances with shared batch KKT system |
| ext/MadIPMCUDAExt/UniformBatch/solver.jl | Implemented batch versions of initialization, factorization, and MPC algorithm with selective solver activation |
| ext/MadIPMCUDAExt/UniformBatch/kkt.jl | Implemented UniformBatchKKTSystem for managing batched matrix factorizations and solves with CUDSS |
| ext/MadIPMCUDAExt/UniformBatch/broadcast.jl | Added custom broadcasting to efficiently iterate over active solvers in batch |
| ext/MadIPMCUDAExt/UniformBatch/UniformBatch.jl | Module entry point defining helper functions for batch solving with reduced KKT systems |
| ext/MadIPMCUDAExt/MadIPMCUDAExt.jl | Added include statement for UniformBatch module |
| Project.toml | Relaxed MadNLPGPU version constraint from 0.7.15 to 0.7 |
Co-authored-by: Andrew Rosemberg <[email protected]>
LGTM. I can recheck after the FIXME and TODO are done, but these seem straightforward. A DCOPF test case will be nice for confirming on a larger case.
Here are the benchmark results from running https://github.com/klamike/MadIPM.jl/blob/mk/batch_profile/test.jl (note it is …):
Starting 89_pegase
[ Info: Loading matpower file
┌ Info: 89_pegase x 4 -- Batch is 1.68x faster
│ t_loop = 0.292890551
│ t_batch = 0.174158794
│ t_loop - t_batch = 0.118731757
│ t_loop / batch_size = 0.07322263775
└ t_batch / batch_size = 0.0435396985
┌ Info: 89_pegase x 16 -- Batch is 1.41x faster
│ t_loop = 1.065516498
│ t_batch = 0.755904324
│ t_loop - t_batch = 0.3096121740000001
│ t_loop / batch_size = 0.066594781125
└ t_batch / batch_size = 0.04724402025
┌ Info: 89_pegase x 64 -- Batch is 1.78x faster
│ t_loop = 4.670513256
│ t_batch = 2.619187511
│ t_loop - t_batch = 2.0513257449999998
│ t_loop / batch_size = 0.072976769625
└ t_batch / batch_size = 0.040924804859375
Starting 1354_pegase
[ Info: Loading matpower file
┌ Info: 1354_pegase x 4 -- Batch is 1.93x faster
│ t_loop = 0.619273734
│ t_batch = 0.320906251
│ t_loop - t_batch = 0.298367483
│ t_loop / batch_size = 0.1548184335
└ t_batch / batch_size = 0.08022656275
┌ Info: 1354_pegase x 16 -- Batch is 2.32x faster
│ t_loop = 2.469379521
│ t_batch = 1.063210762
│ t_loop - t_batch = 1.406168759
│ t_loop / batch_size = 0.1543362200625
└ t_batch / batch_size = 0.066450672625
┌ Info: 1354_pegase x 64 -- Batch is 2.16x faster
│ t_loop = 10.421725742
│ t_batch = 4.823384208
│ t_loop - t_batch = 5.598341533999999
│ t_loop / batch_size = 0.16283946471875
└ t_batch / batch_size = 0.07536537825
Starting 2869_pegase
[ Info: Loading matpower file
┌ Info: 2869_pegase x 4 -- Batch is 2.25x faster
│ t_loop = 0.958210774
│ t_batch = 0.425290291
│ t_loop - t_batch = 0.5329204830000001
│ t_loop / batch_size = 0.2395526935
└ t_batch / batch_size = 0.10632257275
┌ Info: 2869_pegase x 16 -- Batch is 3.06x faster
│ t_loop = 3.95141398
│ t_batch = 1.29147746
│ t_loop - t_batch = 2.6599365199999996
│ t_loop / batch_size = 0.24696337375
└ t_batch / batch_size = 0.08071734125
┌ Info: 2869_pegase x 64 -- Batch is 3.15x faster
│ t_loop = 16.175443389
│ t_batch = 5.13312229
│ t_loop - t_batch = 11.042321099000002
│ t_loop / batch_size = 0.252741302953125
└ t_batch / batch_size = 0.08020503578125
Starting 6470_rte
[ Info: Loading matpower file
┌ Info: 6470_rte x 4 -- Batch is 1.7x faster
│ t_loop = 1.65961581
│ t_batch = 0.978801008
│ t_loop - t_batch = 0.680814802
│ t_loop / batch_size = 0.4149039525
└ t_batch / batch_size = 0.244700252
┌ Info: 6470_rte x 16 -- Batch is 4.13x faster
│ t_loop = 6.732655854
│ t_batch = 1.629377134
│ t_loop - t_batch = 5.10327872
│ t_loop / batch_size = 0.420790990875
└ t_batch / batch_size = 0.101836070875
┌ Info: 6470_rte x 64 -- Batch is 4.91x faster
│ t_loop = 27.903837811
│ t_batch = 5.677465911
│ t_loop - t_batch = 22.2263719
│ t_loop / batch_size = 0.435997465796875
└ t_batch / batch_size = 0.088710404859375
Starting 9241_pegase
[ Info: Loading matpower file
┌ Info: 9241_pegase x 4 -- Batch is 1.06x faster
│ t_loop = 2.934963509
│ t_batch = 2.772024485
│ t_loop - t_batch = 0.16293902399999993
│ t_loop / batch_size = 0.73374087725
└ t_batch / batch_size = 0.69300612125
┌ Info: 9241_pegase x 16 -- Batch is 4.73x faster
│ t_loop = 11.660498002
│ t_batch = 2.465172773
│ t_loop - t_batch = 9.195325229000002
│ t_loop / batch_size = 0.728781125125
└ t_batch / batch_size = 0.1540732983125
┌ Info: 9241_pegase x 64 -- Batch is 5.1x faster
│ t_loop = 48.443547009
│ t_batch = 9.497881699
│ t_loop - t_batch = 38.945665309999995
│ t_loop / batch_size = 0.756930422015625
└ t_batch / batch_size = 0.148404401546875
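For quick reference, the speedups reported above (batched solve vs. a loop of individual solves):

| Case | ×4 | ×16 | ×64 |
|---|---|---|---|
| 89_pegase | 1.68x | 1.41x | 1.78x |
| 1354_pegase | 1.93x | 2.32x | 2.16x |
| 2869_pegase | 2.25x | 3.06x | 3.15x |
| 6470_rte | 1.70x | 4.13x | 4.91x |
| 9241_pegase | 1.06x | 4.73x | 5.10x |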
frapac left a comment:
This looks good to me! The implementation of the batch solver is very nice, and it does not touch much of the internals of MadIPM. I like the direction this project is taking. In the long term, it would be interesting to know how much we lose by not storing our vectors contiguously in memory across the different batches.
I only have minor comments so far. This PR can be merged as soon as they are addressed, so we can move on.
    while true
        # Check termination criteria
        for_active(batch_solver,
            MadNLP.print_iter,
Do we print the iter for all batches? Isn't the output a bit messy as a result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is quite messy. I just didn't want to think about it yet 😉
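Judging from the call site above, a minimal sketch of what `for_active` could look like (the actual implementation is in ext/MadIPMCUDAExt/UniformBatch/broadcast.jl; the field names below are assumptions):

```julia
# Hypothetical sketch, not the actual implementation: apply `f(solver, args...)`
# to every solver in the batch that has not yet terminated.
# `solvers` and `active` are assumed field names of the batch solver.
function for_active(batch_solver, f, args...)
    for (solver, active) in zip(batch_solver.solvers, batch_solver.active)
        active || continue   # skip samples that already converged
        f(solver, args...)
    end
end
```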
    # Print remaining options (unsupported)
    if !isempty(remaining_options)
    -    MadNLP.print_ignored_options(logger, remaining_options)
    +    # MadNLP.print_ignored_options(logger, remaining_options)
Dead comment? We can remove this line if needed
This is due to NoLinearSolver not considering the args that are meant for the batch solver. I do think reporting ignored options is a useful feature; it would be better to refactor how options are handled in the BatchSolver constructor so we can detect what is meant for the batch solver and what is meant for the individual MPCSolver (a rough sketch of that split follows).
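As a rough illustration of that refactor, a hedged sketch; the option names and the splitting helper here are assumptions, not the actual API:

```julia
# Hypothetical sketch: split user kwargs between batch-level options and
# options forwarded to each individual MPCSolver, so genuinely unknown
# options can still be reported instead of silently dropped.
const BATCH_OPTION_NAMES = (:batch_size, :sync_iterations)   # assumed names

function split_options(; kwargs...)
    opts        = Dict{Symbol,Any}(kwargs)
    batch_opts  = Dict(k => v for (k, v) in opts if k in BATCH_OPTION_NAMES)
    solver_opts = Dict(k => v for (k, v) in opts if !(k in BATCH_OPTION_NAMES))
    return batch_opts, solver_opts
end
```

The BatchSolver constructor could then consume `batch_opts`, forward `solver_opts` to each MPCSolver, and pass whatever neither recognizes to `MadNLP.print_ignored_options`.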
Merging this into the …
    -    solver::MadNLP.AbstractMadNLPSolver;
    -    kwargs...
    -)
    +function solve!(solver::MadNLP.AbstractMadNLPSolver)
@klamike Why did you drop the support for kwargs... and options?
For example, it could be nice to use iterative refinement on the fly.
I think there was a bug here; at least some of these options don't actually get set. I need to go back and check the details more closely.
Note it is not "on the fly" since we re-initialize in solve!. But it would still be nice to allow re-solving with updated options.
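For reference, a hedged sketch of the kind of kwargs forwarding being discussed; the PR currently removes it (per the diff above), and the option-setting mechanism below is an assumption:

```julia
using MadNLP

# Hypothetical sketch: re-applying options on each call would let a re-solve
# pick up new settings, e.g. enabling iterative refinement between two solves.
function solve!(solver::MadNLP.AbstractMadNLPSolver; kwargs...)
    for (name, value) in kwargs
        setproperty!(solver.opt, name, value)   # assumes a mutable options struct at `solver.opt`
    end
    # ... re-initialize and run the interior-point loop as before ...
end
```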
amontoison left a comment:
Good work @klamike 👍
I only have one comment, concerning the keyword arguments of the function solve!.
@frapac @amontoison @andrewrosemberg
It is still using the latest release of MadNLP, but it will be easy to rebase on #76 once the next release is out.
The main idea is:
- Each sample's `aug_com.nzVal` points to a slice of the batch solver's `tril.nzVal`. This lets us re-use all the KKT building machinery.
- Each sample's RHS (`primal_dual(::UnreducedKKTVector)`) points to a slice of a dedicated buffer. After the batch solve, copy the results back to each sample's `UnreducedKKTVector`.
- … the `nzVal`/RHS pointers when samples terminate.
- … a `Vector{MPCSolver}` that shares buffers …
- … `AbstractSparseKKTSystem` …
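A minimal sketch of the slicing idea in the first two bullets; the sizes and names below are illustrative, not the actual fields:

```julia
using CUDA

# Illustrative sketch of the aliasing idea: one contiguous buffer holds the
# KKT nonzeros of every sample, and each sample's `nzVal` is a view into it,
# so the existing single-problem KKT assembly writes directly into the
# batched storage that CUDSS factorizes.
nnz_kkt    = 1_000                     # assumed nonzeros per KKT matrix
batch_size = 4
tril_nzVal = CUDA.zeros(Float64, nnz_kkt * batch_size)   # batch buffer

# The slice that sample i's `aug_com.nzVal` would alias:
sample_nzval(i) = view(tril_nzVal, (i - 1) * nnz_kkt + 1 : i * nnz_kkt)
```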
aug_com.nzValpointing to a slice of the batch solver'stril.nzVal. This lets us re-use all the KKT building machinery.primal_dual(::UnreducedKKTVector)) to a slice of a dedicated buffer. After the batch solve, copy the results back to each sample'sUnreducedKKTVector.nzVal/RHS pointers when samples terminate.Vector{MPCSolver}that shares buffersAbstractSparseKKTSystem