
Conversation

@klamike (Collaborator) commented Dec 17, 2025

@frapac @amontoison @andrewrosemberg

This still uses the latest release of MadNLP, but it will be easy to rebase on #76 after the next release.

The main idea is:

  1. Each batch sample's KKT system has aug_com.nzVal pointing to a slice of the batch solver's tril.nzVal. This lets us re-use all the KKT building machinery.
  2. Immediately prior to solving, copy each sample's RHS vector (e.g. primal_dual(::UnreducedKKTVector)) to a slice of a dedicated buffer. After the batch solve, copy the results back to each sample's UnreducedKKTVector.
  3. To work around CUDSS not supporting sub-batch solves, I keep reducing the batch size and shifting the nzVal/RHS pointers as samples terminate (see the sketch below).
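
A minimal sketch of this buffer layout, to make the pointer bookkeeping concrete. All names and sizes below (`batch_nzval`, `nnz_kkt`, `n_rhs`, ...) are illustrative assumptions, not the actual MadIPM internals:

```julia
using CUDA

# Assumed sizes, for illustration: identical sparsity across samples.
batch_size, nnz_kkt, n_rhs = 4, 1_000, 200

# One contiguous buffer for every sample's KKT nonzeros, and one for the RHS.
batch_nzval = CUDA.zeros(Float64, nnz_kkt * batch_size)
batch_rhs   = CUDA.zeros(Float64, n_rhs * batch_size)

# (1) Each sample's aug_com.nzVal is a view into its slice of the shared
#     buffer, so the existing KKT-building code writes batch memory directly.
nzval_slice(i) = view(batch_nzval, (i - 1) * nnz_kkt + 1 : i * nnz_kkt)

# (2) Immediately before the batched solve, copy each sample's RHS into its
#     slice; after the solve, copy the solutions back out the same way.
rhs_slice(i) = view(batch_rhs, (i - 1) * n_rhs + 1 : i * n_rhs)
for i in 1:batch_size
    copyto!(rhs_slice(i), CUDA.rand(Float64, n_rhs))  # stand-in for primal_dual(kkt_vec)
end

# (3) cuDSS factorizes/solves one contiguous batch, so when a sample
#     terminates, the active batch size is reduced and the nzVal/RHS base
#     pointers are shifted past the retired slices.
```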
Remaining TODOs:

  • Remove broadcasting.
  • Test/profile on a real problem (I will PR a DCOPF builder to ExaModelsPower).
  • Consider how to integrate with the Batch API (JuliaSmoothOptimizers/NLPModels.jl#521).
  • Implement a constructor for Vector{MPCSolver} that shares buffers.
  • Add support for the other AbstractSparseKKTSystem.

Copilot AI review requested due to automatic review settings December 17, 2025 05:44

Copilot AI left a comment


Pull request overview

This PR adds support for solving uniform batches of quadratic programming problems using CUDSS (NVIDIA's cuDSS GPU-accelerated direct sparse solver). The implementation enables parallel solving of multiple QP problems with identical sparsity patterns on GPUs, which can provide significant performance benefits for applications like Model Predictive Control.

Key changes:

  • Refactored the solver initialization and main loop into modular functions to support both single and batch solving modes
  • Implemented a new UniformBatch extension providing batch-specific KKT system handling and solver coordination
  • Added custom broadcasting infrastructure for efficient iteration over active solvers in the batch
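
For intuition, the `for_active` pattern that appears in the review below might look roughly like this sketch (hypothetical: the PR implements this via custom broadcasting, and the `solvers`/`active` field names are assumptions):

```julia
# Apply each function to every still-active solver in the batch, skipping
# samples that have already terminated. Illustrative only.
function for_active(batch_solver, fs...)
    for (solver, active) in zip(batch_solver.solvers, batch_solver.active)
        active || continue
        foreach(f -> f(solver), fs)
    end
end
```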

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.

Summary per file:

| File | Description |
| --- | --- |
| `test/test_gpu.jl` | Added a comprehensive test for CUDSS uniform batch solving, verifying that results match individual solver runs |
| `test/runtests.jl` | Modified the `simple_lp` helper to accept a configurable `Avals` parameter and changed exact equality to approximate equality for robustness |
| `src/solver.jl` | Refactored initialization into `pre_initialize!`, `init_starting_point_solve!`, and `post_initialize!` for batch compatibility; extracted helper functions for key operations |
| `src/linear_solver.jl` | Added a `build_kkt!` wrapper and extracted `post_solve!` logic from `solve_system!` for better modularity |
| `ext/MadIPMCUDAExt/UniformBatch/structure.jl` | Implemented the `UniformBatchSolver` structure to manage multiple solver instances with a shared batch KKT system |
| `ext/MadIPMCUDAExt/UniformBatch/solver.jl` | Implemented batch versions of initialization, factorization, and the MPC algorithm with selective solver activation |
| `ext/MadIPMCUDAExt/UniformBatch/kkt.jl` | Implemented `UniformBatchKKTSystem` for managing batched matrix factorizations and solves with CUDSS |
| `ext/MadIPMCUDAExt/UniformBatch/broadcast.jl` | Added custom broadcasting to efficiently iterate over active solvers in the batch |
| `ext/MadIPMCUDAExt/UniformBatch/UniformBatch.jl` | Module entry point defining helper functions for batch solving with reduced KKT systems |
| `ext/MadIPMCUDAExt/MadIPMCUDAExt.jl` | Added an include statement for the `UniformBatch` module |
| `Project.toml` | Relaxed the MadNLPGPU version constraint from 0.7.15 to 0.7 |


@andrewrosemberg commented

LGTM. I can recheck after the FIXMEs and TODOs are done, but these seem straightforward.

A DCOPF test case will be nice for confirming this on a larger case.

@klamike (Collaborator, Author) commented Dec 18, 2025

Here are the benchmark results from running https://github.com/klamike/MadIPM.jl/blob/mk/batch_profile/test.jl (note: these runs use tol=1e-4 and cudss_ir=3).

```
                Starting 89_pegase

[ Info: Loading matpower file
┌ Info: 89_pegase x 4 -- Batch is 1.68x faster
│   t_loop = 0.292890551
│   t_batch = 0.174158794
│   t_loop - t_batch = 0.118731757
│   t_loop / batch_size = 0.07322263775
└   t_batch / batch_size = 0.0435396985
┌ Info: 89_pegase x 16 -- Batch is 1.41x faster
│   t_loop = 1.065516498
│   t_batch = 0.755904324
│   t_loop - t_batch = 0.3096121740000001
│   t_loop / batch_size = 0.066594781125
└   t_batch / batch_size = 0.04724402025
┌ Info: 89_pegase x 64 -- Batch is 1.78x faster
│   t_loop = 4.670513256
│   t_batch = 2.619187511
│   t_loop - t_batch = 2.0513257449999998
│   t_loop / batch_size = 0.072976769625
└   t_batch / batch_size = 0.040924804859375


                Starting 1354_pegase

[ Info: Loading matpower file
┌ Info: 1354_pegase x 4 -- Batch is 1.93x faster
│   t_loop = 0.619273734
│   t_batch = 0.320906251
│   t_loop - t_batch = 0.298367483
│   t_loop / batch_size = 0.1548184335
└   t_batch / batch_size = 0.08022656275
┌ Info: 1354_pegase x 16 -- Batch is 2.32x faster
│   t_loop = 2.469379521
│   t_batch = 1.063210762
│   t_loop - t_batch = 1.406168759
│   t_loop / batch_size = 0.1543362200625
└   t_batch / batch_size = 0.066450672625
┌ Info: 1354_pegase x 64 -- Batch is 2.16x faster
│   t_loop = 10.421725742
│   t_batch = 4.823384208
│   t_loop - t_batch = 5.598341533999999
│   t_loop / batch_size = 0.16283946471875
└   t_batch / batch_size = 0.07536537825


                Starting 2869_pegase

[ Info: Loading matpower file
┌ Info: 2869_pegase x 4 -- Batch is 2.25x faster
│   t_loop = 0.958210774
│   t_batch = 0.425290291
│   t_loop - t_batch = 0.5329204830000001
│   t_loop / batch_size = 0.2395526935
└   t_batch / batch_size = 0.10632257275
┌ Info: 2869_pegase x 16 -- Batch is 3.06x faster
│   t_loop = 3.95141398
│   t_batch = 1.29147746
│   t_loop - t_batch = 2.6599365199999996
│   t_loop / batch_size = 0.24696337375
└   t_batch / batch_size = 0.08071734125
┌ Info: 2869_pegase x 64 -- Batch is 3.15x faster
│   t_loop = 16.175443389
│   t_batch = 5.13312229
│   t_loop - t_batch = 11.042321099000002
│   t_loop / batch_size = 0.252741302953125
└   t_batch / batch_size = 0.08020503578125


                Starting 6470_rte

[ Info: Loading matpower file
┌ Info: 6470_rte x 4 -- Batch is 1.7x faster
│   t_loop = 1.65961581
│   t_batch = 0.978801008
│   t_loop - t_batch = 0.680814802
│   t_loop / batch_size = 0.4149039525
└   t_batch / batch_size = 0.244700252
┌ Info: 6470_rte x 16 -- Batch is 4.13x faster
│   t_loop = 6.732655854
│   t_batch = 1.629377134
│   t_loop - t_batch = 5.10327872
│   t_loop / batch_size = 0.420790990875
└   t_batch / batch_size = 0.101836070875
┌ Info: 6470_rte x 64 -- Batch is 4.91x faster
│   t_loop = 27.903837811
│   t_batch = 5.677465911
│   t_loop - t_batch = 22.2263719
│   t_loop / batch_size = 0.435997465796875
└   t_batch / batch_size = 0.088710404859375

                Starting 9241_pegase

[ Info: Loading matpower file
┌ Info: 9241_pegase x 4 -- Batch is 1.06x faster
│   t_loop = 2.934963509
│   t_batch = 2.772024485
│   t_loop - t_batch = 0.16293902399999993
│   t_loop / batch_size = 0.73374087725
└   t_batch / batch_size = 0.69300612125
┌ Info: 9241_pegase x 16 -- Batch is 4.73x faster
│   t_loop = 11.660498002
│   t_batch = 2.465172773
│   t_loop - t_batch = 9.195325229000002
│   t_loop / batch_size = 0.728781125125
└   t_batch / batch_size = 0.1540732983125
┌ Info: 9241_pegase x 64 -- Batch is 5.1x faster
│   t_loop = 48.443547009
│   t_batch = 9.497881699
│   t_loop - t_batch = 38.945665309999995
│   t_loop / batch_size = 0.756930422015625
└   t_batch / batch_size = 0.148404401546875
```
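
For quick reference, the batch-vs-loop speedups reported above:

| Case | ×4 | ×16 | ×64 |
| --- | --- | --- | --- |
| 89_pegase | 1.68× | 1.41× | 1.78× |
| 1354_pegase | 1.93× | 2.32× | 2.16× |
| 2869_pegase | 2.25× | 3.06× | 3.15× |
| 6470_rte | 1.70× | 4.13× | 4.91× |
| 9241_pegase | 1.06× | 4.73× | 5.10× |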

@frapac (Member) left a comment


This looks good to me! The implementation of the batch solver is very nice and does not touch much of the internals of MadIPM. I like the direction this project is taking. In the long term, it would be interesting to know how much we lose by not storing our vectors contiguously in memory across the different batches.

I only have minor comments so far. This PR can be merged as soon as they are addressed, so we can move on.

```julia
while true
    # Check termination criteria
    for_active(batch_solver,
        MadNLP.print_iter,
```
@frapac (Member):

Do we print the iter for all batches? Isn't the output a bit messy as a result?

@klamike (Collaborator, Author):

Yes, it is quite messy. I just didn't want to think about it yet 😉

```julia
# Print remaining options (unsupported)
if !isempty(remaining_options)
    MadNLP.print_ignored_options(logger, remaining_options)
    # MadNLP.print_ignored_options(logger, remaining_options)
```
@frapac (Member):

Dead comment? We can remove this line if needed.

@klamike (Collaborator, Author):

This is due to NoLinearSolver not considering the args that are meant for the batch solver. I do think reporting ignored options is a useful feature; it would be better to refactor how options are handled in the BatchSolver constructor so we detect what is meant for the batch solver and what is meant for the individual MPCSolver.
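
A hypothetical shape for that refactor, purely for illustration (the option names and the `split_options` helper are assumptions, not existing API):

```julia
# Route each user option to either the batch solver or the per-sample
# MPCSolver, so neither side reports the other's options as ignored.
const BATCH_OPTION_KEYS = (:cudss_ir, :batch_kkt_system)  # assumed names

function split_options(options::Dict{Symbol,Any})
    batch_opts  = Dict(k => v for (k, v) in options if k in BATCH_OPTION_KEYS)
    solver_opts = Dict(k => v for (k, v) in options if !(k in BATCH_OPTION_KEYS))
    return batch_opts, solver_opts
end
```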

@klamike changed the base branch from master to batch, December 19, 2025 03:34
@klamike (Collaborator, Author) commented Dec 26, 2025

Merging this into the batch branch. There are still some TODOs from François' review, which I will address in subsequent PRs to batch.

@klamike merged commit 4f8a53a into MadNLP:batch Dec 26, 2025
5 checks passed
```julia
    solver::MadNLP.AbstractMadNLPSolver;
    kwargs...
)
function solve!(solver::MadNLP.AbstractMadNLPSolver)
```
@amontoison (Member):

@klamike Why did you drop the support for kwargs... and options?
For example, it could be nice to use iterative refinement on the fly.

@klamike (Collaborator, Author) commented Dec 29, 2025:

I think there was a bug here: (at least some of?) these options don't actually get set. I need to go back and check the details more closely.

Note it is not "on the fly" since we re-initialize in solve!. But it would still be nice to allow re-solve with updated options.

@amontoison (Member) left a comment

Good work @klamike 👍
I only have one comment, about the keyword arguments of the function solve!.
