Partitioned backends #97
paraynaud started this conversation in Show and tell
@dpo, Here are the first profiles comparing the different backends.
I compared a `PSR1NLPModel` using four different partitioned gradient backends:

- `ReverseDiff.Tape`s;
- element `MOI.Evaluator{MOI.Nonlinear.Model}`s;
- a sparse-Jacobian `MOI.Evaluator{MOI.Nonlinear.Model}`;
- a modified-function `MOI.Evaluator{MOI.Nonlinear.Model}`.

For these initial results, I considered `n = 500` and the `trunk()` solver, set up with `max_time = 300.` and `max_eval = 30000`.
The purpose of these experiments is to identify trends and to determine which partitioned backend may be the best default partitioned backend.
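Roughly, the setup looks like this (a sketch: the `PSR1NLPModel` constructor and the test objective are placeholders on my side; `trunk` with its `max_time`/`max_eval` keywords is the JSOSolvers API):

```julia
using ADNLPModels, JSOSolvers
using PartiallySeparableNLPModels  # assumed to provide PSR1NLPModel

n = 500
# A partially separable test objective (placeholder, not one of the benchmark problems):
# each term involves only a few variables, so it splits into element functions.
f(x) = sum((x[i] - 1)^2 + (x[i+1] - x[i])^2 for i in 1:n-1)
nlp = ADNLPModel(f, ones(n))

psr1 = PSR1NLPModel(nlp)  # partitioned PSR1 quasi-Newton model (exact constructor may differ)
stats = trunk(psr1; max_time = 300.0, max_eval = 30000)
```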
I compared it with the performance of other `QuasiNewtonModel`s:

- `LBFGSModel(MathOptNLPModel)`;
- `LBFGSModel(ADNLPModel)`.

I made these comparisons in Float64, for which `LBFGSModel(MathOptNLPModel)` is the most computationally efficient model.
I made two sets of results: the first one includes both `LBFGSModel`s, while the second does not, to show the distinction between the partitioned backends more clearly.

Profiles with LBFGSModels:
Profiles with partitioned backends:
In these profiles, the iteration profile favors (as we already know) the partitioned quasi-Newton methods.
Due to the `max_time` parameter and numerical differences, the four `PSR1NLPModel` runs are not completely identical.
Two of them, using respectively the sparse Jacobian and a `ReverseDiff.Tape`, produce exactly the same iterations, while the others either take more iterations (with element `MOI.Evaluator{MOI.Nonlinear.Model}`s) or don't solve as many problems because of `max_time`.
However, when it comes to time, `LBFGSModel(MathOptNLPModel)` outperforms all of the partitioned backends (and it is not close).
Among the partitioned backends themselves, the results are very close.
The sparse Jacobian seems to be the best method overall.
The modified-function backend is the best on a majority of problems (at abscissa 0), despite solving fewer problems.
Hypothesis:

- The time needed to broadcast over a `Vector` is not linear in the `Vector`'s size. This makes a `PartitionedVector`, which distributes the broadcasted operation over its element `Vector`s, immediately less efficient.
- An extra cost is paid whenever the `Vector` associated to a `PartitionedVector` must be built, for example when a `norm` or a `dot` product is computed.

Possible improvements:
- For now, some problems may have elements that completely overlap each other, which is a performance killer :/
- Using a single `Vector` to represent a partitioned vector, similarly to the modified objective function, might help. But I'm not sure.

Remark 1: For all the partitioned quasi-Newton models, the objective function is computed from the original NLPModel used to define the partitioned model (in the profiles, `NLPModel.obj(::MathOptNLPModel, x)`).

Remark 2: I decreased the threshold in the case where the partitioned function is completely merged into a single function, but it didn't change the results.
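To illustrate the broadcast hypothesis above, here is a minimal sketch using plain element `Vector`s as a simplified stand-in for a `PartitionedVector` (not the real type): one fused broadcast over a contiguous vector versus the same operation distributed over many small element vectors, which pays per-element loop and dispatch overhead.

```julia
using BenchmarkTools

n, nelt = 10_000, 500
x = rand(n)                                  # one contiguous Vector
elements = [rand(n ÷ nelt) for _ in 1:nelt]  # simplified "partitioned" storage

# One fused broadcast over the full vector...
flat!(x) = (x .= 2 .* x .+ 1; x)

# ...versus the same broadcast applied to every element Vector in turn.
function partitioned!(elements)
    for e in elements
        e .= 2 .* e .+ 1
    end
    elements
end

@btime flat!($x)
@btime partitioned!($elements)
```

On my understanding, the second variant should be noticeably slower per entry, which matches the observation that distributing broadcasts over element `Vector`s is immediately less efficient.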