Partitioned backends #97
paraynaud started this conversation in Show and tell
@dpo, Here are the first profiles comparing the different backends.
I compared a `PSR1NLPModel` using four different partitioned gradient backends:

- `ReverseDiff.Tape`s;
- element `MOI.Evaluator{MOI.Nonlinear.Model}`s;
- a sparse-Jacobian `MOI.Evaluator{MOI.Nonlinear.Model}`;
- a modified-function `MOI.Evaluator{MOI.Nonlinear.Model}`.

For these initial results, I considered `n = 500` and the `trunk()` solver, set up with `max_time = 300.` and `max_eval = 30000`.
The purpose of these experiments is to identify trends and to determine which partitioned backend may be the best default partitioned backend.
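Roughly, the setup looks like this (a sketch: the `PSR1NLPModel` constructor and the test objective are placeholders on my side; `trunk` with its `max_time`/`max_eval` keywords is the JSOSolvers API):

```julia
using ADNLPModels, JSOSolvers
using PartiallySeparableNLPModels  # assumed to provide PSR1NLPModel

n = 500
# A partially separable test objective (placeholder, not one of the benchmark problems):
# each term involves only a few variables, so it splits into element functions.
f(x) = sum((x[i] - 1)^2 + (x[i+1] - x[i])^2 for i in 1:n-1)
nlp = ADNLPModel(f, ones(n))

psr1 = PSR1NLPModel(nlp)  # partitioned PSR1 quasi-Newton model (exact constructor may differ)
stats = trunk(psr1; max_time = 300.0, max_eval = 30000)
```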
I compared it with the performance of other `QuasiNewtonModel`s:

- `LBFGSModel(MathOptNLPModel)`;
- `LBFGSModel(ADNLPModel)`.

I made these comparisons in Float64, for which `LBFGSModel(MathOptNLPModel)` is the most computationally efficient model.
I made two sets of results: the first one includes both `LBFGSModel`s, while the second does not, to show the distinction between the partitioned backends more clearly.

Profiles with LBFGSModels:
Profiles with partitioned backends:
In these profiles, the iteration profile favors (as we already know) the partitioned quasi-Newton methods.
Due to the `max_time` parameter and numerical differences, the four `PSR1NLPModel` runs are not completely identical.
Two of them, using respectively the sparse Jacobian and a `ReverseDiff.Tape`, produce exactly the same iterations, while the others either take more iterations (with element `MOI.Evaluator{MOI.Nonlinear.Model}`s) or don't solve as many problems because of `max_time`.
However, when it comes to time, `LBFGSModel(MathOptNLPModel)` outperforms all of the partitioned backends (and it is not close).
Among the partitioned backends themselves, the results are very close.
The sparse Jacobian seems to be the best method overall.
The modified-function backend is the best on a majority of problems (at abscissa 0), despite solving fewer problems.
Hypothesis:

- The time needed to broadcast over a `Vector` is not linear in the `Vector`'s size. This makes a `PartitionedVector`, which distributes the broadcasted operation over its element `Vector`s, immediately less efficient.
- An extra cost is paid whenever the `Vector` associated to a `PartitionedVector` must be built, for example when a `norm` or a `dot` product is computed.

Possible improvements:
- For now, some problems may have elements that completely overlap each other, which is a performance killer :/
- Using a single `Vector` to represent a partitioned vector, similarly to the modified objective function, might help. But I'm not sure.

Remark 1: For all the partitioned quasi-Newton models, the objective function is computed from the original NLPModel used to define the partitioned model (in the profiles, `NLPModel.obj(::MathOptNLPModel, x)`).

Remark 2: I decreased the threshold in the case where the partitioned function is completely merged into a single function, but it didn't change the results.
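To illustrate the broadcast hypothesis above, here is a minimal sketch using plain element `Vector`s as a simplified stand-in for a `PartitionedVector` (not the real type): one fused broadcast over a contiguous vector versus the same operation distributed over many small element vectors, which pays per-element loop and dispatch overhead.

```julia
using BenchmarkTools

n, nelt = 10_000, 500
x = rand(n)                                  # one contiguous Vector
elements = [rand(n ÷ nelt) for _ in 1:nelt]  # simplified "partitioned" storage

# One fused broadcast over the full vector...
flat!(x) = (x .= 2 .* x .+ 1; x)

# ...versus the same broadcast applied to every element Vector in turn.
function partitioned!(elements)
    for e in elements
        e .= 2 .* e .+ 1
    end
    elements
end

@btime flat!($x)
@btime partitioned!($elements)
```

On my understanding, the second variant should be noticeably slower per entry, which matches the observation that distributing broadcasts over element `Vector`s is immediately less efficient.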