Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigating (and improving) file format handling in backends #660

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

sstroemer
Copy link
Contributor

@sstroemer sstroemer commented Aug 8, 2024

Fixes: nothing

This is a (pre-vacation) draft of some local changes that I needed, with a few open questions, some findings, and maybe a few interesting updates.

Summary of changes in this pull request

The content below explains the changes, and I will add upcoming changes to this list. Benchmarks and insights, as well as open questions are given after that.

Step-by-step of new functionality

  • 5725dd6: Allow writing MPS files, by implementing to_mps(...) based on to_lp(...).
  • 37a476e: Implement the file extension (suffix) check in the Pyomo backend to be consistent with the Gurobi backend.
  • 782a268: Add kwargs to both to_lp(...) and to_mps(...), to allow passing keyword arguments to the file writers. This, e.g., allows setting symbolic_solver_labels explicitly (previously it was active by default in the Pyomo backend).
  • LP file writing with the Gurobi backend does not work, because the object names contain special characters and spaces. Refer to the Gurobi documentation stating: "Note also that names that contain spaces are strongly discouraged, because they can’t be written to LP format files."
    • b5d7860: Remove spaces.
    • 293177b: Replace other internally used special characters. This does not catch errors due to "wrong" user configurations.
  • 6e3529b: In 782a268 the default symbolic_solver_labels = True was lost. To not break the expected default functionality, this internally enforces the default again.
  • 977a521: This allows passing Gurobi parameters (in a hacky way) to the backend, before any actual model building happens. This is important for setting IgnoreNames, which only affects objects added after it was set. This does not do anything write now, but was done while trying to prepare for "proper" names in the Gurobi backend.
  • 26e10c5 changes two things:
    • It sets explicit default values for solve_kwargs. This was previously initialized as empty dictionary, and then updating if logs are to be kept with {"symbolic_solver_labels": True, "keepfiles": True}. This indicates that, e.g., symbolic_solver_labels = False is the assumed default, which may change without anyone immediately noticing it (Pyomo release notes can be long...). While doing that it also moves where tee = True is set, to allow overwriting it.
    • While most options can be set via opt.options, some settings are done either via direct functions or using keyword arguments directly to the solve(...) function. To allow setting those, this now allows passing entries to solver_options that start with pyomo_. Those are caught and handled accordingly. Currently this allows: Manually modifying solve_kwargs (e.g., setting keepfiles without having to activate save_logs), and most importantly setting the ProblemFormat that Pyomo should use (which allows forcing, e.g., cpxlp or mps formats) when passing models as file to a solver.
  • 1af2759: When using the Gurobi backend to write model files, one may be interested in using the automated compression feature that Gurobi support [docs1, docs2].
  • 5b56de8: I've missed that the Pyomo backend uses double underscores to join stuff, which I should have since the Gurobi backend had a two-char join with , too; this is fixed now.
  • c64c089: Contains a draft for a way to circumvent the kwarg problem pointed out by Bryn.

Fixes

b68d2f3 fixes an error that occurs when a solver returns an "okay" status code (at least one that is interpreted as such) but does not return any solutions, which can happen, e.g., for a time limit abort with Gurobi. This lead the to the following error, that is now caught, including a warning for the user.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/calliope/src/calliope/model.py", line 441, in solve
    results = self.backend._solve(warmstart=warmstart, **solver_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/calliope/src/calliope/backend/pyomo_backend_model.py", line 315, in _solve
    self._instance.load_solution(results.solution[0])
                                 ~~~~~~~~~~~~~~~~^^^
  File "/home/user/miniconda3/envs/calliope/lib/python3.12/site-packages/pyomo/opt/results/container.py", line 174, in __getitem__
    return self._list[i]
           ~~~~~~~~~~^^^
IndexError: list index out of range

As indicated above, without b5d7860 and 293177b errors like the following occur when writing LPs with Gurobi:

Warning: variable name "flow_cap[N1, N1_to_X2, heat]" has a space
Warning: constraint name "area_use_per_flow_capacity[X1, pv, electricity]" has a space
Warning: to let Gurobi read it back, use rlp format

Benchmarks

If you see something that sounds weird, please point out - I did not have any time to really check the benchmarks in detail.

  • Benchmarks are done with the urban_scale model, using a half year ("config.init.time_subset": ["2005-01-01", "2005-06-30"]) or full year ("config.init.time_subset": ["2005-01-01", "2005-12-31"]) override.
  • Timings are measured from the "outside", capturing the full function calls of urban_scale (parse), build (build), to_** (write), and solve (pass).
  • Timings are mean out of 15 runs (half year) or 5 runs (full year).
  • Passing the model to the solver is measured by setting TimeLimit = 0 (for Gurobi) or -seconds 0 (for Cbc, command line).
  • ssl indicates symbolic_solver_labels.
  • Cbc was also used from the command line, like cbc model.lp -sec 0 solve.

Summary

Current summary: No need for MPS files if not explicitly needed.

  • For Pyomo, MPS files are approx. 30% slower for write and pass than LP files.
  • For Gurobi, MPS files are ~25% faster than LP files (for writing, since passing is almost instant since gurobipy is used).
  • Gurobi file writing is >6x faster than Pyomo.
  • Loading the MPS file with Cbc (command line) is ~15% faster than for the equivalent LP file (note: Cbc cannot read the Gurobi LP file by default, a manual replacement from -infinity to -inf has to be done). The Gurobi written LP and MPS files are slightly faster to read, with slightly below 10% (but I did not in-depth benchmark this).

Compression

For compressed files from Gurobi, reading them is almost identical to reading the uncompressed files. Rough stats for the different compression formats, based on "full year":

format write [s] size [MB]
.lp.7z 14.4 1.4
.lp.bz2 3.9 3.9
.lp.gz 2.35 6.9
.mps.7z 23.6 2.4
.mps.bz2 4.2 6.7
.mps.gz 1.6 11.7

Uncompressed files are 29.4 MB for LP and 133.2 MB for MPS.

Summary:

  • 7z for best file size, but slow
  • bz2 for good file sizes, and good time spent
  • gz for basic compression, but fast

Results

"half year"

backend format parse build write pass comment
pyomo lp 3.00e+00 2.29e+01 1.16e+01 1.35e+01 with-ssl
pyomo mps 2.96e+00 2.44e+01 1.58e+01 1.75e+01 with-ssl
pyomo lp 2.95e+00 2.33e+01 8.26e+00 9.91e+00 no-ssl
pyomo mps 3.06e+00 2.53e+01 1.22e+01 1.39e+01 no-ssl
gurobi lp 2.97e+00 2.45e+01 1.40e+00 2.73e-01 -
gurobi mps 2.92e+00 2.56e+01 1.08e+00 3.28e-01 -
pyomo lp 2.60e+00 1.97e+01 1.03e+01 - with-ssl
pyomo lp 2.35e+00 1.78e+01 6.59e+00 - no-ssl

"full year"

backend format parse build write pass comment
pyomo lp 0.00e+00 0.00e+00 0.00e+00 1.79e+01 no-ssl
pyomo mps 0.00e+00 0.00e+00 0.00e+00 2.36e+01 no-ssl

Questions

[01] I would like to allow for more complex default names of variables and constraints when using the Gurobi backend. Currently, when adding name=name to the various add_*** functions the names do not include an increasing number (as the Pyomo backend uses). Variables are then just named, e.g., flow_cap, which obviously prevents writing the model to a file. I fixed the verbose names, but those could be too verbose for certain use cases where variable(flow_cap)(0), or something similar could be sufficient. Since I don't really understand the important internals right now:

  • What would be the best way to achieve this? Getting a variable count from Gurobi? Maintaining one internally? Using some internal that is already there?

[02] I encountered an upstream bug in Pyomo (#3336), which has already been fixed and merged (#3337) into main.

  • What is the current reason to limit pyomo < 6.7.2? Changes to gurobi_direct and the new "manual" IIS calculation could be interesting to include.

Reviewer checklist

For myself:

  • pre-commit checked (ruff and black should probably fail) - done in 294ce40
  • tests checked
  • new tests for mps, and ???

Original:

  • Test(s) added to cover contribution
  • Documentation updated
  • Changelog updated
  • Coverage maintained or improved

@brynpickering
Copy link
Member

@sstroemer thanks for this!

A few minor points:

  1. You could switch on symbolic solver labels for Pyomo only if calliope.Model.backend.verbose_strings is called.
  2. If removing spaces from Gurobi verbose strings, do the same with Pyomo.
  3. I want to minimise the difference between different backends in terms of how users interact with them from Calliope. For that reason, I'm inclined to say we shouldn't have **kwargs in to_lp(...) etc. Once you're getting to the point of wanting to pass backend-specific args, you need to know that backend's API, and at that point you are a "super-user" who would be better off accessing the backend instance directly (model.backend._instance). You then benefit from method docstrings, too.

Stefan Strömer added 3 commits August 14, 2024 18:06
Refactors the way defaults are handled and moves the puts the documentation into the abstract method definition.
@sstroemer
Copy link
Contributor Author

Thanks for taking the time to check this out @brynpickering (really!).

Note that none of this is meant as "should be improved", I'm perfectly happy to close this anytime and keep the few things I need as local patch!


  1. You could switch on symbolic solver labels for Pyomo only if calliope.Model.backend.verbose_strings is called.

(Un)Fortunately these achieve different things: verbose_strings(...) transforms the less-verbose names into more-verbose names, while symbolic solver labels pass (or don't) the names regardless of verbosity to the solver in the model file. Depending on the file writer the latter would produce variable names like x708748, while the less-verbose names, for example variables(flow_cap)(0), already contain quite a lot of information on top of the anonymous names (without the need to pay the overhead cost of transforming).

Maybe a more generalized verbosity setting (e.g., anonymous, default, verbose) could clean that interaction up, but I didn't want to change anything that might be unnecessary!


  1. If removing spaces from Gurobi verbose strings, do the same with Pyomo.

I fully get the part of backends being "identical" to the user, that's what I kind of had in mind with some of the changes, see the following comparison on variables names:

state backend verbose_strings example
upstream pyomo no variables(flow_out)(288)
upstream gurobi no C323
upstream pyomo yes variables(flow_out)(X1__chp__electricity__2005_07_01_00_00)
upstream gurobi yes flow_out[X1,_chp,_electricity,_2005-07-01_00_00]
#660 gurobi yes flow_out[X1__chp__electricity__2005_07_01_00_00]

Note: Other "problems" arise for spaces, e.g., flow_cap[N1, N1_to_X2, heat] (gurobi) versus variables(flow_cap)(N1__N1_to_X2__heat) (pyomo), which is now changed to flow_cap[N1__N1_to_X2__heat].

The prefix is still not the same, but I assumed that was on purpose, so I just aligned the way the indices are used.


  1. I want to minimise the difference between different backends in terms of how users interact with them from Calliope. For that reason, I'm inclined to say we shouldn't have **kwargs in to_lp(...) etc. Once you're getting to the point of wanting to pass backend-specific args, you need to know that backend's API, and at that point you are a "super-user" who would be better off accessing the backend instance directly (model.backend._instance). You then benefit from method docstrings, too.

I understand that, the only "problem" I see with this, is that this means not being able to customize the functionality that you expose. For example, to_lp(...): Writing out files might be wanted by a user 1, so supporting symbolic_solver_names as a configurable setting would be beneficial. However, Gurobi does not support anything like that at all, which would mean not supporting it for any other backend. On the other hand, the way Gurobi handles it, would be aligned with how a potential future JuMP.jl backend would handle it too, which would then also clash with Pyomo...

What about:

  1. Keeping the **kwargs and documenting all possible keywords in the docstring of the abstract method to_lp(...). This shows up when inspecting docstrings in an IDE.
  2. Checking the passed arguments in the respective functions, and gracefully warning + ignoring them.

To better visualize that, I've drafted that in c64c089

Footnotes

  1. For example, we (most of the time) save the full model file alongside every run for "documentation" purposes; this allows immediately re-running the same model, or checks on the model file, without having to rebuild it. While this takes some time for large models, it is something that can be done after starting the solver, then running in parallel, and therefore is "free". That workflow applied to Calliope would be something like build, start [solve], start [verbose_strings, to_lp].

@brynpickering
Copy link
Member

What is the current reason to limit pyomo < 6.7.2? Changes to gurobi_direct and the new "manual" IIS calculation could be interesting to include.

Because the interaction with the gurobi solver has changed so that you can't just call solver=gurobi with the pyomo backend, and I hadn't managed to investigate how to fix it!

@brynpickering
Copy link
Member

I fully get the part of backends being "identical" to the user, that's what I kind of had in mind with some of the changes, see the following comparison on variables names:

I agree with the approach of making everything comparable. It's a pity it requires __ delimiters as I find them difficult to parse when there are already so many other underscores being used in the dimension names. But if that's what's required I guess it's better to aim for consistency.

RE symbolic solver labels. I can see your reasoning although I tend to think that if you want verbose strings then you definitely want symbolic solver labels when saving to LP(/MPS?). I think you could use **kwargs in the ABC and then have symbolic_solver_labels: bool = True, **kwargs in the signature of PyomoBackendModel.to_lp and just **kwargs in the signature of GurobiBackendModel.to_lp.. Then you can have the docstring include something like "**kwargs: backend-specific keyword arguments. Only those explicitly defined in the backend method signature will be passed on - all others will be ignored". Then you just ignore **kwargs completely in both to_lp functions. Does that make sense?

@sstroemer
Copy link
Contributor Author

Then you can have the docstring include something like "**kwargs: backend-specific keyword arguments. Only those explicitly defined in the backend method signature will be passed on - all others will be ignored". Then you just ignore **kwargs completely in both to_lp functions. Does that make sense?

Sounds good - do you want me to still include a model_warn(...) if **kwargs is not empty, or just a silent ignore?


I can see your reasoning although I tend to think that if you want verbose strings then you definitely want symbolic solver labels when saving to LP(/MPS?).

Definitely! Sorry I meant the other way round: I may want symbolic solver labels even without verbose strings, since the non-verbose still contain some information.

It's a pity it requires __ delimiters as I find them difficult to parse when there are already so many other underscores being used in the dimension names.

I guess that could be changed, as long as it's changed in both backends? Not saying that this is ideal, but our model's equivalent of variables(flow_out)(X1__chp__electricity__2005_07_01_00_00) would be along the lines of chp.var.flow_out[X1,electricity,2817].

Maybe instead of variables(flow_out)(X1__chp__electricity__2005_07_01_00_00), some combination of different ways like:

  • var(flow_out)(X1𑑛chp𑑛electricity𑑛2005_07_01_00_00)
  • var_flow_out[X1𑑛chp𑑛electricity𑑛2005_07_01_00_00]
  • x[flow_out𑑛X1𑑛chp𑑛electricity𑑛2005_0701_0000]

with 𑑛 being any concat placeholder (["-", "_", ",", ":", "|", ...]) that is preferred, and var could be replaced by variable or variables (or even v, since v/c/o are still unique identifiers).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants