
Improve access to LpProblem.variables() #776

Open
WPettersson opened this issue Sep 30, 2024 · 4 comments
Comments

@WPettersson
Contributor

WPettersson commented Sep 30, 2024

Describe the new feature

For what seem like historical reasons, calling LpProblem.variables()

pulp/pulp/pulp.py, line 1593 in 6af3801:

def variables(self):

iterates through all constraints (and the objective) and re-adds every variable. As I read it, this is essentially a no-op, since addConstraint already calls the same code whenever a constraint is added. After doing this, LpProblem.variables() also sorts the variables by name, which for me is unnecessary and just a waste of cycles. I tried to check the history, and it seems it has been this way for at least 12 years.

My first question is whether addVariables() needs to be called inside variables() at all. If not, can we simplify LpProblem.variables() to just return self._variables?
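To make the question concrete, here is a rough, standalone sketch of what the simplified accessor might look like. The class and the stand-in variable type are illustrative only, not pulp's actual code; they just mirror the addVariable/_variable_ids pattern from the snippet further down:

```python
# Hypothetical simplification: rely on addConstraint/addVariable having
# already registered every variable, so variables() becomes a plain accessor.
class LpProblemSketch:
    def __init__(self):
        self._variables = []       # insertion-ordered list of variables
        self._variable_ids = {}    # hash -> variable, used for deduplication

    def addVariable(self, variable):
        # Mirrors pulp's addVariable: only append unseen variables.
        if variable.hash not in self._variable_ids:
            self._variables.append(variable)
            self._variable_ids[variable.hash] = variable

    def variables(self):
        # Proposed: no re-scan of constraints, no re-sort.
        return self._variables


class _Var:  # stand-in for LpVariable, only for this sketch
    def __init__(self, name):
        self.name = name
        self.hash = hash(name)


prob = LpProblemSketch()
x, y = _Var("x"), _Var("y")
prob.addVariable(x)
prob.addVariable(y)
prob.addVariable(x)            # duplicate: ignored by the hash check
vars_list = prob.variables()   # insertion order, no scanning, no sorting
```

Callers relying on the current name-sorted order would notice the difference, which is presumably the compatibility question here.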

For what it's worth, I'm currently bypassing variables() in my code and accessing (and caching) LpProblem._variables directly. However, I realise this is bad practice.

Additional info

Please answer these questions before submitting your feature request.

Is your feature request related to an issue? Please include the issue number.

I don't think so.

Does this feature exist in another product or project? Please provide a link.

I'm guessing it probably does.

@pchtsp
Collaborator

pchtsp commented Sep 30, 2024

Hello! Thanks for flagging this. I haven't analyzed this logic much in the past, mainly because it's never been the bottleneck in my use cases. Is it in yours? How are you using it?

It seems that once a variable is in the self._variables list, it's not added again because it's hashed (see addVariable):

pulp/pulp/pulp.py, lines 1574 to 1605 in 6af3801:

def addVariable(self, variable):
    """
    Adds a variable to the problem before a constraint is added

    :param variable: the variable to be added
    """
    if variable.hash not in self._variable_ids:
        self._variables.append(variable)
        self._variable_ids[variable.hash] = variable

def addVariables(self, variables):
    """
    Adds variables to the problem before a constraint is added

    :param variables: the variables to be added
    """
    for v in variables:
        self.addVariable(v)

def variables(self):
    """
    Returns the problem variables

    :return: A list containing the problem variables
    :rtype: (list, :py:class:`LpVariable`)
    """
    if self.objective:
        self.addVariables(list(self.objective.keys()))
    for c in self.constraints.values():
        self.addVariables(list(c.keys()))
    self._variables.sort(key=lambda v: v.name)
    return self._variables

And I'd guess that even if you add just one or two variables and re-sort, the sorting goes really fast because it starts from an almost-sorted list.
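For reference, CPython's sort (Timsort) does handle the almost-sorted case in near-linear time. A rough sketch of the effect, using plain strings as stand-ins for variable names (the sizes and timings here are illustrative, not measurements from this issue):

```python
import random
import timeit

# An already-sorted list of names plus a handful of "new" entries,
# mimicking re-sorting self._variables after adding a few variables.
names = [f"x{i:07d}" for i in range(200_000)]
almost_sorted = names + [f"x{random.randrange(200_000):07d}" for _ in range(5)]
shuffled = names[:]
random.shuffle(shuffled)

# Timsort detects the long sorted run and only merges the short tail,
# so the near-sorted case is typically much faster than a random shuffle.
t_near = timeit.timeit(lambda: sorted(almost_sorted), number=5)
t_rand = timeit.timeit(lambda: sorted(shuffled), number=5)
```

That said, even a fast re-sort is still O(n) work per call, which can add up when variables() is called inside a tight loop over hundreds of thousands of variables.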

@WPettersson
Contributor Author

My use case is implementing reduced-cost variable fixing while doing lexicographic multi-objective optimisation on a pseudo-boolean problem (all variables are binary). This means that for each of my lexicographic objectives, I do something like

deactivated = set()
while True:
  for variable in problem.variables():
    if gap < reducedCost[variable]:
      variable.upBound = 0
      deactivated.add(variable)
  problem.solve()
  if problem.objective.value() > target:
    target += 1
    for variable in deactivated:
      variable.upBound = 1
    deactivated = set()
  else:
    break

where target is calculated from a linear relaxation, and reducedCost is a dict mapping LpVariables to their reduced cost (LpVariable.dj) in the already-solved linear relaxation. Caching these dual values is important, since repeatedly solving the IP would lose the reduced costs.
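The caching step can be sketched as follows. This uses a pure-Python stand-in for LpVariable rather than pulp itself; the only pulp-specific assumption is that v.dj holds the reduced cost after an LP solve, as described above:

```python
# Sketch: snapshot reduced costs from the relaxation before re-solving.
class _Var:  # stand-in for LpVariable, for illustration only
    def __init__(self, name, dj):
        self.name = name
        self.dj = dj   # reduced cost, populated by the LP solve in pulp


relaxation_vars = [_Var("x1", 0.0), _Var("x2", 3.5), _Var("x3", 0.75)]

# Snapshot the duals immediately after the LP solve: re-solving the
# (integer) problem would overwrite or drop these values.
reducedCost = {v: v.dj for v in relaxation_vars}

# A variable whose reduced cost exceeds the current gap can be fixed to 0.
gap = 1.0
fixed = [v.name for v in relaxation_vars if gap < reducedCost[v]]  # ["x2"]
```

The dict is keyed on the variable objects themselves, matching how reducedCost is indexed in the loop above.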

It wasn't a huge bottleneck, but by avoiding LpProblem.variables() entirely I went from ~10 seconds to ~6 seconds on my overall problem. I don't know for sure whether the speed-up came from avoiding the repeated calls to LpProblem.variables(), or from never calling it at all (and so avoiding any sorting of the variables).

For the record, I have ~200k-300k variables and 250 constraints in test instances, and significantly more in the real instances I want to solve.

@pchtsp
Collaborator

pchtsp commented Sep 30, 2024

OK, that's indeed a niche way of solving a problem many, many times.
What about just storing the variables once before the loop and then iterating over that? Or are you editing the problem by creating or deleting variables/constraints?

In your example:

deactivated = set()
my_variables = problem.variables()
while True:
  for variable in my_variables:
    if gap < reducedCost[variable]:
      variable.upBound = 0
      deactivated.add(variable)
  problem.solve()
  if problem.objective.value() > target:
    target += 1
    for variable in deactivated:
      variable.upBound = 1
    deactivated = set()
  else:
    break

@WPettersson
Contributor Author

That's exactly what I did, in the end. Once I get to this step, I am no longer adding variables or constraints.

As for the technique itself, it's only useful if the objective value of the linear relaxation is often close (within an integer or two) to the objective value of the integer problem itself. If you want more background, the paper where we use this technique (and whose code I'm re-implementing) is at https://pubsonline.informs.org/doi/full/10.1287/opre.2022.2374
