Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job queue issues #14

Open
wouterbles opened this issue Aug 10, 2023 · 5 comments · Fixed by #18
Open

Job queue issues #14

wouterbles opened this issue Aug 10, 2023 · 5 comments · Fixed by #18

Comments

@wouterbles
Copy link
Owner

wouterbles commented Aug 10, 2023

@yvanoers proposed changes to the job queue in #11, I reverted these changes in release 1.0.2 after some local testing as it results in longer solves and an incomplete pareto front. Should look into this more thoroughly to find out why.

Things to figure out:

  • Why is the current implementation faster?
  • Are other processes not able to help out as described by @yvanoers?
  • Why do the proposed changes lead to an incomplete pareto front?
@yvanoers
Copy link
Contributor

Fascinating.
In my initial testing I found a speed up compared to the current implementation. Though there were other changes included, I would not expect those to have a profound effect on performance.

In any case, @wouterbles, are you able to add a test case that tests for the completeness of the pareto front, perhaps?
That would certainly help me with the investigation.

@wouterbles
Copy link
Owner Author

wouterbles commented Aug 11, 2023

Interesting indeed. I have mainly been running the 3kp40/3kp50 cases. Which are in the tests folder.

Did some tests with both versions of the queueing. By the way, make sure to add the latest commit that sets the longest_q on top of your enqueue changes, otherwise I ran into issues. Results:

  • With enqueue changes: Solved 1275 models for 358 unique Pareto solutions (should be 389)
  • Without enqueue changes: Solved 868 models for 389 unique Pareto solutions

pytest tests/test_3kp40.py also fails with enqueue changes as it checks if the correct solutions are returned.

@wouterbles
Copy link
Owner Author

@yvanoers created a fix for this issue in the latest PR, this bug was introduced by a refactor a while ago. Thanks for discovering the issue! Instead of queuing each grid point individually (which you proposed), they are grouped in 'rows', which means processes will take up bigger chunks of work.

@yvanoers
Copy link
Contributor

Nice work!

I ran a test with my original changes and it found only 358 solutions.
Then I set the cpu count to 1 (therefore disabling multiprocessing) and it found 389.
Next, reenabled multiprocessing but disabled jumping (commented continue in the if jump > 0 block of SolverProcess) and it also found 389.

So the algorithm needs the items it is processing in order and in the same process. When items get picked off the queue, it could jump over items it is not supposed to skip.

I don't understand the algorithm well enough to known whether your row-based approach does not suffer from this. The jump variable is not reset when a new row is dequeued, so if jump can span rows, this might still be a problem, just occurring less frequently.

@wouterbles wouterbles reopened this Aug 12, 2023
@wouterbles
Copy link
Owner Author

That's definetely something to look into. I think it might make sense to reset the jump counter when taking a new block of work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants