Job queue issues #14

wouterbles · 2023-08-10T11:10:50Z

@yvanoers proposed changes to the job queue in #11, I reverted these changes in release 1.0.2 after some local testing as it results in longer solves and an incomplete pareto front. Should look into this more thoroughly to find out why.

Things to figure out:

Why is the current implementation faster?
Are other processes not able to help out as described by @yvanoers?
Why do the proposed changes lead to an incomplete pareto front?

yvanoers · 2023-08-11T08:08:47Z

Fascinating.
In my initial testing I found a speed up compared to the current implementation. Though there were other changes included, I would not expect those to have a profound effect on performance.

In any case, @wouterbles, are you able to add a test case that tests for the completeness of the pareto front, perhaps?
That would certainly help me with the investigation.

wouterbles · 2023-08-11T09:49:51Z

Interesting indeed. I have mainly been running the 3kp40/3kp50 cases. Which are in the tests folder.

Did some tests with both versions of the queueing. By the way, make sure to add the latest commit that sets the longest_q on top of your enqueue changes, otherwise I ran into issues. Results:

With enqueue changes: Solved 1275 models for 358 unique Pareto solutions (should be 389)
Without enqueue changes: Solved 868 models for 389 unique Pareto solutions

pytest tests/test_3kp40.py also fails with enqueue changes as it checks if the correct solutions are returned.

wouterbles · 2023-08-11T18:11:13Z

@yvanoers created a fix for this issue in the latest PR, this bug was introduced by a refactor a while ago. Thanks for discovering the issue! Instead of queuing each grid point individually (which you proposed), they are grouped in 'rows', which means processes will take up bigger chunks of work.

yvanoers · 2023-08-11T21:56:26Z

Nice work!

I ran a test with my original changes and it found only 358 solutions.
Then I set the cpu count to 1 (therefore disabling multiprocessing) and it found 389.
Next, reenabled multiprocessing but disabled jumping (commented continue in the if jump > 0 block of SolverProcess) and it also found 389.

So the algorithm needs the items it is processing in order and in the same process. When items get picked off the queue, it could jump over items it is not supposed to skip.

I don't understand the algorithm well enough to known whether your row-based approach does not suffer from this. The jump variable is not reset when a new row is dequeued, so if jump can span rows, this might still be a problem, just occurring less frequently.

wouterbles · 2023-08-12T08:08:59Z

That's definetely something to look into. I think it might make sense to reset the jump counter when taking a new block of work.

wouterbles mentioned this issue Aug 11, 2023

Correctly enqueue items such that work can be taken by other processes #18

Merged

wouterbles closed this as completed Aug 11, 2023

wouterbles reopened this Aug 12, 2023

yvanoers mentioned this issue Aug 17, 2023

Wait for processes to exit instead of STOP result #19

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Job queue issues #14

Job queue issues #14

wouterbles commented Aug 10, 2023 •

edited

Loading

yvanoers commented Aug 11, 2023

wouterbles commented Aug 11, 2023 •

edited

Loading

wouterbles commented Aug 11, 2023

yvanoers commented Aug 11, 2023

wouterbles commented Aug 12, 2023

Job queue issues #14

Job queue issues #14

Comments

wouterbles commented Aug 10, 2023 • edited Loading

yvanoers commented Aug 11, 2023

wouterbles commented Aug 11, 2023 • edited Loading

wouterbles commented Aug 11, 2023

yvanoers commented Aug 11, 2023

wouterbles commented Aug 12, 2023

wouterbles commented Aug 10, 2023 •

edited

Loading

wouterbles commented Aug 11, 2023 •

edited

Loading