During @vishalgupta97's internship, we observed a problem with the work-stealing algorithm: if all the work was generated by a single scheduler thread, the other threads suffered heavy contention in the stealing code. This is because the stealing code steals only a single item at a time.
There are a few papers that suggest stealing in larger units than single work items:
The suggestion is that stealing at most half of the work is sensible.
This PR changes the implementation of the scheduler queues to support a steal-all operation. Each scheduler thread now has N queues (N=4 in the PR), and a thief takes the entire contents of one of them, so a single steal removes approximately 1/N (here 1/4) of the work on the victim scheduler thread.
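To make the idea concrete, here is a minimal single-threaded sketch in Python. It is not the code in this PR: the real queues are concurrent, and the names `WorkQueue`, `enqueue` and `steal_all`, as well as the round-robin placement policy, are assumptions made for illustration only. It shows why taking one whole sub-queue out of N hands the thief roughly 1/N of the victim's work in a single operation.

```python
from collections import deque


class WorkQueue:
    """Hypothetical per-scheduler-thread queue split into n sub-queues."""

    def __init__(self, n=4):
        self.subqueues = [deque() for _ in range(n)]
        self.enqueue_index = 0  # next sub-queue to receive new work
        self.steal_index = 0    # next sub-queue a thief will try first

    def enqueue(self, item):
        # Assumed policy: spread new work round-robin over the sub-queues.
        self.subqueues[self.enqueue_index].append(item)
        self.enqueue_index = (self.enqueue_index + 1) % len(self.subqueues)

    def __len__(self):
        return sum(len(q) for q in self.subqueues)

    def steal_all(self):
        """Remove and return the whole contents of one non-empty sub-queue."""
        n = len(self.subqueues)
        for offset in range(n):
            i = (self.steal_index + offset) % n
            if self.subqueues[i]:
                stolen = self.subqueues[i]
                self.subqueues[i] = deque()  # victim keeps the other sub-queues
                self.steal_index = (i + 1) % n
                return list(stolen)
        return []  # nothing to steal


# A single producer generates all the work, as in the benchmark scenario.
victim = WorkQueue(n=4)
for task in range(100):
    victim.enqueue(task)

stolen = victim.steal_all()
print(len(stolen), len(victim))  # prints "25 75"
```

With N=4, one steal moves a quarter of the queued work, so a thief returns to the contended stealing path far less often than when it takes one item at a time.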
The PR adds a very simple benchmark where a single behaviour generates all the work in the system. Configured with 4 cores, we get
Before this PR:
With this PR:
These numbers come from my laptop, which has just four cores, so larger experiments should be undertaken.
There is also a lot of opportunity for further investigation into different strategies for stealing and scheduling work. This PR represents a simple starting point that improves the state of the system.