Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split schedule many #54

Merged
merged 9 commits into from
Nov 6, 2024

Conversation

marioskogias
Copy link
Collaborator

Attempt to refactor schedule many as proposed in #49

@marioskogias marioskogias marked this pull request as ready for review September 27, 2024 15:55
Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has worked really nicely. A few minor comments, and some notes for getting multi-schedule and read-only working together.

src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
src/rt/sched/behaviourcore.h Outdated Show resolved Hide resolved
Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@vishalgupta97 vishalgupta97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjp41 next_slot_reader, next_slot_writer: The comment added in commit doesn't follow the assert. The assert was there to ensure that last two bits of the pointer are zero (since we are using those bits for other purposes). I think the comment about specific bits being set in the status variable should be removed as there is no precondition for those bit to be in a particular state when the function is called.

@marioskogias @mjp41: I think there might be a deadlock scenario after the refactoring. In the release phase, if the current one is a reader, it will not wakeup the next slot if that is the reader. The assumption is that all contiguous readers in the queue (without any writer in between) will be woken up simultaneously: Either by the reader being the first one in the queue or writer waking up all readers until it reaches the end or finds another writer.

Before the refactoring, a reader slot when entering the queue (first one in the queue) will mark itself as read_available before setting the slot as ready. This will ensure that when a new reader joins the queue, it will spin on the 2PL bit and once the loop breaks, the new reader will see that the previous reader is available and will mark itself as ready for execution (decrement the exec_count).

After the refactoring, the steps done when reader A is the first one in the queue is moved after setting the slot as ready. This can create a scenario where the next reader B sees that the previous reader A is blocked, so it will assume that some writer will wake them up (That is the only possibility). And the reader A just marks itself as available and continue. Now B is blocked as no one will wake it up.

I'm not able to replicate this scenario via asserts but I think it can result in a deadlock. One possible solution is to wakeup reader B when reader A sees that the next slot is a reader and it is set before A marks its slot as available.

@mjp41
Copy link
Member

mjp41 commented Oct 8, 2024

So I think Vishal is right. So there are two cases where we need to specify that the read_available flag:

The second of these is after the release phase, so I think could cause a lost wake up. The first one is in the correct place. I think we need to flag the read_available for read only chains in:

if (prev_slot == nullptr)
{
chain_info[i].had_no_predecessor = true;
continue;
}

@vishalgupta97
Copy link
Contributor

The read reference count also needs to be taken along with marking the slot as being read available before marking the slot as ready otherwise read and writes can happen in parallel.

@marioskogias
Copy link
Collaborator Author

I am not sure I fully understand the concern. I see the difference before and after the refactoring, but it is not clear to me why in this example A will not wake up B. Is the problem that B will never be scheduled or that it will not be scheduled as early as it could?

I think that given that we call resolve() at the very end of the function, none of the behaviours are run before this function finishes.

@marioskogias
Copy link
Collaborator Author

@mjp41 @vishalgupta97 I made some changes now set the read_available before the release. I also tried to consolidate and remove duplicate code with the handle_read_only_enqueue method.

Please take a look and let me know what you think.

@vishalgupta97
Copy link
Contributor

LGTM

Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -759,6 +793,9 @@ namespace verona::rt
Slot* last_slot;
size_t transfer_count;
bool had_no_predecessor;
// The last two are only use for reads only chains
size_t ref_count;
size_t ex_count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be merged with the ec array? Or the ec array be removed and just use this?

@marioskogias marioskogias merged commit 758da83 into microsoft:main Nov 6, 2024
8 checks passed
@marioskogias marioskogias deleted the refactor-schedule-many branch November 6, 2024 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants