Master Gives Up When Scheduling Workers #723

kmg-stripe (Collaborator) · 2024-11-05T20:59:31Z

Hello @Andyz26! We are investigating an issue where a resource cluster doesn't have enough resources to assign workers to task executors. Our ASG-based resource provider scales up, but the scheduler gives up before all of the new nodes are available. The logs show the assignments being retried, but the scheduler gives up before all retries are exhausted.

I see you recently merged 7f996f8.

We originally thought the maximum retry count was too low and that the problem would go away once we deployed an artifact with your change, but we still see the scheduler giving up without exhausting its retries. Just checking whether you happen to be investigating something similar.

cc: @codyrioux
