Master Gives Up When Scheduling Workers #723
kmg-stripe
started this conversation in
General
Hello @Andyz26! We are investigating an issue where a resource cluster doesn't have enough resources to assign workers to task executors. Our ASG-based resource provider scales up, but the scheduler gives up before all of the nodes are up. We see log messages for retried assignments, yet the scheduler gives up before exhausting all of its retries.
I see you recently merged this: 7f996f8
We originally thought that the max retries setting was too low and that our problem would go away once we deployed an artifact with your change, but we still see the scheduler giving up without exhausting its retries. Just checking whether you happen to be investigating something similar.
cc: @codyrioux