[RLLib] Fix some RLlib release tests #59288
base: master
Signed-off-by: Kamil Kaczmarek <[email protected]>
Code Review
This pull request updates several RLlib release tests to run on the ray-ml image and assigns them to the rllib team, which aligns with the goal of fixing and standardizing RLlib release tests. The changes are generally good, but I've found a couple of potential misconfigurations where tests are set up to use GPU resources but are assigned to CPU-only clusters. This could lead to test failures. Please see the detailed comments.
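A minimal sketch of the kind of check that would surface such a mismatch, assuming each test entry records how many GPUs the test requests and how many its cluster provides. The field names (`requested_gpus`, `cluster_gpus`) and the test names are hypothetical and do not reflect the actual release test configuration schema.

```python
# Hypothetical test entries; the field names and test names below are
# illustrative only and are not the real release test configuration schema.
tests = [
    {"name": "example_gpu_test", "requested_gpus": 1, "cluster_gpus": 0},
    {"name": "example_cpu_test", "requested_gpus": 0, "cluster_gpus": 0},
    {"name": "example_matched_gpu_test", "requested_gpus": 2, "cluster_gpus": 4},
]


def find_gpu_mismatches(entries):
    """Return the names of tests that request more GPUs than their cluster has."""
    return [
        entry["name"]
        for entry in entries
        if entry["requested_gpus"] > entry["cluster_gpus"]
    ]


mismatched = find_gpu_mismatches(tests)
print(mismatched)
```

A check along these lines could run before the tests are scheduled, so that a GPU test pointed at a CPU-only cluster fails fast instead of at runtime.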
pseudo-rnd-thoughts left a comment
Looks good to me
Do we need torch/torchvision in the byod_rllib.sh, if we run on ray-ml image?
Possibly not, I just copied the pip installs from a container image I've been using
… kk/fix-rllib-release-tests
Description
- long_running_impala and long_running_many_ppo: moved tests specification to RLlib tests section.
- tune_rllib_connect_test and long_running_many_ppo.
- tune_rllib_connect_test: release/rllib_tests. RLlib tests section.
- stable flag from RLlib tests.