Feature/add aws support to ai inference pkb v1.1 #6326
Conversation
Add precommit pyink formatter
```diff
   - key: karpenter.sh/capacity-type
     operator: In
-    values: {{ gpu_capacity_types | default(['on-demand']) }}
+    values: {{ gpu_capacity_types | default(['spot','on-demand']) }}
```
Does this mean either type is acceptable? That seems like it would introduce an element of variability in tests: was a given run slower because it landed on spot, or just slower? Let's prefer to set gpu_capacity_types = ['spot'] via flags, which are then also recorded in metadata.
...ok, looking in wg_serving_inference_server I see the values are hard-coded and passed in as well. Let's instead rely on the aws_spot_instances flag value, and indeed include that flag value in metadata. This is set in the EksAutoCluster Kubernetes cluster class as use_spot, but it could be set in EksCluster or perhaps even KubernetesCluster more generally.
Good point. I agree this adds variability.
I changed the default to on-demand and now select only one capacity type based on the aws_spot_instances flag (spot when true, on-demand otherwise).
The value of aws_spot_instances is also recorded in the run metadata.
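For illustration, a minimal sketch of the selection logic described above. The `aws_spot_instances` flag name comes from this thread; the helper functions themselves are hypothetical and are not the PR's actual diff:

```python
from absl import flags

FLAGS = flags.FLAGS  # assumes aws_spot_instances is defined elsewhere in PKB


def _GetGpuCapacityTypes() -> list[str]:
  """Selects a single Karpenter capacity type based on the spot flag."""
  return ['spot'] if FLAGS.aws_spot_instances else ['on-demand']


def _GetCapacityMetadata() -> dict:
  """Records the flag value so spot vs. on-demand runs can be told apart."""
  return {'aws_spot_instances': FLAGS.aws_spot_instances}
```

Keeping the choice to a single capacity type per run, driven by one recorded flag, is what removes the variability the review raised.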
```diff
   gpu_capacity_types=['spot','on-demand'],
   gpu_arch=['amd64'],
-  gpu_instance_families=['g6', 'g6e'],
+  gpu_instance_families=['g6','p5'],
```
Roughly the same note as spot vs. on-demand. For wg_serving I think this info is encoded in self.accelerator_type. Admittedly that doesn't support multiple GPU types, so supporting them could be a possible addition, but that again introduces variability.
I wonder if there's a way to tell which GPU type was actually used and record that info in metadata? That seems like an important prerequisite if we do want to use multiple GPU types.
Yes, agreed.
I added best-effort collection of node information from the scheduled pod and stored it in the run metadata.
This includes the node name and instance type (and GPU product when available), so it is clear which instance was actually used even when multiple families are allowed.
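As a rough illustration of what such best-effort collection could look like, the sketch below shells out to kubectl; it is not the PR's actual implementation. The label keys are the standard `node.kubernetes.io/instance-type` and, when NVIDIA GPU feature discovery is installed, `nvidia.com/gpu.product`; the helper name is hypothetical:

```python
import json
import subprocess


def _GetPodNodeMetadata(pod_name: str, namespace: str = 'default') -> dict:
  """Best-effort lookup of the node a pod was scheduled on, for metadata."""
  metadata = {}
  try:
    pod = json.loads(subprocess.check_output(
        ['kubectl', 'get', 'pod', pod_name, '-n', namespace, '-o', 'json']))
    node_name = pod['spec'].get('nodeName')
    if not node_name:
      return metadata
    metadata['node_name'] = node_name
    node = json.loads(subprocess.check_output(
        ['kubectl', 'get', 'node', node_name, '-o', 'json']))
    labels = node['metadata'].get('labels', {})
    # Standard well-known label on EKS/Karpenter nodes.
    metadata['instance_type'] = labels.get('node.kubernetes.io/instance-type')
    # Populated by NVIDIA GPU feature discovery when it is running.
    gpu_product = labels.get('nvidia.com/gpu.product')
    if gpu_product:
      metadata['gpu_product'] = gpu_product
  except (subprocess.CalledProcessError, KeyError, ValueError):
    pass  # Best effort: failing to resolve node info should not fail the run.
  return metadata
```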
```yaml
- repo: https://github.com/google/pyink
  rev: 24.10.1
  hooks:
    - id: pyink
```
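For context, pyink is Google's fork of the Black formatter; wired up this way, pre-commit runs it against staged Python files on each commit, and it can also be invoked on demand with `pre-commit run pyink --all-files`.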
This looks very helpful. Can you send it in a separate PR?