
Conversation

@kiryl-filatau
Collaborator

No description provided.

- key: karpenter.sh/capacity-type
operator: In
values: {{ gpu_capacity_types | default(['on-demand']) }}
values: {{ gpu_capacity_types | default(['spot','on-demand']) }}
Collaborator

Does this mean either type is acceptable? That seems like it would introduce an element of variability into tests: was a given run slower because it landed on spot, or just slower? Let's prefer to set gpu_capacity_types = ['spot'] via flags, I believe, which are then also recorded in metadata.

..ok, looking in wg_serving_inference_server I see the values are hard-coded and passed in as well. Let's instead rely on the aws_spot_instances flag value, and indeed include that flag value in metadata. It is set on the EksAutoCluster Kubernetes cluster class as use_spot, but could be set on EksCluster or perhaps even KubernetesCluster more generally.

Collaborator Author

Good point. I agree this adds variability.
I changed the default to on-demand and now select only one capacity type based on the aws_spot_instances flag (spot when true, on-demand otherwise).
The value of aws_spot_instances is also recorded in the run metadata.
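
For reference, a minimal sketch of how that selection and metadata recording could look (assumed: the aws_spot_instances flag mentioned above is already registered in PKB; the helper names and metadata keys here are illustrative, not the actual code):

    from absl import flags

    FLAGS = flags.FLAGS
    # In PKB this flag already exists; it is defined here only to keep the sketch self-contained.
    flags.DEFINE_boolean('aws_spot_instances', False,
                         'Whether GPU nodes should use spot capacity.')


    def _gpu_capacity_types() -> list[str]:
      """Returns a single Karpenter capacity type derived from the spot flag."""
      return ['spot'] if FLAGS.aws_spot_instances else ['on-demand']


    def _run_metadata() -> dict:
      """Records the flag value so spot and on-demand runs can be told apart."""
      return {
          'aws_spot_instances': FLAGS.aws_spot_instances,
          'gpu_capacity_types': _gpu_capacity_types(),
      }

The list returned by _gpu_capacity_types() would then feed the gpu_capacity_types template variable shown in the diff above, so the NodePool requirement only ever lists one capacity type per run.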

gpu_capacity_types=['spot','on-demand'],
gpu_arch=['amd64'],
gpu_instance_families=['g6', 'g6e'],
gpu_instance_families=['g6','p5'],
Collaborator

Roughly the same note as spot vs on-demand: for wg_serving I think this info is encoded in self.accelerator_type. Admittedly that doesn't support multiple GPU types, so supporting that could be a possible addition, but it again introduces variability.

I wonder if there's a way to tell which GPU type was actually used and record that info in metadata? Seems like an important prerequisite if we do want to use multiple GPU types.

Collaborator Author

Yes, agreed.
I added best-effort collection of node information from the scheduled pod and store it in the run metadata.
This includes node name and instance type (and GPU product when available), so it is clear which instance was actually used even when multiple families are allowed.
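
Roughly what that best-effort lookup could look like (the kubectl fields and node labels are standard Kubernetes / NVIDIA GPU feature discovery names; the helper and metadata keys are illustrative only):

    import json
    import subprocess


    def _kubectl_json(*args: str) -> dict:
      """Runs kubectl with -o json and returns the parsed output, or {} on failure."""
      out = subprocess.run(['kubectl', *args, '-o', 'json'],
                           capture_output=True, text=True, check=False).stdout
      return json.loads(out) if out else {}


    def collect_node_metadata(pod_name: str, namespace: str = 'default') -> dict:
      """Best-effort metadata about the node that actually ran the pod."""
      pod = _kubectl_json('get', 'pod', pod_name, '-n', namespace)
      node_name = pod.get('spec', {}).get('nodeName')
      if not node_name:
        return {}
      labels = _kubectl_json('get', 'node', node_name).get('metadata', {}).get('labels', {})
      metadata = {
          'node_name': node_name,
          # Well-known node label populated by the cloud provider / kubelet.
          'node_instance_type': labels.get('node.kubernetes.io/instance-type'),
      }
      # Only present when NVIDIA GPU feature discovery is running on the node.
      gpu_product = labels.get('nvidia.com/gpu.product')
      if gpu_product:
        metadata['gpu_product'] = gpu_product
      return metadata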

- repo: https://github.com/google/pyink
rev: 24.10.1
hooks:
- id: pyink
Collaborator

This looks very helpful. Can you send it in a separate PR?

