Feature/add aws support to ai inference pkb v1.1 #6326
Conversation
Add precommit pyink formatter
```diff
   - key: karpenter.sh/capacity-type
     operator: In
-    values: {{ gpu_capacity_types | default(['on-demand']) }}
+    values: {{ gpu_capacity_types | default(['spot','on-demand']) }}
```
Does this mean either type is acceptable? That seems like it would introduce an element of variability in tests: was a given run slower because it landed on spot, or just slower? Let's prefer to set gpu_capacity_types = ['spot'] via flags, which are then also recorded in metadata.
...ok, looking in wg_serving_inference_server I see the values are hard-coded and passed in as well. Let's instead rely on the aws_spot_instances flag value, and indeed include that flag value in metadata. This is set in the EksAutoCluster Kubernetes cluster class as use_spot, but it could be set in EksCluster or perhaps even KubernetesCluster more generally.
Good point. I agree this adds variability.
I changed the default to on-demand and now select only one capacity type based on the aws_spot_instances flag (spot when true, on-demand otherwise).
The value of aws_spot_instances is also recorded in the run metadata.
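For illustration, a minimal sketch of the selection logic described above. The `aws_spot_instances` flag name comes from this thread; the helper functions themselves are hypothetical and are not the PR's actual diff:

```python
from absl import flags

FLAGS = flags.FLAGS  # assumes aws_spot_instances is defined elsewhere in PKB


def _GetGpuCapacityTypes() -> list[str]:
  """Selects a single Karpenter capacity type based on the spot flag."""
  return ['spot'] if FLAGS.aws_spot_instances else ['on-demand']


def _GetCapacityMetadata() -> dict:
  """Records the flag value so spot vs. on-demand runs can be told apart."""
  return {'aws_spot_instances': FLAGS.aws_spot_instances}
```

Keeping the choice to a single capacity type per run, driven by one recorded flag, is what removes the variability the review raised.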
```diff
   gpu_capacity_types=['spot','on-demand'],
   gpu_arch=['amd64'],
-  gpu_instance_families=['g6', 'g6e'],
+  gpu_instance_families=['g6','p5'],
```
Roughly the same note as spot vs. on-demand. For wg_serving I think this info is encoded in self.accelerator_type. Admittedly that doesn't support multiple GPU types, so supporting them could be a possible addition, but that again introduces variability.
I wonder if there's a way to tell which GPU type was actually used and record that info in metadata? That seems like an important prerequisite if we do want to use multiple GPU types.
Yes, agreed.
I added best-effort collection of node information from the scheduled pod and stored it in the run metadata.
This includes the node name and instance type (and GPU product when available), so it is clear which instance was actually used even when multiple families are allowed.
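As a rough illustration of what such best-effort collection could look like, the sketch below shells out to kubectl; it is not the PR's actual implementation. The label keys are the standard `node.kubernetes.io/instance-type` and, when NVIDIA GPU feature discovery is installed, `nvidia.com/gpu.product`; the helper name is hypothetical:

```python
import json
import subprocess


def _GetPodNodeMetadata(pod_name: str, namespace: str = 'default') -> dict:
  """Best-effort lookup of the node a pod was scheduled on, for metadata."""
  metadata = {}
  try:
    pod = json.loads(subprocess.check_output(
        ['kubectl', 'get', 'pod', pod_name, '-n', namespace, '-o', 'json']))
    node_name = pod['spec'].get('nodeName')
    if not node_name:
      return metadata
    metadata['node_name'] = node_name
    node = json.loads(subprocess.check_output(
        ['kubectl', 'get', 'node', node_name, '-o', 'json']))
    labels = node['metadata'].get('labels', {})
    # Standard well-known label on EKS/Karpenter nodes.
    metadata['instance_type'] = labels.get('node.kubernetes.io/instance-type')
    # Populated by NVIDIA GPU feature discovery when it is running.
    gpu_product = labels.get('nvidia.com/gpu.product')
    if gpu_product:
      metadata['gpu_product'] = gpu_product
  except (subprocess.CalledProcessError, KeyError, ValueError):
    pass  # Best effort: failing to resolve node info should not fail the run.
  return metadata
```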
```yaml
- repo: https://github.com/google/pyink
  rev: 24.10.1
  hooks:
    - id: pyink
```
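For context, pyink is Google's fork of the Black formatter; wired up this way, pre-commit runs it against staged Python files on each commit, and it can also be invoked on demand with `pre-commit run pyink --all-files`.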
This looks very helpful. Can you send it in a separate PR?