
Added functionality to create consumption gpu app #8399

Open · wants to merge 5 commits into base: main
Conversation

@harryli0108 (Member) commented Jan 6, 2025


This checklist is used to make sure that common guidelines for a pull request are followed.

Related command

az containerapp create

General Guidelines

  • Have you run azdev style <YOUR_EXT> locally? (pip install azdev required)
  • Have you run python scripts/ci/test_index.py -q locally? (pip install wheel==0.30.0 required)
  • My extension version conforms to the Extension version schema

For new extensions:

About Extension Publish

There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update src/index.json automatically.
In your PR, you only need to update the version information in setup.py and the historical information in HISTORY.rst; do not modify src/index.json.


Validation for Breaking Change Starting...

Thanks for your contribution!


Hi @harryli0108,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.


Hi @harryli0108,
Since there are fewer than 7 days left in the current milestone, this PR will be reviewed in the next milestone.

@yonzhan (Collaborator) commented Jan 6, 2025

Thank you for your contribution! We will review the pull request and get back to you soon.


github-actions bot commented Jan 6, 2025

🚫 All pull requests will be blocked from merging until Jan 6, 2025 due to CCOA.


github-actions bot commented Jan 6, 2025

CodeGen Tools Feedback Collection

Thank you for using our CodeGen tool. We value your feedback, and we would like to know how we can improve our product. Please take a few minutes to fill out our codegen survey.


github-actions bot commented Jan 6, 2025

Hi @harryli0108

Release Suggestions

Module: containerapp

  • Update VERSION to 1.1.0b2 in src/containerapp/setup.py

Notes

@@ -698,6 +708,47 @@ def set_up_system_assigned_identity_as_default_if_using_acr(self):
            return
        self.set_argument_registry_identity('system')

    def set_up_consumption_gpu_wp_payload(self, consumption_gpu_profile_type):

set_up_consumption_gpu_wp_payload

In validate_consumption_gpu_profile you also allow A10, but here you don't have that. Both A10 and NC12-A100 are not SKUs we support today, but I'm guessing you are future-proofing here.


    def validate_consumption_gpu_profile(self):
        if self.get_argument_enable_consumption_gpu() is not None:
            if self.get_argument_enable_consumption_gpu().lower() not in ["consumption-gpu-nc8as-t4", "consumption-gpu-nc4as-t4", "consumption-gpu-nc24-a100", "consumption-gpu-nc12-a100", "consumption-gpu-nv6ads-a10"]:
@Tratcher (Contributor) commented Jan 21, 2025

Can you move this to the params and use the built-in validation?

    c.argument('consumption_gpu_profile', arg_type=get_enum_type(['consumption-gpu-nc8as-t4', 'consumption-gpu-nc4as-t4', 'consumption-gpu-nc24-a100'...

Example:

    c.argument('revisions_mode', arg_type=get_enum_type(['single', 'multiple', 'labels']), help="The active revisions mode for the container app.")
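The validation get_enum_type provides is essentially a case-insensitive membership check against a fixed list of choices. The plain-Python sketch below imitates that behavior without the azure-cli dependency (validate_choice is a hypothetical stand-in, not azure-cli code); the choice list is the one from validate_consumption_gpu_profile in this PR.

```python
# Allowed values copied from validate_consumption_gpu_profile in the PR diff.
ALLOWED_PROFILES = [
    "consumption-gpu-nc8as-t4",
    "consumption-gpu-nc4as-t4",
    "consumption-gpu-nc24-a100",
    "consumption-gpu-nc12-a100",
    "consumption-gpu-nv6ads-a10",
]


def validate_choice(value, choices=ALLOWED_PROFILES):
    """Return the canonical choice matching `value` case-insensitively,
    or raise ValueError -- roughly what argparse does when the argument
    is registered with get_enum_type's case-insensitive choices."""
    for choice in choices:
        if value.lower() == choice.lower():
            return choice
    raise ValueError(f"invalid choice: {value!r} (choose from {choices})")
```

Registering the choices on the parameter this way means the error surfaces at argument parsing time, before any command logic runs.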

@Greedygre (Contributor)

/azp run


Pull request contains merge conflicts.

Comment on lines +712 to +732
        consumption_gpu_profile_type_lower = consumption_gpu_profile_type.lower()
        if consumption_gpu_profile_type_lower == "consumption-gpu-nc8as-t4":
            payload = {
                "workloadProfileType": "Consumption-GPU-NC8as-T4",
                "name": "consumption-8core-t4"
            }
        elif consumption_gpu_profile_type_lower == "consumption-gpu-nc4as-t4":
            payload = {
                "workloadProfileType": "Consumption-GPU-NC4as-T4",
                "name": "consumption-4core-t4"
            }
        elif consumption_gpu_profile_type_lower == "consumption-gpu-nc24-a100":
            payload = {
                "workloadProfileType": "Consumption-GPU-NC24-A100",
                "name": "consumption-24core-a100"
            }
        elif consumption_gpu_profile_type_lower == "consumption-gpu-nc12-a100":
            payload = {
                "workloadProfileType": "Consumption-GPU-NC12-A100",
                "name": "consumption-12core-a100"
            }
Contributor:

Currently, when we want to support a new type of workload profile, we don't need to update the CLI.
These hard-coded values are coupled to the workload profile type, which is not conducive to future expansion.
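One way to reduce the coupling the reviewer points out: collapse the if/elif chain into a single table so each profile is one data row. This is a sketch, not the PR's code; the mapping values are copied verbatim from the diff, and build_consumption_gpu_payload is a hypothetical helper name.

```python
# (profile key, lowercased) -> (workloadProfileType, workload profile name),
# values taken verbatim from the if/elif chain in the PR diff.
_CONSUMPTION_GPU_PROFILES = {
    "consumption-gpu-nc8as-t4": ("Consumption-GPU-NC8as-T4", "consumption-8core-t4"),
    "consumption-gpu-nc4as-t4": ("Consumption-GPU-NC4as-T4", "consumption-4core-t4"),
    "consumption-gpu-nc24-a100": ("Consumption-GPU-NC24-A100", "consumption-24core-a100"),
    "consumption-gpu-nc12-a100": ("Consumption-GPU-NC12-A100", "consumption-12core-a100"),
}


def build_consumption_gpu_payload(profile_type):
    """Build the workload-profile payload for a consumption GPU type.

    Adding a new profile becomes one table row instead of a new elif
    branch -- though, as the reviewer notes, it still requires a CLI
    update, unlike server-driven workload profile support.
    """
    try:
        wp_type, name = _CONSUMPTION_GPU_PROFILES[profile_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported consumption GPU profile: {profile_type}")
    return {"workloadProfileType": wp_type, "name": name}
```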

Contributor:

Hi @harryli0108 @Tratcher

Do we really need to add functionality to create a GPU app to az containerapp create?
Currently we use az containerapp create --workload-profile-name to specify the workload profile to run the app on.
This --consumption-gpu-profile conflicts with it.

@Greedygre (Contributor) commented Jan 22, 2025

@harryli0108
I think it would be better to add the GPU app creation functionality to az containerapp up.
We can prepare the managed environment workload profile in the az containerapp up prepare logic.

Command
    az containerapp up : Create or update a container app as well as any associated resources (ACR,
    resource group, container apps environment, GitHub Actions, etc.).

        if consumption_gpu_wp_name is None:
            env_client = self.get_environment_client
            wp_payload, consumption_gpu_wp_name = self.update_consumption_gpu_wp(managed_env_info, consumption_gpu_wp)
            env_client().update(cmd=self.cmd, resource_group_name=managed_env_rg, name=managed_env_name, managed_environment_envelope=wp_payload)
@Greedygre (Contributor) commented Jan 22, 2025

If we update the environment here, --no-wait will not work for this case, because we have to wait for the update to complete.

Contributor:

We should warn the customer that --no-wait does not take effect in this case:

--no-wait will not take effect when using --consumption-gpu-profile and need to create a workload profile.
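A minimal sketch of how that warning could be surfaced, assuming a hypothetical helper check_no_wait (not the PR's code) that normalizes the flag and logs through the standard logging module, as azure-cli command code typically does:

```python
import logging

logger = logging.getLogger(__name__)


def check_no_wait(no_wait, needs_workload_profile_create):
    """Return the effective no_wait value, warning when it is overridden.

    Hypothetical helper: when --consumption-gpu-profile requires creating
    a workload profile, the environment update must complete before the
    app can be created, so --no-wait is ignored for that step.
    """
    if no_wait and needs_workload_profile_create:
        logger.warning(
            "--no-wait will not take effect when using --consumption-gpu-profile "
            "and a workload profile needs to be created."
        )
        return False
    return no_wait
```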

                consumption_gpu_wp_name = wp["name"]
                break
        if consumption_gpu_wp_name is None:
            env_client = self.get_environment_client
Contributor:

Suggested change:

-            env_client = self.get_environment_client
+            env_client = self.get_environment_client()
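The distinction the suggestion addresses: without the parentheses, env_client holds the bound method rather than the client, so every later use must remember to call it first. A tiny self-contained demo (FakeHost is a stand-in, not the PR's builder class):

```python
# Demonstrates bound-method assignment vs calling the accessor immediately.
class FakeHost:
    def get_environment_client(self):
        # Stand-in for the real client factory.
        return {"name": "env-client"}


host = FakeHost()

# Without (): a bound method; later code must call it, e.g. env_client_method().update(...)
env_client_method = host.get_environment_client

# Suggested form: call immediately, so the variable is the client itself.
env_client = host.get_environment_client()

assert env_client_method() == env_client
```

Calling at the assignment site keeps the variable's type obvious and avoids mixing the two styles within one code path.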
