mbaijal/sagemaker-controller

 
 

ACK service controller for Amazon SageMaker

This repository contains source code for the AWS Controllers for Kubernetes (ACK) service controller for Amazon SageMaker.

Please log issues and feedback on the main AWS Controllers for Kubernetes GitHub project.

Contributing

We welcome community contributions and pull requests.

See our contribution guide for more information on how to report issues, set up a development environment, and submit code.

We adhere to the Amazon Open Source Code of Conduct.

You can also learn more about our Governance structure.

License

This project is licensed under the Apache-2.0 License.

Supported SageMaker Resources

For a list of supported resources, refer to the SageMaker API Reference.

Find the helm charts and controller images on Amazon ECR Public Gallery.

Getting Started

The following sections guide you through installing the SageMaker and Application Autoscaling controllers.

1.0 Pre-requisites

This guide assumes that you have met the following prerequisites:

  • Installed the following tools on the client machine used to access your Kubernetes cluster:
    • kubectl - A command line tool for working with Kubernetes clusters.
    • helm - A tool for installing and managing Kubernetes applications.
    • AWS CLI - A command line tool for interacting with AWS services.
    • eksctl - A command line tool for working with EKS clusters that automates many individual tasks.
    • yq - A command line YAML processor.
      • Linux
        sudo wget https://github.com/mikefarah/yq/releases/download/v4.9.8/yq_linux_amd64 -O /usr/bin/yq
        sudo chmod +x /usr/bin/yq
        
      • Mac
        brew install yq
        
  • Have IAM permissions to create roles and attach policies to roles.
  • Created an EKS cluster on which to run the controllers. It should be Kubernetes version 1.16+. For automated cluster creation using eksctl, see Create an Amazon EKS Cluster and select eksctl option.
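Before continuing, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of the original guide; the function name missing_tools is hypothetical, and it only checks that each command exists, not its version.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not something shipped with ACK.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites list above.
missing=$(missing_tools kubectl helm aws eksctl yq)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All prerequisite tools found."
fi
```

The check does not verify IAM permissions or the cluster version; those still need to be confirmed separately.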

2.0 Set up IAM Role based Authentication

2.1 Ensure you are connected to EKS Cluster

export CLUSTER_NAME=<CLUSTER_NAME>
export AWS_DEFAULT_REGION=<CLUSTER_REGION>

aws eks update-kubeconfig --name $CLUSTER_NAME --region $AWS_DEFAULT_REGION

kubectl config get-contexts
# Ensure cluster has compute
kubectl get nodes
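
The "Ensure cluster has compute" check above can also be scripted. The sketch below is an illustration under assumptions: count_ready_nodes is a hypothetical helper, and it relies on the default kubectl get nodes column layout, where the second column is STATUS.

```shell
#!/bin/sh
# Count nodes whose STATUS column is exactly "Ready".
# Nodes reported as "Ready,SchedulingDisabled" are deliberately not counted.
# (count_ready_nodes is a hypothetical helper name.)
count_ready_nodes() {
  kubectl get nodes --no-headers 2>/dev/null |
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example use once connected to the cluster:
#   [ "$(count_ready_nodes)" -gt 0 ] || echo "No Ready nodes found" >&2
```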

2.2 Set up IRSA for the controller pod

Before you can deploy the controller using an IAM role, associate an OpenID Connect (OIDC) provider with your cluster so that it can authenticate with the IAM service.

2.2.1 Create an OpenID Connect Provider for Your Cluster

eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
--region ${AWS_DEFAULT_REGION} --approve

Get the AWS account ID and the OIDC provider URL. The cut -c9- at the end strips the leading https:// from the issuer URL.

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export OIDC_PROVIDER_URL=$(aws eks describe-cluster --name $CLUSTER_NAME --region $AWS_DEFAULT_REGION \
--query "cluster.identity.oidc.issuer" --output text | cut -c9-)

2.2.2 Create an IAM Role

Create a file named trust.json that contains the trust relationship required for the IAM role. The following command writes it for you:

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::'$AWS_ACCOUNT_ID':oidc-provider/'$OIDC_PROVIDER_URL'"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "'$OIDC_PROVIDER_URL':aud": "sts.amazonaws.com",
          "'$OIDC_PROVIDER_URL':sub": [
            "system:serviceaccount:ack-system:ack-sagemaker-controller",
            "system:serviceaccount:ack-system:ack-applicationautoscaling-controller"
          ]
        }
      }
    }
  ]
}
' > ./trust.json
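
The printf above splices AWS_ACCOUNT_ID and OIDC_PROVIDER_URL directly into the JSON, so an empty variable or a URL that still carries its https:// prefix would produce a trust policy that does not match the OIDC provider. A small pre-check along these lines may catch that early. This is a sketch; require_vars and has_no_scheme are hypothetical helper names.

```shell
#!/bin/sh
# Fail if any named variable is unset or empty.
# (require_vars is a hypothetical helper name.)
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Succeed only if the argument carries no URL scheme such as https://.
# The Federated ARN in trust.json is built from host and path only.
has_no_scheme() {
  case "$1" in
    *://*) return 1 ;;
    *) return 0 ;;
  esac
}

if require_vars AWS_ACCOUNT_ID OIDC_PROVIDER_URL && has_no_scheme "$OIDC_PROVIDER_URL"; then
  echo "Variables look usable for trust.json."
else
  echo "Fix the variables above before generating trust.json." >&2
fi
```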

Updating an ApplicationAutoscaling ScalableTarget requires the iam:PassRole permission. Create a file named pass_role_policy.json that contains the policy required for the IAM role.

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "*"
    }
  ]
}
' > ./pass_role_policy.json

Run the following command to create a role with the trust relationship defined in trust.json. This role enables the Amazon EKS cluster to get and refresh credentials from IAM.

OIDC_ROLE_NAME=ack-controller-role-$CLUSTER_NAME

aws --region $AWS_DEFAULT_REGION iam create-role --role-name $OIDC_ROLE_NAME --assume-role-policy-document file://trust.json

# Attach the AmazonSageMakerFullAccess Policy to the Role. This policy provides full access to 
# Amazon SageMaker. Also provides select access to related services (e.g., Application Autoscaling,
# S3, ECR, CloudWatch Logs).
aws --region $AWS_DEFAULT_REGION iam attach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

# Attach the iam:PassRole policy required for updating ApplicationAutoscaling ScalableTarget
aws iam put-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy" --policy-document file://pass_role_policy.json

export IAM_ROLE_ARN_FOR_IRSA=$(aws --region $AWS_DEFAULT_REGION iam get-role --role-name $OIDC_ROLE_NAME --output text --query 'Role.Arn')
echo $IAM_ROLE_ARN_FOR_IRSA

Take note of IAM_ROLE_ARN_FOR_IRSA printed in the previous step; you will pass this value to the service account used by the controller.

3.0 Install Controllers

3.1 Install SageMaker Controller

3.1.1 Download helm chart

export HELM_EXPERIMENTAL_OCI=1
export SERVICE=sagemaker
export RELEASE_VERSION=v0.1.0
export CHART_EXPORT_PATH=/tmp/chart
export CHART_REPO=public.ecr.aws/aws-controllers-k8s/$SERVICE-chart
export CHART_REF=$CHART_REPO:$RELEASE_VERSION

mkdir -p $CHART_EXPORT_PATH
helm chart pull $CHART_REF
helm chart list
helm chart export $CHART_REF --destination $CHART_EXPORT_PATH

3.1.2 Choose one of the two options for deployment

  • [Option 1] Cluster scoped deployment
    • # Update values in helm chart
      cd $CHART_EXPORT_PATH/$SERVICE-chart
      yq e '.aws.region = env(AWS_DEFAULT_REGION)' -i values.yaml
      yq e '.aws.account_id = env(AWS_ACCOUNT_ID)' -i values.yaml
      yq e '.serviceAccount.annotations."eks.amazonaws.com/role-arn" = env(IAM_ROLE_ARN_FOR_IRSA)' -i values.yaml
      cd -
  • [Option 2] Namespace scoped deployment
    • The controller watches for resources only in the helm chart release namespace. In this guide, that namespace is set by the $ACK_K8S_NAMESPACE variable in the helm install step in section 3.1.3.
    • # Update values in helm chart
      cd $CHART_EXPORT_PATH/$SERVICE-chart
      yq e '.aws.region = env(AWS_DEFAULT_REGION)' -i values.yaml
      yq e '.aws.account_id = env(AWS_ACCOUNT_ID)' -i values.yaml
      yq e '.serviceAccount.annotations."eks.amazonaws.com/role-arn" = env(IAM_ROLE_ARN_FOR_IRSA)' -i values.yaml
      yq e '.installScope = "namespace"' -i values.yaml
      cd -

3.1.3 Install Controller

Install CRDs

kubectl apply -f $CHART_EXPORT_PATH/$SERVICE-chart/crds

Create a namespace and install the helm chart

export ACK_K8S_NAMESPACE=ack-system
helm install -n $ACK_K8S_NAMESPACE --create-namespace --skip-crds ack-$SERVICE-controller \
 $CHART_EXPORT_PATH/$SERVICE-chart

Verify CRDs and helm charts were deployed

kubectl get crds | grep "k8s.aws"

kubectl get pods -n $ACK_K8S_NAMESPACE

Jump to Section 4.0 if you only wish to install the SageMaker controller.

3.2 ApplicationAutoscaling

3.2.1 Download helm chart

export HELM_EXPERIMENTAL_OCI=1
export SERVICE=applicationautoscaling
export RELEASE_VERSION=v0.1.0
export CHART_EXPORT_PATH=/tmp/chart
export CHART_REPO=public.ecr.aws/aws-controllers-k8s/$SERVICE-chart
export CHART_REF=$CHART_REPO:$RELEASE_VERSION

mkdir -p $CHART_EXPORT_PATH
helm chart pull $CHART_REF
helm chart list
helm chart export $CHART_REF --destination $CHART_EXPORT_PATH

3.2.2 Choose one of the two options for deployment

Run the steps in section 3.1.2

3.2.3 Install Controller

Run the steps in section 3.1.3
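
Sections 3.1 and 3.2 differ only in the value of SERVICE, so the chart reference and helm release name can be derived from it. The sketch below shows that derivation using the same naming pattern as the commands above; chart_ref and release_name are hypothetical helper names, and v0.1.0 is simply the release version used in this guide.

```shell
#!/bin/sh
# Build the OCI chart reference for an ACK service and release version,
# following the CHART_REPO and CHART_REF pattern used in sections 3.1.1 and 3.2.1.
chart_ref() {
  printf 'public.ecr.aws/aws-controllers-k8s/%s-chart:%s\n' "$1" "$2"
}

# Helm release name, as used by the helm install command in section 3.1.3.
release_name() {
  printf 'ack-%s-controller\n' "$1"
}

for service in sagemaker applicationautoscaling; do
  echo "$service: chart=$(chart_ref "$service" v0.1.0) release=$(release_name "$service")"
done
```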

4.0 Samples

4.1 SageMaker samples

Head over to the samples directory and follow the README to create resources.

4.2 Application-autoscaling samples

Head over to the samples directory in application-autoscaling controller repository and follow the README to create resources.

Note: These samples only work if you installed the Application Autoscaling controller in Section 3.2.

5.0 Cross Region Resource Management

To determine the region in which resources should be created, ACK controllers look for a region in the following sources, in order:

  1. Region annotation services.k8s.aws/region on the resource. If provided, it overrides the namespace default region annotation.
  2. Namespace default region annotation services.k8s.aws/default-region.
  3. If neither annotation is provided, ACK tries to find a region from these sources:
    1. Controller flags i.e. aws.region in helm charts
    2. Pod IRSA environment variables
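
The lookup order above amounts to "the first source that provides a value wins". The following sketch models that precedence in shell purely as an illustration. It is not the controller's implementation, and resolve_region and first_non_empty are hypothetical names.

```shell
#!/bin/sh
# Print the first non-empty argument; fail if all are empty.
first_non_empty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Arguments, in precedence order (an illustrative model of the list above):
#   1: services.k8s.aws/region annotation on the resource
#   2: services.k8s.aws/default-region annotation on the namespace
#   3: controller flag (aws.region in the helm chart)
#   4: region taken from the pod environment
resolve_region() {
  first_non_empty "$1" "$2" "$3" "$4"
}

# No resource annotation, namespace default set: prints us-east-1.
resolve_region "" "us-east-1" "us-west-2" ""
```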

For example, suppose the controller default region is us-west-2 (from source 3 above) and you want to create a resource in us-east-1. Use one of the following options to override the default region:

  • [Option 1] Region annotation sample

    • Add the services.k8s.aws/region annotation while creating the resource. For example:
    • apiVersion: sagemaker.services.k8s.aws/v1alpha1
      kind: TrainingJob
      metadata:
        name: ack-sample-trainingjob
        annotations:
          services.k8s.aws/region: us-east-1
      spec:
        trainingJobName: ack-sample-trainingjob
        roleARN: <sagemaker_execution_role_arn>
        ...
  • [Option 2] Namespace default region annotation sample

    • Note: Namespace scoped deployment does not support this option.

    • To bind a region to a specific Namespace, annotate the Namespace with the services.k8s.aws/default-region annotation. For example:

    • apiVersion: v1
      kind: Namespace
      metadata:
        name: production
        annotations:
          services.k8s.aws/default-region: us-east-1
    • For existing namespaces you can also run:

      • kubectl annotate namespace production services.k8s.aws/default-region=us-east-1
    • Create the resource in the same namespace

      • apiVersion: sagemaker.services.k8s.aws/v1alpha1
        kind: TrainingJob
        metadata:
          name: ack-sample-trainingjob
          namespace: production
        spec:
          trainingJobName: ack-sample-trainingjob
          roleARN: <sagemaker_execution_role_arn>
          ...

6.0 Cross Account Resource Management

ACK service controllers can manage resources in different AWS accounts. To enable and start using this feature, you will need to:

  1. Configure your AWS accounts, where the resources will be managed.
  2. Deploy the ACK service controller in cluster scope.
    • Namespace scoped deployment does not support Cross Account Resource Management.
  3. Create a ConfigMap to map AWS accounts with the Role ARNs that need to be assumed.
  4. Annotate namespaces with AWS Account IDs.

For detailed information about how ACK service controllers manage resources in multiple AWS accounts, please refer to the CARM design document.

6.1 Setting up AWS accounts

AWS account administrators should create and configure IAM roles that allow ACK service controllers to assume roles in different AWS accounts. For example, suppose you want account A (000000000000) to create resources in account B (111111111111), and you have configured the controller to use the arn:aws:iam::000000000000:role/roleA-production role. In that case, run the following commands.

Using account A credentials

export POLICY="{\"Version\":\"2012-10-17\",\"Statement\":{\"Effect\":\"Allow\",\"Action\":\"sts:AssumeRole\",\"Resource\":\"*\"}}"
aws iam put-role-policy --role-name roleA-production \
  --policy-name sts-assumerole --policy-document "$POLICY"

Using account B credentials

export CARM_ROLE_NAME=SagemakerCrossAccountAccess
export TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"AWS\": \"arn:aws:iam::000000000000:role/roleA-production\" }, \"Action\": \"sts:AssumeRole\" } ] }"
aws iam create-role --role-name ${CARM_ROLE_NAME} \
  --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${CARM_ROLE_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
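
The escaped one-line JSON strings above are easy to mistype. As an alternative, the account B trust policy can be produced with a heredoc, which keeps the JSON readable. This is a sketch of the same document, not a different policy; carm_trust_policy is a hypothetical helper name.

```shell
#!/bin/sh
# Print a trust policy that lets the given IAM role ARN assume the new role.
# The JSON mirrors the TRUST value used above.
carm_trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "$1" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

TRUST=$(carm_trust_policy "arn:aws:iam::000000000000:role/roleA-production")
echo "$TRUST"
```

The resulting TRUST variable can be passed to aws iam create-role with --assume-role-policy-document "$TRUST", exactly as in the commands above.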

6.2 Map AWS Accounts with their associated Role ARNs

Create a ConfigMap named ack-role-account-map in the namespace where the controller is installed. This ConfigMap associates each AWS Account ID with the role ARN that needs to be assumed in order to manage resources in that particular account. For example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-role-account-map
  namespace: ack-system
data:
  "111111111111": arn:aws:iam::111111111111:role/SagemakerCrossAccountAccess

6.3 Bind accounts to namespaces

To bind an AWS account to a specific Namespace, annotate the Namespace with the AWS Account ID. For example:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    services.k8s.aws/owner-account-id: "111111111111"

For existing namespaces you can also run:

kubectl annotate namespace production services.k8s.aws/owner-account-id=111111111111
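
Both the ConfigMap entry in section 6.2 and the namespace annotation in section 6.3 are keyed on the 12-digit AWS account ID, so a mistyped ID silently breaks the mapping. The sketch below validates the ID and prints the matching ConfigMap data entry. The helper names is_account_id and carm_map_entry are hypothetical, and the role name is the CARM_ROLE_NAME from section 6.1.

```shell
#!/bin/sh
# Succeed only if the argument is exactly 12 decimal digits.
is_account_id() {
  case "$1" in
    *[!0-9]*|"") return 1 ;;
  esac
  [ "${#1}" -eq 12 ]
}

# Print one data entry for the ack-role-account-map ConfigMap.
# Arguments: account ID, role name.
carm_map_entry() {
  is_account_id "$1" || return 1
  printf '"%s": arn:aws:iam::%s:role/%s\n' "$1" "$1" "$2"
}

account_id=111111111111
if entry=$(carm_map_entry "$account_id" SagemakerCrossAccountAccess); then
  echo "ConfigMap data entry: $entry"
  echo "Namespace annotation: services.k8s.aws/owner-account-id=$account_id"
else
  echo "error: '$account_id' is not a 12-digit account ID" >&2
fi
```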

6.4 Create resource in different AWS account

To create resources in account B, create them in the associated Namespace. For example:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: ack-sample-trainingjob
  namespace: production
spec:
  trainingJobName: ack-sample-trainingjob
  roleARN: <sagemaker_execution_role_arn>
  ...

8.0 Adopt Resources

ACK controllers provide the ability to “adopt” resources that were not originally created by an ACK service controller. Provided that the controller is configured with permissions that grant access to the existing resource, it can determine the current specification and status of the AWS resource and reconcile that resource as if the ACK controller had originally created it.

Sample:

apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: adopt-endpoint-sample
spec:  
  aws:
    # resource to adopt, not created by ACK
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      # target K8s CR name
      name: xgboost-endpoint

Save the above to a file named adopt-endpoint-sample.yaml.

Submit the CR

kubectl apply -f adopt-endpoint-sample.yaml

Check that the ACK.Adopted condition is True under status.conditions

kubectl describe adoptedresource adopt-endpoint-sample

Output should look similar to this:

---
kind: AdoptedResource
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"services.k8s.aws/v1alpha1","kind":"AdoptedResource","metadata":{"annotations":{},"name":"xgboost-endpoint","namespace":"default"},"spec":{"aws":{"nameOrID":"xgboost-endpoint"},"kubernetes":{"group":"sagemaker.services.k8s.aws","kind":"Endpoint","metadata":{"name":"xgboost-endpoint"}}}}'
  creationTimestamp: '2021-04-27T02:49:14Z'
  finalizers:
  - finalizers.services.k8s.aws/AdoptedResource
  generation: 1
  name: adopt-endpoint-sample
  namespace: default
  resourceVersion: '12669876'
  selfLink: "/apis/services.k8s.aws/v1alpha1/namespaces/default/adoptedresources/adopt-endpoint-sample"
  uid: 35f8fa92-29dd-4040-9d0d-0b07bbd7ca0b
spec:
  aws:
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      name: xgboost-endpoint
status:
  conditions:
  - status: 'True'
    type: ACK.Adopted
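
Instead of reading the describe output by eye, the ACK.Adopted condition can be polled from a script. The sketch below uses kubectl's JSONPath output to extract the condition status. The helper names, the default of 10 attempts, and the 5 second interval are assumptions chosen for illustration.

```shell
#!/bin/sh
# Print the status of the ACK.Adopted condition for an AdoptedResource.
# (adopted_status and wait_for_adoption are hypothetical helper names.)
adopted_status() {
  kubectl get adoptedresource "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="ACK.Adopted")].status}'
}

# Poll until the condition is "True" or the attempts run out.
# Arguments: resource name, attempts (default 10), seconds between attempts (default 5).
wait_for_adoption() {
  attempts=${2:-10}
  interval=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(adopted_status "$1" 2>/dev/null)" = "True" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example use:
#   wait_for_adoption adopt-endpoint-sample && echo "adopted"
```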

Check that the resource exists in the cluster

kubectl describe endpoints.sagemaker xgboost-endpoint

9.0 Cleanup

A few CRDs, such as services.k8s.aws_adoptedresources.yaml, are common across services. If you have multiple controllers installed, do not delete the common CRDs unless you are uninstalling all the controllers.

9.1 Uninstall SageMaker controller and crds

export SERVICE=sagemaker
# Uninstall the Helm Chart
helm uninstall -n $ACK_K8S_NAMESPACE ack-$SERVICE-controller

# Delete the CRDs
cd $CHART_EXPORT_PATH/$SERVICE-chart/crds

$ ls
sagemaker.services.k8s.aws_dataqualityjobdefinitions.yaml
sagemaker.services.k8s.aws_endpointconfigs.yaml
sagemaker.services.k8s.aws_endpoints.yaml
sagemaker.services.k8s.aws_hyperparametertuningjobs.yaml
sagemaker.services.k8s.aws_modelbiasjobdefinitions.yaml
sagemaker.services.k8s.aws_modelexplainabilityjobdefinitions.yaml
sagemaker.services.k8s.aws_modelqualityjobdefinitions.yaml
sagemaker.services.k8s.aws_models.yaml
sagemaker.services.k8s.aws_monitoringschedules.yaml
sagemaker.services.k8s.aws_processingjobs.yaml
sagemaker.services.k8s.aws_trainingjobs.yaml
sagemaker.services.k8s.aws_transformjobs.yaml
services.k8s.aws_adoptedresources.yaml # -> Common CRD across services

Choose either of the options below to delete CRDs

  • [Option 1] If you have multiple controllers installed and want to delete CRDs only related to sagemaker resources
    • kubectl delete -f <CRDs which have the prefix sagemaker.>
      
  • [Option 2] If you want to delete all CRDs
    • kubectl delete -f $CHART_EXPORT_PATH/$SERVICE-chart/crds
      

9.2 Uninstall applicationautoscaling controller and CRDs

Skip this section if you only installed the SageMaker controller.

export SERVICE=applicationautoscaling
# Uninstall the Helm Chart
helm uninstall -n $ACK_K8S_NAMESPACE ack-$SERVICE-controller

# Delete the CRDs
cd $CHART_EXPORT_PATH/$SERVICE-chart/crds
$ ls
applicationautoscaling.services.k8s.aws_scalabletargets.yaml
applicationautoscaling.services.k8s.aws_scalingpolicies.yaml
services.k8s.aws_adoptedresources.yaml # -> Common CRD across services

Choose either of the options below to delete CRDs

  • [Option 1] If you have multiple controllers installed and want to delete CRDs only related to applicationautoscaling resources
    • kubectl delete -f <CRDs which have the prefix applicationautoscaling.>
      
  • [Option 2] If you want to delete all CRDs
    • kubectl delete -f $CHART_EXPORT_PATH/$SERVICE-chart/crds
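
For Option 1 in sections 9.1 and 9.2, the service-specific CRD files can be selected by their file name prefix, which leaves the shared services.k8s.aws_adoptedresources.yaml untouched. A minimal sketch follows, assuming the crds directory layout shown in the listings above; service_crd_files is a hypothetical helper name.

```shell
#!/bin/sh
# Print the CRD files in a directory whose names start with "<service>.".
# The common services.k8s.aws_adoptedresources.yaml does not match the pattern.
# Arguments: crds directory, service name.
service_crd_files() {
  for file in "$1"/"$2".*.yaml; do
    if [ -e "$file" ]; then
      printf '%s\n' "$file"
    fi
  done
  return 0
}

# Dry run: list what would be deleted for the current SERVICE.
service_crd_files "${CHART_EXPORT_PATH:-/tmp/chart}/${SERVICE:-sagemaker}-chart/crds" "${SERVICE:-sagemaker}"

# To delete them, feed each file to kubectl:
#   service_crd_files "$CHART_EXPORT_PATH/$SERVICE-chart/crds" "$SERVICE" |
#     while read -r f; do kubectl delete -f "$f"; done
```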
      

9.3 Verify charts were deleted

helm ls -n $ACK_K8S_NAMESPACE

# Delete the namespace
kubectl delete namespace $ACK_K8S_NAMESPACE

[Optional] If you used cross account resource management

kubectl delete -n ack-system configmap ack-role-account-map
kubectl delete namespace production
