
Webinar Demo Flow

Arun Gupta edited this page Jul 21, 2019 · 15 revisions

Kubeflow demo

  • Explain how kfapp/aws_config/cluster_config.sh configures GPU nodes
  • Explain kfapp/aws_config/cluster_features.sh: private access, disabling the public endpoint, and control/data plane logging
  • In kfapp/env.sh, explain KUBEFLOW_COMPONENTS and how the ALB and ingress controllers are disabled
  • Show config:
    kubectl config get-contexts
    
  • Show the GPUs available on each node:
    kubectl get nodes "-o=custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory,CPU:.status.allocatable.cpu,GPU:.status.allocatable.nvidia\.com/gpu"
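To confirm the total GPU count at a glance during the demo, the GPU column of the command above can be summed. A minimal sketch in shell; `sum_gpus` is a hypothetical helper, and it assumes the four-column NAME/MEMORY/CPU/GPU output produced by the custom-columns flag above, where nodes without GPUs show `<none>`.

```shell
# Hypothetical helper: total the GPU column (4th field) of the
# `kubectl get nodes -o=custom-columns=NAME:...,GPU:...` output read from stdin.
# Skips the header row and any node that reports <none> for GPUs.
sum_gpus() {
  awk 'NR > 1 && $4 != "<none>" { total += $4 } END { print total + 0 }'
}
```

Usage: pipe the GPU command above into it, e.g. `kubectl get nodes "-o=custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory,CPU:.status.allocatable.cpu,GPU:.status.allocatable.nvidia\.com/gpu" | sum_gpus`. It prints 0 when no node exposes GPUs.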
    

Jupyter notebook

Single node training

  • Follow the steps from https://github.com/aws-samples/machine-learning-using-k8s/blob/master/docs/mnist/inference/tensorflow.md to set up the inference engine
  • Deploy serving components:
    ks apply default -c ${TF_SERVING_SERVICE}
    ks apply default -c ${TF_SERVING_DEPLOYMENT}
    
  • Show that the pods are running:
    kubectl get pods -n kubeflow --selector=app=mnist
    
  • Port-forward to the running mnist pod:
    kubectl port-forward -n kubeflow `kubectl get pods -n kubeflow --selector=app=mnist -o jsonpath='{.items[0].metadata.name}' --field-selector=status.phase=Running` 8500:8500
    
  • Run inference:
    python samples/mnist/inference/tensorflow/inference_client.py --endpoint http://localhost:8500/v1/models/mnist:predict
    
  • Delete serving components:
    ks delete default -c ${TF_SERVING_DEPLOYMENT}
    ks delete default -c ${TF_SERVING_SERVICE}
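For rehearsing the webinar, the serving steps above can be wrapped in a few shell functions. This is a sketch, not part of the linked sample: the function names and the DRY_RUN switch are hypothetical, and it assumes TF_SERVING_SERVICE and TF_SERVING_DEPLOYMENT are already set as in the linked tensorflow.md and that it runs from the ksonnet app directory with the `default` environment. Port forwarding blocks the terminal, so that step and the inference client stay manual in a second terminal. `check_model` queries TensorFlow Serving's model-status endpoint and assumes the REST API answers on localhost:8500, as the inference step above does.

```shell
# Print each command instead of running it when DRY_RUN=1 (useful for rehearsal).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Apply both serving components, then list the mnist pods; stops at the first failure.
deploy_serving() {
  run ks apply default -c "${TF_SERVING_SERVICE}" &&
    run ks apply default -c "${TF_SERVING_DEPLOYMENT}" &&
    run kubectl get pods -n kubeflow --selector=app=mnist
}

# GET /v1/models/<name> returns the model's version status in TensorFlow Serving.
# Assumption: the port forward above exposes the REST API on localhost:8500.
check_model() {
  run curl -s "http://localhost:8500/v1/models/mnist"
}

# Remove both serving components created by deploy_serving.
cleanup_serving() {
  run ks delete default -c "${TF_SERVING_DEPLOYMENT}" &&
    run ks delete default -c "${TF_SERVING_SERVICE}"
}
```

Usage: source the file, run `DRY_RUN=1 deploy_serving` to preview the commands, then call `deploy_serving`, `check_model` (once the port forward is up), and `cleanup_serving` in turn. The functions chain with `&&` rather than `set -e` so that sourcing them does not change the options of the interactive shell.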
    

Distributed training

Optional

  • TensorBoard
  • Katib
  • Fairing
  • KFServing