redhat-na-ssa/ocp-rhoai-deploy

OpenShift AI Deployment

Notes for offline OCP + RHOAI install

See README

Setup OCP web terminal

# apply enhanced web terminal
oc apply -k https://github.com/redhat-na-ssa/ocp-rhoai-deploy/demo/web-terminal

# delete old terminal-tooling
$(wtoctl | grep delete)

Setup worker node in AWS

ocp_machineset_scale 1
ocp_control_nodes_not_schedulable
oc apply -k ../demo_ops/components/cluster-configs/autoscale/overlays/default

Setup GPU node in AWS

# setup gpu node
ocp_aws_machineset_create_gpu
ocp_machineset_scale 1

Apply device-plugin-config per node

# view configs
oc describe cm device-plugin-config \
  -n nvidia-gpu-operator

# apply config per node (set NODE_NAME to the target node first)
DEVICE_CONFIG=time-sliced-4

oc label node "${NODE_NAME}" \
  --overwrite \
  nvidia.com/device-plugin.config="${DEVICE_CONFIG}"
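
The labeling step above can be wrapped so the node name is passed explicitly and the result is read back. This is a sketch, not the repo's tooling: the function names are hypothetical, and it assumes the GPU operator has labeled GPU nodes with `nvidia.com/gpu.present=true`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo).

# List bare node names that carry the GPU-present label
# (assumption: the GPU operator sets nvidia.com/gpu.present=true).
list_gpu_nodes() {
  oc get nodes -l nvidia.com/gpu.present=true \
    --no-headers -o custom-columns=NAME:.metadata.name
}

# Label one node with a device-plugin config, then print the label back.
# usage: apply_device_config <node-name> <config-name>
apply_device_config() {
  local node="$1" config="$2"
  oc label node "${node}" --overwrite \
    nvidia.com/device-plugin.config="${config}"
  oc get node "${node}" \
    -o jsonpath='{.metadata.labels.nvidia\.com/device-plugin\.config}'
}
```

For example, `apply_device_config "$(list_gpu_nodes | head -n 1)" time-sliced-4` would label the first GPU node found. The config name must match a key in the `device-plugin-config` ConfigMap shown by the `oc describe` command.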

Setup MIG profile on node

See NVIDIA Docs - MIG

# patch gpu cluster policy
oc patch clusterpolicies.nvidia.com/cluster-policy \
  --type='json' \
  -p='[{"op":"replace", "path":"/spec/mig/strategy", "value":"single"}]'

# get mig profiles
oc -n nvidia-gpu-operator \
  describe cm default-mig-parted-config

# label a node with a mig profile
oc label nodes \
  <node-name> \
  nvidia.com/mig.config=all-1g.10gb --overwrite
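
Applying a MIG profile is not instantaneous, so it may be worth polling the node until the change settles. The sketch below assumes the MIG manager reports progress through the `nvidia.com/mig.config.state` node label with values such as `success` and `failed`; the function names, retry count, and delay are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo).

# Print the MIG state label for a node
# (assumption: the MIG manager sets nvidia.com/mig.config.state).
mig_config_state() {
  local node="$1"
  oc get node "${node}" \
    -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
}

# Poll until the state is success or failed.
# usage: wait_for_mig <node-name> [tries] [delay-seconds]
wait_for_mig() {
  local node="$1" tries="${2:-30}" delay="${3:-10}" state=""
  for _ in $(seq "${tries}"); do
    state="$(mig_config_state "${node}")"
    case "${state}" in
      success) echo "MIG config applied"; return 0 ;;
      failed)  echo "MIG config failed" >&2; return 1 ;;
    esac
    sleep "${delay}"
  done
  echo "timed out; last state: ${state:-unknown}" >&2
  return 1
}
```

Running `wait_for_mig <node-name>` right after the `oc label nodes` command gives a simple pass/fail signal before scheduling workloads against the new MIG devices.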

Additional Info
