Installing the distributed workloads components

To use the distributed workloads feature in {productname-short}, you must install several components.

Prerequisites
Procedure
  1. Click the Data Science Cluster tab.

  2. Click the default instance name (for example, default-dsc) to open the instance details page.

  3. Click the YAML tab to show the instance specifications.

  4. Enable the required distributed workloads components. In the spec.components section, set the managementState field of each component as follows:

    • If you want to use the CodeFlare framework to tune models, enable the codeflare, kueue, and ray components.

    • If you want to use the Kubeflow Training Operator to tune models, enable the kueue and trainingoperator components.

    • The list of required components depends on whether the distributed workload runs from a pipeline, a notebook, or both, as shown in the following table.

    Table 1. Components required for distributed workloads

    Component             Pipelines only  Notebooks only  Pipelines and notebooks
    codeflare             Managed         Managed         Managed
    dashboard             Managed         Managed         Managed
    datasciencepipelines  Managed         Removed         Managed
    kueue                 Managed         Managed         Managed
    ray                   Managed         Managed         Managed
    trainingoperator      Managed         Managed         Managed
    workbenches           Removed         Managed         Managed

  5. Click Save. After a short time, the components with a Managed state are ready.
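
As an illustration of step 4, the following Python sketch derives the spec.components entries for one scenario from the values in Table 1 and prints them as a YAML fragment. The helper names (components_for, render_yaml) and the scenario keys are hypothetical and exist only for this example. The sketch shows the spec.components section only, and assumes that each component takes a single managementState field, as described in the procedure.

```python
# Sketch only: the states are copied from Table 1; all helper names are hypothetical.
TABLE_1 = {
    # component:            (pipelines only, notebooks only, pipelines and notebooks)
    "codeflare":            ("Managed", "Managed", "Managed"),
    "dashboard":            ("Managed", "Managed", "Managed"),
    "datasciencepipelines": ("Managed", "Removed", "Managed"),
    "kueue":                ("Managed", "Managed", "Managed"),
    "ray":                  ("Managed", "Managed", "Managed"),
    "trainingoperator":     ("Managed", "Managed", "Managed"),
    "workbenches":          ("Removed", "Managed", "Managed"),
}

# Hypothetical scenario keys, in the same order as the table columns.
SCENARIOS = ("pipelines", "notebooks", "both")


def components_for(scenario):
    """Return the spec.components mapping for one column of Table 1."""
    column = SCENARIOS.index(scenario)  # raises ValueError for an unknown scenario
    return {
        name: {"managementState": states[column]}
        for name, states in TABLE_1.items()
    }


def render_yaml(components):
    """Render the mapping as an indented YAML fragment (spec.components only)."""
    lines = ["spec:", "  components:"]
    for name, fields in components.items():
        lines.append(f"    {name}:")
        lines.append(f"      managementState: {fields['managementState']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yaml(components_for("notebooks")))
```

You can compare the printed fragment with the spec.components section on the YAML tab before you click Save.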

Verification

Check the status of the codeflare-operator-manager, kuberay-operator, and kueue-controller-manager pods, as follows:

  1. Click Workloads → Deployments.

  2. Search for the codeflare-operator-manager, kuberay-operator, and kueue-controller-manager deployments. In each case, check the status as follows:

    1. Click the deployment name to open the deployment details page.

    2. Click the Pods tab.

    3. Check the pod status.

      When the status of the codeflare-operator-manager-<pod-id>, kuberay-operator-<pod-id>, and kueue-controller-manager-<pod-id> pods is Running, the pods are ready to use.

    4. To see more information about each pod, click the pod name to open the pod details page, and then click the Logs tab.
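
The same check can be sketched from the command line. The following Python example parses the JSON output of `oc get pods -n <namespace> -o json` and reports whether every pod whose name starts with one of the three prefixes above is in the Running phase. It is a minimal sketch under stated assumptions: the function names are hypothetical, the namespace is not named in this procedure and must be supplied by you, and the `oc` CLI must be installed and logged in to the cluster.

```python
import json
import subprocess
import sys

# Pod name prefixes taken from the verification step above.
PREFIXES = (
    "codeflare-operator-manager-",
    "kuberay-operator-",
    "kueue-controller-manager-",
)


def pod_phases(pod_list, prefixes=PREFIXES):
    """Map each pod whose name matches a prefix to its status.phase value."""
    phases = {}
    for item in pod_list.get("items", []):
        name = item["metadata"]["name"]
        if name.startswith(prefixes):
            phases[name] = item.get("status", {}).get("phase", "Unknown")
    return phases


def all_running(pod_list, prefixes=PREFIXES):
    """True if every prefix has at least one pod and all matching pods are Running."""
    phases = pod_phases(pod_list, prefixes)
    covered = all(any(name.startswith(p) for name in phases) for p in prefixes)
    return covered and all(phase == "Running" for phase in phases.values())


def fetch_pods(namespace):
    """Run `oc get pods` in the given namespace and return the parsed JSON."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Assumption: the namespace is passed as the first command-line argument.
    pods = fetch_pods(sys.argv[1])
    for name, phase in sorted(pod_phases(pods).items()):
        print(f"{name}: {phase}")
    print("ready" if all_running(pods) else "not ready")
```

The check deliberately requires at least one pod per prefix, so a deployment that has not created any pods yet is reported as not ready.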

Next Step

Configure the distributed workloads feature as described in Managing distributed workloads.