Migrating to data science pipelines 2.0

Important

Data science pipelines 2.0 contains an installation of Argo Workflows. {productname-short} does not support direct customer usage of this installation of Argo Workflows.

If your cluster has an existing installation of Argo Workflows that was not installed by data science pipelines, data science pipelines is disabled after you install or upgrade {productname-short}. To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster; data science pipelines is then enabled automatically.

Argo Workflows resources created by {productname-short} have the following labels. You can view them in the {openshift-platform} console under Administration > CustomResourceDefinitions, in the argoproj.io group:

    labels:
      app.kubernetes.io/part-of: data-science-pipelines-operator
      app.opendatahub.io/data-science-pipelines-operator: 'true'
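To check whether the Argo Workflows custom resource definitions on your cluster carry these labels, you can inspect one of the CRDs directly. This is a minimal sketch that assumes the workflows.argoproj.io CRD is present on your cluster:

    oc get crd workflows.argoproj.io -o jsonpath='{.metadata.labels}'

If the labels shown above are absent from the output, the CRD belongs to a separate installation of Argo Workflows.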

Upgrading to data science pipelines 2.0

Note

If you are using GitOps to manage your data science pipelines 1.0 pipeline runs, pause any sync operations related to data science pipelines, including management of PipelineRuns or DataSciencePipelinesApplications (DSPAs). After you migrate to data science pipelines 2.0, your PipelineRuns are managed independently of data science pipelines, like any other Tekton resources.
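For example, if you manage these resources with Argo CD (an assumption; your GitOps tool may differ), you can pause syncing for the relevant application before you begin:

    argocd app set <YOUR_APP> --sync-policy none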

  1. Back up your pipelines data.

    1. Create a new data science project.

    2. Configure a new pipeline server.

      Important

If you use an external database, you must use a different external database from the one that you used for data science pipelines 1.0, because the database is migrated to the data science pipelines 2.0 format.

    3. Update and recompile your data science pipelines 1.0 pipelines as described in Migrate to Kubeflow Pipelines v2.

      Note

Data science pipelines 2.0 does not use the kfp-tekton library. In most cases, you can replace usage of kfp-tekton with the kfp library. For data science pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference. A minimal recompilation sketch follows the tip below.

      Tip

You can view historical data science pipelines 1.0 pipeline run information on your primary cluster in the Developer perspective of the {openshift-platform} console, under Pipelines > Project > PipelineRuns.
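The following sketch recompiles a pipeline with the KFP v2 SDK. The say_hello component and hello-pipeline are hypothetical examples; only the kfp imports and the Compiler call reflect the KFP v2 SDK API:

    # A minimal sketch of recompiling a pipeline with the KFP v2 SDK,
    # replacing the kfp-tekton compiler used by data science pipelines 1.0.
    from kfp import compiler, dsl

    @dsl.component
    def say_hello(name: str) -> str:
        return f"Hello, {name}!"

    @dsl.pipeline(name="hello-pipeline")
    def hello_pipeline(name: str = "world"):
        say_hello(name=name)

    # The KFP v2 compiler emits IR YAML rather than the Tekton YAML
    # that kfp-tekton produced.
    compiler.Compiler().compile(hello_pipeline, package_path="hello_pipeline.yaml")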

    4. Import your updated pipelines to the new data science project.

    5. Test and verify your new pipelines.

  2. On your primary cluster, do the following tasks:

    1. Remove your data science pipelines 1.0 pipeline servers.

    2. Optional: Remove your data science pipelines 1.0 resources. For more information, see Removing data science pipelines 1.0 resources.

    3. Recreate the pipeline servers for each data science project where the data science pipelines 1.0 pipeline servers existed.

      Note

If you are using GitOps to manage your DSPAs, make the following changes in your DSPAs before performing sync operations (a sketch follows this list):

      • Set spec.dspVersion to v2.

• Verify that the apiVersion is v1 instead of v1alpha1.
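A minimal sketch of the relevant DSPA fields after both changes, assuming the default resource name dspa:

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
    kind: DataSciencePipelinesApplication
    metadata:
      name: dspa
    spec:
      dspVersion: v2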

    4. Import your updated data science pipelines to the applicable pipeline servers.

      Tip

You can perform a batch upload by creating a script that uses the KFP SDK Client and the .upload_pipeline and .get_pipeline methods, as in the following sketch.
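For example, the following sketch uploads a set of compiled pipeline packages and reads each one back to verify it. The host URL, token variable, and file names are placeholders for your environment:

    # Hypothetical batch-upload script using the KFP SDK Client.
    import os
    import kfp

    client = kfp.Client(
        host="https://<YOUR_PIPELINE_SERVER_ROUTE>",  # route of the 2.0 pipeline server
        existing_token=os.environ["OC_TOKEN"],        # for example, `oc whoami --show-token`
    )

    # Upload each compiled package, then fetch it back by ID to verify.
    for package in ("pipeline_a.yaml", "pipeline_b.yaml"):
        name = package.removesuffix(".yaml")
        uploaded = client.upload_pipeline(package, pipeline_name=name)
        print(client.get_pipeline(uploaded.pipeline_id))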

  3. For any workbenches that communicate with data science pipelines 1.0, do the following tasks in the upgraded instance of {productname-long}:

    1. Delete the existing workbench. For more information, see Deleting a workbench from a data science project.

    2. If you want to use the notebook image version 2024.2, upgrade to Python 3.11 before creating a new workbench.

    3. Create a new workbench that uses the existing persistent storage of the deleted workbench. For more information, see Creating a workbench.

    4. Run the pipeline so that the data science pipelines 2.0 pipeline server schedules it.
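One way to start a run from the new workbench is with the KFP SDK Client. This sketch assumes the hypothetical client setup and the hello_pipeline.yaml package from the earlier examples:

    # Hypothetical example: start a run on the data science pipelines 2.0 server.
    import os
    import kfp

    client = kfp.Client(
        host="https://<YOUR_PIPELINE_SERVER_ROUTE>",
        existing_token=os.environ["OC_TOKEN"],
    )
    # Submit a compiled KFP v2 package; the pipeline server schedules the run.
    run = client.create_run_from_pipeline_package(
        "hello_pipeline.yaml",
        arguments={"name": "world"},
    )
    print(run.run_id)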

Removing data science pipelines 1.0 resources

When your migration to data science pipelines 2.0 is complete, you can clean up the data science pipelines 1.0 resources on your primary cluster.

Important

Before removing data science pipelines 1.0 resources, ensure that migration of your data science pipelines 1.0 pipelines to 2.0 is complete.

  1. Identify the DataSciencePipelinesApplication (DSPA) resource that corresponds to the data science pipelines 1.0 pipeline server:

    oc get dspa -n <YOUR_DS_PROJECT>
  2. Delete the cluster role binding associated with this DSPA:

    oc delete clusterrolebinding ds-pipeline-ui-auth-delegator-<YOUR_DS_PROJECT>-dspa
  3. Delete the DSPA:

    oc delete dspa dspa -n <YOUR_DS_PROJECT>
  4. If necessary, delete the DataSciencePipelinesApplication finalizer to complete the removal of the resource:

    oc patch dspa dspa -n <YOUR_DS_PROJECT> --type=merge -p "{\"metadata\":{\"finalizers\":null}}"
  5. If you are not using OpenShift Pipelines for any purpose other than data science pipelines 1.0, you can remove the OpenShift Pipelines Operator.

  6. Data science pipelines 1.0 used the kfp-tekton Python library; data science pipelines 2.0 does not. You can uninstall kfp-tekton when there are no remaining data science pipelines 1.0 pipeline servers in use on your cluster.
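For example, in each Python environment that used the data science pipelines 1.0 SDK:

    pip uninstall -y kfp-tekton
    pip install --upgrade kfp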