Migrating to data science pipelines 2.0

Important

Data science pipelines 2.0 contains an installation of Argo Workflows. {productname-short} does not support direct customer usage of this installation of Argo Workflows.

If your cluster has an existing installation of Argo Workflows that was not installed by data science pipelines, data science pipelines is disabled after you install or upgrade {productname-short}. To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster; data science pipelines is then enabled automatically.

Argo Workflows resources created by {productname-short} have the following labels. You can view them in the {openshift-platform} console under Administration > CustomResourceDefinitions, in the argoproj.io group:

    labels:
      app.kubernetes.io/part-of: data-science-pipelines-operator
      app.opendatahub.io/data-science-pipelines-operator: 'true'
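To check whether the Argo Workflows custom resource definitions on your cluster carry these labels, you can inspect one of the CRDs directly. This is a minimal sketch that assumes the workflows.argoproj.io CRD is present on your cluster:

    oc get crd workflows.argoproj.io -o jsonpath='{.metadata.labels}'

If the labels shown above are absent from the output, the CRD belongs to a separate installation of Argo Workflows.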

Upgrading to data science pipelines 2.0

Note

If you are using GitOps to manage your data science pipelines 1.0 pipeline runs, pause any sync operations related to data science pipelines, including management of PipelineRuns or DataSciencePipelinesApplications (DSPAs). After you migrate to data science pipelines 2.0, your PipelineRuns are managed independently of data science pipelines, like any other Tekton resources.
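For example, if you manage these resources with Argo CD (an assumption; your GitOps tool may differ), you can pause syncing for the relevant application before you begin:

    argocd app set <YOUR_APP> --sync-policy none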

  1. Back up your pipelines data.

    1. Create a new data science project.

    2. Configure a new pipeline server.

      Important

If you use an external database, you must use a different external database from the one that you used for data science pipelines 1.0, because the database is migrated to the data science pipelines 2.0 format.

    3. Update and recompile your data science pipelines 1.0 pipelines as described in Migrate to Kubeflow Pipelines v2.

      Note

Data science pipelines 2.0 does not use the kfp-tekton library. In most cases, you can replace usage of kfp-tekton with the kfp library. For data science pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference. A minimal recompilation sketch follows the tip below.

      Tip

You can view historical data science pipelines 1.0 pipeline run information on your primary cluster in the Developer perspective of the {openshift-platform} console, under Pipelines > Project > PipelineRuns.
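The following sketch recompiles a pipeline with the KFP v2 SDK. The say_hello component and hello-pipeline are hypothetical examples; only the kfp imports and the Compiler call reflect the KFP v2 SDK API:

    # A minimal sketch of recompiling a pipeline with the KFP v2 SDK,
    # replacing the kfp-tekton compiler used by data science pipelines 1.0.
    from kfp import compiler, dsl

    @dsl.component
    def say_hello(name: str) -> str:
        return f"Hello, {name}!"

    @dsl.pipeline(name="hello-pipeline")
    def hello_pipeline(name: str = "world"):
        say_hello(name=name)

    # The KFP v2 compiler emits IR YAML rather than the Tekton YAML
    # that kfp-tekton produced.
    compiler.Compiler().compile(hello_pipeline, package_path="hello_pipeline.yaml")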

    4. Import your updated pipelines to the new data science project.

    5. Test and verify your new pipelines.

  2. On your primary cluster, do the following tasks:

    1. Remove your data science pipelines 1.0 pipeline servers.

    2. Optional: Remove your data science pipelines 1.0 resources. For more information, see Removing data science pipelines 1.0 resources.

    3. Recreate the pipeline servers for each data science project where the data science pipelines 1.0 pipeline servers existed.

      Note

If you are using GitOps to manage your DSPAs, make the following changes in your DSPAs before performing sync operations (a sketch follows this list):

      • Set spec.dspVersion to v2.

• Verify that the apiVersion is v1 instead of v1alpha1.
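A minimal sketch of the relevant DSPA fields after both changes, assuming the default resource name dspa:

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
    kind: DataSciencePipelinesApplication
    metadata:
      name: dspa
    spec:
      dspVersion: v2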

    4. Import your updated data science pipelines to the applicable pipeline servers.

      Tip

You can perform a batch upload by creating a script that uses the KFP SDK Client and the .upload_pipeline and .get_pipeline methods, as in the following sketch.
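For example, the following sketch uploads a set of compiled pipeline packages and reads each one back to verify it. The host URL, token variable, and file names are placeholders for your environment:

    # Hypothetical batch-upload script using the KFP SDK Client.
    import os
    import kfp

    client = kfp.Client(
        host="https://<YOUR_PIPELINE_SERVER_ROUTE>",  # route of the 2.0 pipeline server
        existing_token=os.environ["OC_TOKEN"],        # for example, `oc whoami --show-token`
    )

    # Upload each compiled package, then fetch it back by ID to verify.
    for package in ("pipeline_a.yaml", "pipeline_b.yaml"):
        name = package.removesuffix(".yaml")
        uploaded = client.upload_pipeline(package, pipeline_name=name)
        print(client.get_pipeline(uploaded.pipeline_id))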

  3. For any workbenches that communicate with data science pipelines 1.0, do the following tasks in the upgraded instance of {productname-long}:

    1. Delete the existing workbench. For more information, see Deleting a workbench from a data science project.

    2. If you want to use the notebook image version 2024.2, upgrade to Python 3.11 before creating a new workbench.

    3. Create a new workbench that uses the existing persistent storage of the deleted workbench. For more information, see Creating a workbench.

    4. Run the pipeline so that the data science pipelines 2.0 pipeline server schedules it.
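One way to start a run from the new workbench is with the KFP SDK Client. This sketch assumes the hypothetical client setup and the hello_pipeline.yaml package from the earlier examples:

    # Hypothetical example: start a run on the data science pipelines 2.0 server.
    import os
    import kfp

    client = kfp.Client(
        host="https://<YOUR_PIPELINE_SERVER_ROUTE>",
        existing_token=os.environ["OC_TOKEN"],
    )
    # Submit a compiled KFP v2 package; the pipeline server schedules the run.
    run = client.create_run_from_pipeline_package(
        "hello_pipeline.yaml",
        arguments={"name": "world"},
    )
    print(run.run_id)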

Removing data science pipelines 1.0 resources

When your migration to data science pipelines 2.0 is complete, you can clean up the data science pipelines 1.0 resources on your primary cluster.

Important

Before removing data science pipelines 1.0 resources, ensure that migration of your data science pipelines 1.0 pipelines to 2.0 is complete.

  1. Identify the DataSciencePipelinesApplication (DSPA) resource that corresponds to the data science pipelines 1.0 pipeline server:

    oc get dspa -n <YOUR_DS_PROJECT>
  2. Delete the cluster role binding associated with this DSPA:

    oc delete clusterrolebinding ds-pipeline-ui-auth-delegator-<YOUR_DS_PROJECT>-dspa
  3. Delete the DSPA:

    oc delete dspa dspa -n <YOUR_DS_PROJECT>
  4. If necessary, delete the DataSciencePipelinesApplication finalizer to complete the removal of the resource:

    oc patch dspa dspa -n <YOUR_DS_PROJECT> --type=merge -p "{\"metadata\":{\"finalizers\":null}}"
  5. If you are not using OpenShift Pipelines for any purpose other than data science pipelines 1.0, you can remove the OpenShift Pipelines Operator.

  6. Data science pipelines 1.0 used the kfp-tekton Python library; data science pipelines 2.0 does not. You can uninstall kfp-tekton when there are no remaining data science pipelines 1.0 pipeline servers in use on your cluster.
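For example, in each Python environment that used the data science pipelines 1.0 SDK:

    pip uninstall -y kfp-tekton
    pip install --upgrade kfp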