# Kubeflow Pipeline ModelKit Components

Kubeflow Pipeline components for packaging and deploying ML artifacts as KitOps ModelKits.

## Components

### push-modelkit

Packages ML artifacts in a directory as a ModelKit and pushes it to an OCI registry.

If a `Kitfile` exists in `modelkit_dir`, it is used as-is. Otherwise, one is auto-generated via `kit init`.

**Required inputs**

- `registry` – Container registry host (e.g., `registry.io`)
- `repository` – Repository path (e.g., `myorg/mymodel`)
- `tag` – ModelKit tag (default: `latest`)
- `modelkit_dir` – Directory with model files (with or without `Kitfile`)

**Optional metadata (for Kitfile)**

- `modelkit_name` – ModelKit package name
- `modelkit_desc` – ModelKit description
- `modelkit_author` – ModelKit author

**Optional attestation metadata**

- `dataset_uri` – Dataset URI
- `code_repo` – Code repository URL
- `code_commit` – Code commit hash

**Outputs**

- `uri` – Full ModelKit URI with digest (e.g., `registry.io/myorg/mymodel@sha256:abc…`)
- `digest` – ModelKit digest (e.g., `sha256:abc…`)

### unpack-modelkit

Pulls a ModelKit from a registry and extracts it.

**Inputs**

- `modelkit_uri` – ModelKit reference (e.g., `registry.io/repo:tag` or `registry.io/repo@sha256:…`)
- `extract_path` – Directory to extract contents (default: `/tmp/model`)

**Outputs**

- `model_path` – Directory where contents were extracted

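The Usage Examples below only show a container component for `push-modelkit`, so here is a minimal sketch of an `unpack-modelkit` component in the same style. The `/scripts/unpack-modelkit.sh` entry point and the `--extract-path` flag are assumptions made for illustration (mirroring `/scripts/push-modelkit.sh`) and may not match the shipped image.

```python
from kfp import dsl

@dsl.container_component
def unpack_modelkit(
    modelkit_uri: str,
    output_model: dsl.Output[dsl.Artifact],
):
    # Sketch only: the script path and flag are assumed to parallel
    # /scripts/push-modelkit.sh; check the image for the real interface.
    return dsl.ContainerSpec(
        image='ghcr.io/kitops-ml/kubeflow:latest',
        command=['/bin/bash', '-c'],
        args=[
            f'/scripts/unpack-modelkit.sh "{modelkit_uri}" '
            f'--extract-path "{output_model.path}"'
        ],
    )
```
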
## Usage Examples

Complete, runnable examples (including a full house-prices pipeline) are in the [`examples/`](examples/) directory.

### Basic usage

Training component that writes ML artifacts to a directory:

```python
from kfp import dsl

@dsl.component(
    packages_to_install=['pandas', 'xgboost', 'scikit-learn'],
    base_image='python:3.11-slim',
)
def train_model(modelkit_dir: dsl.Output[dsl.Artifact]):
    """Train model and save to directory."""
    import os
    import pickle

    # train_your_model, save_dataset, save_code, and save_docs are
    # placeholders for your own training and serialization logic.
    model = train_your_model()
    os.makedirs(modelkit_dir.path, exist_ok=True)

    with open(os.path.join(modelkit_dir.path, 'model.pkl'), 'wb') as f:
        pickle.dump(model, f)

    save_dataset(os.path.join(modelkit_dir.path, 'predictions.csv'))
    save_code(os.path.join(modelkit_dir.path, 'train.py'))
    save_docs(os.path.join(modelkit_dir.path, 'README.md'))
```

Component to push the directory as a ModelKit:

```python
# `kubernetes` is provided by the kfp-kubernetes extension package.
from kfp import dsl, kubernetes

@dsl.container_component
def push_modelkit(
    registry: str,
    repository: str,
    tag: str,
    input_modelkit_dir: dsl.Input[dsl.Artifact],
    output_uri: dsl.Output[dsl.Artifact],
    output_digest: dsl.Output[dsl.Artifact],
    modelkit_name: str = '',
    modelkit_desc: str = '',
    modelkit_author: str = '',
    dataset_uri: str = '',
    code_repo: str = '',
    code_commit: str = '',
):
    return dsl.ContainerSpec(
        image='ghcr.io/kitops-ml/kubeflow:latest',
        command=['/bin/bash', '-c'],
        args=[
            f'/scripts/push-modelkit.sh '
            f'"{registry}" "{repository}" "{tag}" '
            f'--modelkit-dir "{input_modelkit_dir.path}" '
            f'--name "{modelkit_name}" '
            f'--desc "{modelkit_desc}" '
            f'--author "{modelkit_author}" '
            f'--dataset-uri "{dataset_uri}" '
            f'--code-repo "{code_repo}" '
            f'--code-commit "{code_commit}" '
            f'&& cp /tmp/outputs/uri "{output_uri.path}" '
            f'&& cp /tmp/outputs/digest "{output_digest.path}"'
        ],
    )
```

Simple end‑to‑end pipeline:

```python
@dsl.pipeline(
    name='simple-modelkit-pipeline',
    description='Train and package as ModelKit',
)
def simple_pipeline(
    registry: str = 'jozu.ml',
    repository: str = 'team/model',
    tag: str = 'latest',
):
    train = train_model()

    push = push_modelkit(
        registry=registry,
        repository=repository,
        tag=tag,
        input_modelkit_dir=train.outputs['modelkit_dir'],
        modelkit_name='My Model',
        modelkit_desc='Description of my model',
        modelkit_author='Data Science Team',
    )

    kubernetes.use_secret_as_volume(
        push,
        secret_name='docker-config',
        mount_path='/etc/docker-config',
    )
```
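
To consume a pushed ModelKit later (for example, in a deployment or batch-scoring pipeline), the `unpack_modelkit` sketch from the component section above can be wired in the same way. The default URI here is only illustrative.

```python
@dsl.pipeline(
    name='unpack-modelkit-pipeline',
    description='Pull a ModelKit and expose its contents to downstream steps',
)
def unpack_pipeline(
    modelkit_uri: str = 'jozu.ml/team/model:latest',
):
    unpack = unpack_modelkit(modelkit_uri=modelkit_uri)

    # Registry credentials are required for private repositories.
    kubernetes.use_secret_as_volume(
        unpack,
        secret_name='docker-config',
        mount_path='/etc/docker-config',
    )
```

A downstream component can then accept `unpack.outputs['output_model']` as a `dsl.Input[dsl.Artifact]` and read the extracted files from its `.path`.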

### Using a custom Kitfile

If you need full control, create a `Kitfile` alongside your artifacts:

```python
@dsl.component(base_image='python:3.11-slim')
def train_with_kitfile(modelkit_dir: dsl.Output[dsl.Artifact]):
    """Train and create custom Kitfile."""
    import os

    # train_and_save_model is a placeholder for your own training logic.
    train_and_save_model(modelkit_dir.path)

    kitfile_content = """
manifestVersion: 1.0
package:
  name: Custom Model
  description: Model with custom configuration
  authors:
    - Data Science Team
model:
  path: model.pkl
datasets:
  - path: train.csv
  - path: test.csv
code:
  - path: train.py
docs:
  - path: README.md
"""
    with open(os.path.join(modelkit_dir.path, 'Kitfile'), 'w') as f:
        f.write(kitfile_content)
```

When a `Kitfile` is present, the component uses it instead of generating one.

### Pipeline with attestation

```python
@dsl.pipeline(
    name='production-pipeline',
    description='Production pipeline with attestation',
)
def production_pipeline(
    registry: str = 'jozu.ml',
    repository: str = 'team/prod-model',
    tag: str = 'v1.0.0',
    dataset_uri: str = 's3://bucket/data.csv',
    code_repo: str = 'github.com/org/repo',
    code_commit: str = 'abc123',
):
    train = train_model()

    push = push_modelkit(
        registry=registry,
        repository=repository,
        tag=tag,
        input_modelkit_dir=train.outputs['modelkit_dir'],
        modelkit_name='Production Model',
        modelkit_desc='Production model v1.0.0',
        modelkit_author='ML Team',
        dataset_uri=dataset_uri,
        code_repo=code_repo,
        code_commit=code_commit,
    )

    kubernetes.use_secret_as_volume(
        push,
        secret_name='docker-config',
        mount_path='/etc/docker-config',
    )
    kubernetes.use_secret_as_volume(
        push,
        secret_name='cosign-keys',
        mount_path='/etc/cosign',
    )
```
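
To run either pipeline, compile and submit it with the standard KFP SDK. The host URL below is a placeholder for your Kubeflow Pipelines endpoint.

```python
from kfp import compiler
from kfp.client import Client

# Compile the pipeline definition to an IR YAML file.
compiler.Compiler().compile(
    pipeline_func=production_pipeline,
    package_path='production_pipeline.yaml',
)

# Submit it to a Kubeflow Pipelines instance (placeholder endpoint).
client = Client(host='http://localhost:8080')
client.create_run_from_pipeline_package(
    'production_pipeline.yaml',
    arguments={
        'registry': 'jozu.ml',
        'repository': 'team/prod-model',
        'tag': 'v1.0.0',
    },
)
```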

## Secret Requirements

### Registry credentials

Create a Kubernetes secret with Docker registry credentials:

```bash
kubectl create secret generic docker-config \
  --from-file=config.json="$HOME/.docker/config.json" \
  --namespace=kubeflow
```

Or create it directly from registry credentials:

```bash
kubectl create secret docker-registry docker-config \
  --docker-server=jozu.ml \
  --docker-username=myuser \
  --docker-password=mypassword \
  --namespace=kubeflow
```

Mount the secret in your pipeline (as shown above):

```python
kubernetes.use_secret_as_volume(
    push,
    secret_name='docker-config',
    mount_path='/etc/docker-config',
)
```

### Cosign keys (optional)

For ModelKit attestation signing, create a secret with cosign keys:

```bash
cosign generate-key-pair

kubectl create secret generic cosign-keys \
  --from-file=cosign.key=cosign.key \
  --from-file=cosign.pub=cosign.pub \
  --namespace=kubeflow
```

Mount it as in the attestation pipeline example:

```python
kubernetes.use_secret_as_volume(
    push,
    secret_name='cosign-keys',
    mount_path='/etc/cosign',
)
```

If cosign keys are not available, the signing step logs a warning and continues.

## Troubleshooting

### Authentication errors

**Symptom:** `Failed to push ModelKit` or `401 Unauthorized`

**Check:**

```bash
kubectl get secret docker-config -n kubeflow
kubectl get secret docker-config -n kubeflow \
  -o jsonpath='{.data.config\.json}' | base64 -d
```

`config.json` should contain registry auth for your host:

```json
{
  "auths": {
    "jozu.ml": {
      "auth": "base64(username:password)"
    }
  }
}
```

### Directory not found

**Symptom:** `ModelKit directory does not exist`

Ensure your training component creates `modelkit_dir.path` and writes artifacts into it (see `train_model` example above).
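
To confirm what the training step actually wrote before the push runs, you can log the directory contents from inside the training component; the helper below is a generic debugging aid, not part of these components.

```python
import os

def debug_list_artifacts(path: str) -> None:
    """Print every file written into the ModelKit directory."""
    if not os.path.isdir(path):
        raise RuntimeError(f'ModelKit directory does not exist: {path}')
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        print(f'{name}\t{os.path.getsize(full)} bytes')

# At the end of the training component:
# debug_list_artifacts(modelkit_dir.path)
```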