
Commit 4e9112e

Add Kubeflow Pipeline components for ModelKits
Introduces push-modelkit and unpack-modelkit components to enable integration of KitOps with Kubeflow Pipelines.
1 parent 826066b

File tree

12 files changed: +1736 −0 lines
Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@

```dockerfile
# Multi-platform digest for Cosign v2.4.0
ARG COSIGN_DIGEST=sha256:9d50ceb15f023eda8f58032849eedc0216236d2e2f4cfe1cdf97c00ae7798cfe
ARG KIT_BASE_IMAGE=ghcr.io/kitops-ml/kitops:next

FROM gcr.io/projectsigstore/cosign@$COSIGN_DIGEST AS cosign-install
FROM $KIT_BASE_IMAGE

# Install additional tools needed for scripts
USER 0
RUN apk add --no-cache \
    bash \
    jq
USER 1001

# Copy cosign from multi-platform build
COPY --from=cosign-install /ko-app/cosign /usr/local/bin/cosign

# Copy scripts (needs root for chmod)
USER 0
COPY scripts/ /scripts/
RUN chmod +x /scripts/*.sh
USER 1001

# Set working directory
WORKDIR /workspace

# Default entrypoint
ENTRYPOINT ["/bin/bash"]

LABEL org.opencontainers.image.description="KitOps Kubeflow Pipeline Components"
LABEL org.opencontainers.image.source="https://github.com/kitops-ml/kitops"
LABEL org.opencontainers.image.licenses="Apache-2.0"
```
Lines changed: 316 additions & 0 deletions

@@ -0,0 +1,316 @@

# Kubeflow Pipeline ModelKit Components

Kubeflow Pipeline components for packaging and deploying ML artifacts as KitOps ModelKits.

## Components

### push-modelkit

Packages the ML artifacts in a directory as a ModelKit and pushes it to an OCI registry.

If a `Kitfile` exists in `modelkit_dir`, it is used as-is. Otherwise, one is auto-generated via `kit init`.

**Required inputs**

- `registry` – Container registry host (e.g., `registry.io`)
- `repository` – Repository path (e.g., `myorg/mymodel`)
- `tag` – ModelKit tag (default: `latest`)
- `modelkit_dir` – Directory with model files (with or without a `Kitfile`)

**Optional metadata (for the Kitfile)**

- `modelkit_name` – ModelKit package name
- `modelkit_desc` – ModelKit description
- `modelkit_author` – ModelKit author

**Optional attestation metadata**

- `dataset_uri` – Dataset URI
- `code_repo` – Code repository URL
- `code_commit` – Code commit hash

**Outputs**

- `uri` – Full ModelKit URI with digest (e.g., `registry.io/myorg/mymodel@sha256:abc…`)
- `digest` – ModelKit digest (e.g., `sha256:abc…`)

### unpack-modelkit

Pulls a ModelKit from a registry and extracts its contents.

**Inputs**

- `modelkit_uri` – ModelKit reference (e.g., `registry.io/repo:tag` or `registry.io/repo@sha256:…`)
- `extract_path` – Directory to extract contents into (default: `/tmp/model`)

**Outputs**

- `model_path` – Directory where the contents were extracted
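This README only shows a Python wrapper for the push component, so here is a minimal sketch of a matching wrapper for unpack-modelkit. The script path `/scripts/unpack-modelkit.sh` and its `--extract-path` flag are assumptions modeled on the `push_modelkit` example below, not confirmed by this commit:

```python
from kfp import dsl

@dsl.container_component
def unpack_modelkit(
    modelkit_uri: str,
    output_model: dsl.Output[dsl.Artifact],
):
    # Hypothetical script name and flag, mirroring push-modelkit.sh.
    return dsl.ContainerSpec(
        image='ghcr.io/kitops-ml/kubeflow:latest',
        command=['/bin/bash', '-c'],
        args=[
            f'/scripts/unpack-modelkit.sh "{modelkit_uri}" '
            f'--extract-path "{output_model.path}"'
        ],
    )
```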
## Usage Examples

Complete, runnable examples (including a full house-prices pipeline) are in the [`examples/`](examples/) directory.

### Basic usage

A training component that writes ML artifacts to a directory:

```python
from kfp import dsl

@dsl.component(
    packages_to_install=['pandas', 'xgboost', 'scikit-learn'],
    base_image='python:3.11-slim',
)
def train_model(modelkit_dir: dsl.Output[dsl.Artifact]):
    """Train a model and save it to a directory."""
    import os
    import pickle

    # train_your_model, save_dataset, save_code, and save_docs are
    # placeholders for your own training and serialization logic.
    model = train_your_model()
    os.makedirs(modelkit_dir.path, exist_ok=True)

    with open(os.path.join(modelkit_dir.path, 'model.pkl'), 'wb') as f:
        pickle.dump(model, f)

    save_dataset(os.path.join(modelkit_dir.path, 'predictions.csv'))
    save_code(os.path.join(modelkit_dir.path, 'train.py'))
    save_docs(os.path.join(modelkit_dir.path, 'README.md'))
```
Component to push the directory as a ModelKit:

```python
from kfp import dsl, kubernetes

@dsl.container_component
def push_modelkit(
    registry: str,
    repository: str,
    tag: str,
    input_modelkit_dir: dsl.Input[dsl.Artifact],
    output_uri: dsl.Output[dsl.Artifact],
    output_digest: dsl.Output[dsl.Artifact],
    modelkit_name: str = '',
    modelkit_desc: str = '',
    modelkit_author: str = '',
    dataset_uri: str = '',
    code_repo: str = '',
    code_commit: str = '',
):
    return dsl.ContainerSpec(
        image='ghcr.io/kitops-ml/kubeflow:latest',
        command=['/bin/bash', '-c'],
        args=[
            f'/scripts/push-modelkit.sh '
            f'"{registry}" "{repository}" "{tag}" '
            f'--modelkit-dir "{input_modelkit_dir.path}" '
            f'--name "{modelkit_name}" '
            f'--desc "{modelkit_desc}" '
            f'--author "{modelkit_author}" '
            f'--dataset-uri "{dataset_uri}" '
            f'--code-repo "{code_repo}" '
            f'--code-commit "{code_commit}" '
            f'&& cp /tmp/outputs/uri "{output_uri.path}" '
            f'&& cp /tmp/outputs/digest "{output_digest.path}"'
        ],
    )
```
Simple end-to-end pipeline:

```python
@dsl.pipeline(
    name='simple-modelkit-pipeline',
    description='Train and package as ModelKit',
)
def simple_pipeline(
    registry: str = 'jozu.ml',
    repository: str = 'team/model',
    tag: str = 'latest',
):
    train = train_model()

    push = push_modelkit(
        registry=registry,
        repository=repository,
        tag=tag,
        input_modelkit_dir=train.outputs['modelkit_dir'],
        modelkit_name='My Model',
        modelkit_desc='Description of my model',
        modelkit_author='Data Science Team',
    )

    kubernetes.use_secret_as_volume(
        push,
        secret_name='docker-config',
        mount_path='/etc/docker-config',
    )
```
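To run it, compile the pipeline with the standard KFP compiler and submit it to your cluster. A minimal sketch; the endpoint URL and output filename are placeholders:

```python
from kfp import compiler
from kfp.client import Client

# Compile the pipeline function to an IR YAML package.
compiler.Compiler().compile(simple_pipeline, 'simple_pipeline.yaml')

# Submit the compiled package to a Kubeflow Pipelines endpoint.
client = Client(host='http://localhost:8080')
client.create_run_from_pipeline_package(
    'simple_pipeline.yaml',
    arguments={'registry': 'jozu.ml', 'repository': 'team/model'},
)
```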
### Using a custom Kitfile

If you need full control, create a `Kitfile` alongside your artifacts:

```python
@dsl.component(base_image='python:3.11-slim')
def train_with_kitfile(modelkit_dir: dsl.Output[dsl.Artifact]):
    """Train and create a custom Kitfile."""
    import os

    # train_and_save_model is a placeholder for your own training logic.
    train_and_save_model(modelkit_dir.path)

    kitfile_content = """
manifestVersion: 1.0
package:
  name: Custom Model
  description: Model with custom configuration
  authors:
    - Data Science Team
model:
  path: model.pkl
datasets:
  - path: train.csv
  - path: test.csv
code:
  - path: train.py
docs:
  - path: README.md
"""
    with open(os.path.join(modelkit_dir.path, 'Kitfile'), 'w') as f:
        f.write(kitfile_content)
```

When a `Kitfile` is present, the component uses it instead of generating one.
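Before pushing, you can also sanity-check that every file the Kitfile references actually exists. The `validate_kitfile` helper below is a hypothetical sketch, not part of this commit; it assumes PyYAML is installed via `packages_to_install`:

```python
@dsl.component(
    base_image='python:3.11-slim',
    packages_to_install=['PyYAML'],
)
def validate_kitfile(modelkit_dir: dsl.Input[dsl.Artifact]):
    """Fail fast if the Kitfile references files that are missing."""
    import os
    import yaml

    with open(os.path.join(modelkit_dir.path, 'Kitfile')) as f:
        kitfile = yaml.safe_load(f)

    # Collect every relative path referenced by the Kitfile.
    paths = [kitfile.get('model', {}).get('path')]
    for section in ('datasets', 'code', 'docs'):
        paths += [entry['path'] for entry in kitfile.get(section, [])]

    for rel_path in filter(None, paths):
        full = os.path.join(modelkit_dir.path, rel_path)
        if not os.path.exists(full):
            raise FileNotFoundError(f'Kitfile references missing file: {full}')
```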
### Pipeline with attestation

```python
@dsl.pipeline(
    name='production-pipeline',
    description='Production pipeline with attestation',
)
def production_pipeline(
    registry: str = 'jozu.ml',
    repository: str = 'team/prod-model',
    tag: str = 'v1.0.0',
    dataset_uri: str = 's3://bucket/data.csv',
    code_repo: str = 'github.com/org/repo',
    code_commit: str = 'abc123',
):
    train = train_model()

    push = push_modelkit(
        registry=registry,
        repository=repository,
        tag=tag,
        input_modelkit_dir=train.outputs['modelkit_dir'],
        modelkit_name='Production Model',
        modelkit_desc='Production model v1.0.0',
        modelkit_author='ML Team',
        dataset_uri=dataset_uri,
        code_repo=code_repo,
        code_commit=code_commit,
    )

    kubernetes.use_secret_as_volume(
        push,
        secret_name='docker-config',
        mount_path='/etc/docker-config',
    )
    kubernetes.use_secret_as_volume(
        push,
        secret_name='cosign-keys',
        mount_path='/etc/cosign',
    )
```
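The `uri` and `digest` outputs can feed downstream steps. For example, a small lightweight component (the `report_uri` helper here is hypothetical, not part of this commit) can read the pushed URI artifact:

```python
@dsl.component(base_image='python:3.11-slim')
def report_uri(uri_file: dsl.Input[dsl.Artifact]):
    """Read and log the pushed ModelKit URI."""
    with open(uri_file.path) as f:
        print('Pushed ModelKit:', f.read().strip())

# Inside the pipeline body, wire it to the push step's output:
#     report_uri(uri_file=push.outputs['output_uri'])
```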
## Secret Requirements

### Registry credentials

Create a Kubernetes secret with Docker registry credentials:

```bash
kubectl create secret generic docker-config \
  --from-file=config.json="$HOME/.docker/config.json" \
  --namespace=kubeflow
```

Or:

```bash
kubectl create secret docker-registry docker-config \
  --docker-server=jozu.ml \
  --docker-username=myuser \
  --docker-password=mypassword \
  --namespace=kubeflow
```

Mount it in your pipeline (as shown above) using:

```python
kubernetes.use_secret_as_volume(
    push,
    secret_name='docker-config',
    mount_path='/etc/docker-config',
)
```
### Cosign keys (optional)

For ModelKit attestation signing, create a secret with cosign keys:

```bash
cosign generate-key-pair

kubectl create secret generic cosign-keys \
  --from-file=cosign.key=cosign.key \
  --from-file=cosign.pub=cosign.pub \
  --namespace=kubeflow
```

Mount it as in the attestation pipeline example:

```python
kubernetes.use_secret_as_volume(
    push,
    secret_name='cosign-keys',
    mount_path='/etc/cosign',
)
```

If cosign keys are not available, the signing step logs a warning and continues.
## Troubleshooting

### Authentication errors

**Symptom:** `Failed to push ModelKit` or `401 Unauthorized`

**Check:**

```bash
kubectl get secret docker-config -n kubeflow
kubectl get secret docker-config -n kubeflow \
  -o jsonpath='{.data.config\.json}' | base64 -d
```

`config.json` should contain registry auth for your host:

```json
{
  "auths": {
    "jozu.ml": {
      "auth": "base64(username:password)"
    }
  }
}
```
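The `auth` value is the base64 encoding of `username:password`. A quick way to produce it (placeholder credentials shown):

```python
import base64

# Substitute your registry username and password.
print(base64.b64encode(b'myuser:mypassword').decode())
```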
### Directory not found

**Symptom:** `ModelKit directory does not exist`

Ensure your training component creates `modelkit_dir.path` and writes its artifacts into it (see the `train_model` example above).
