Skip to content

Conversation

@Siva-Sainath
Copy link
Contributor

@Siva-Sainath Siva-Sainath commented Jan 5, 2026

Addresses the issue #12501,

Implements secure HuggingFace model artifact import via the standard DSL importer decorator pattern (artifact_uri='huggingface://repo-id'), consistent with S3/GCS/OCI implementations.

Extracts HuggingFace authentication token from Kubernetes Secret metadata (through HuggingFaceParams in pipeline spec), injects it at runtime as HF_TOKEN environment variable in pod context and downloads models using func huggingface_hub.snapshot_download() to location /huggingface//.

SDK Layer (artifact_types.py): Registers huggingface:// URI scheme for importer decorator

Backend Config (config.go): HuggingFaceParams struct with func StructuredHuggingFaceParams() for parsing secret metadata

Token Injection (object_store.go): getHuggingFaceToken() retrieves token from Secret and sets HF_TOKEN env variable

Import Workflow (importer_launcher.go): handleHuggingFaceImport() takes care of artifact download with injected credentials

Token remains in Kubernetes Secret and is injected at pod execution time—never embedded in pipeline code.

Checklist:

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow
Copy link

Hi @Siva-Sainath. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pathParts := strings.Split(parts, "/")

if len(pathParts) < 2 {
return fmt.Errorf("invalid HuggingFace URI format: %q, expected huggingface://repo_id/revision", artifactURI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if I want to download a specific file, can I not specify huggingface://repo_id/file_name or if I want to download data set, then repo_type is dataset in that case

Can we add support for that?

Also, not sure if huggingface:// is a standard protocol prefix i.e not sure if its obvious to users that they need to specify uri with this prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit, added support for specific datasets or files using huggingface://repo_id/file_name this
or a pattern huggingface://repo_id?allow_patterns=*.safetensors

To let users know that they have to use the URI huggingface://added a error message that details:
return fmt.Errorf("Invalid HuggingFace URI format: %q, expected huggingface://repo_id[/revision]\n"+
"Examples:\n"+
" huggingface://gpt2\n"+
" huggingface://meta-llama/Llama-2-7b\n"+
" huggingface://bert-base-uncased/config.json\n"+
" huggingface://wikitext?repo_type=dataset",
parts[0])

huggingface:// is NOT a standard protocol prefix it was written to match gs://, s3:// URI scheme that was present in the code

return nil
}

func (l *ImportLauncher) downloadHuggingFaceModel(ctx context.Context, repoID, revision, artifactURI string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if I have defined Workspace config with a PVC, will it download to that PVC or will it first upload to S3 and download to PVC from s3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No S3 intermediary, models download directly to the PVC when workspace is configured with it

return fmt.Errorf("failed to create directory %q: %w", localPath, err)
}

cmd := exec.CommandContext(ctx, "python3", "-c", fmt.Sprintf(`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is huggingface_hub sdk getting pip installed, are we expecting that to come as part of packages_to_install. If so, then in the future there may be version conflicts and method signature in those versions could be different, hence snapshot_download can fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The huggingface-hub library is installed in the launcher container image at build time, pinned to version 1.2.4, present in the Dockerfile.launcher file. Tried to ensure version compatibility through dynamic parameter detection using inspect.signature(), which adapts the code to work with any future version of huggingface_hub without requiring code changes

Copy link
Contributor

@nsingla nsingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some tests to verify that this works:

  1. Python Pipeline code
  2. Compiled version
  3. Argo Compiled Version
  4. Local Test Runner showing this works
  5. Add this pipeline to e2e tests

@Siva-Sainath Siva-Sainath force-pushed the feat/huggingface-artifact-import branch 3 times, most recently from 6eca216 to 02a9d2e Compare January 7, 2026 07:40
@Siva-Sainath
Copy link
Contributor Author

Siva-Sainath commented Jan 7, 2026

@nsingla will work on writing the tests. I would like to clarify, by adding this pipeline to e2e tests, do you mean adding the HF artifact import tests to the existing e2e test runner, or is there a specific test suite/location where I would integrate this? should I commit any of these tests to the PR or simply run and verify if they work or not?

- Rename artifactUri parameter to artifactURI (ST1003)
- Fix error string capitalization (ST1005)

Signed-off-by: Siva-Sainath <[email protected]>
Copilot AI review requested due to automatic review settings January 10, 2026 19:36
@Siva-Sainath Siva-Sainath force-pushed the feat/huggingface-artifact-import branch from 02a9d2e to dc7afd0 Compare January 10, 2026 19:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for importing HuggingFace Hub artifacts (models and datasets) into Kubeflow Pipelines using a new huggingface:// URI scheme. The implementation follows the existing pattern for cloud storage providers (S3, GCS, OCI) and includes secure token management through Kubernetes Secrets.

Changes:

  • SDK layer registers the huggingface:// URI scheme and implements local path mapping for HuggingFace artifacts
  • Backend adds HuggingFace authentication token retrieval from Kubernetes Secrets and injection at runtime
  • Import workflow implements HuggingFace-specific download logic using huggingface_hub Python library with support for repositories, specific files, and query parameters

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
sdk/python/kfp/dsl/types/artifact_types.py Adds huggingface:// URI scheme support and local path mapping between /huggingface/ and huggingface:// URIs
sdk/python/kfp/dsl/importer_node.py Documents the huggingface:// URI scheme with examples and query parameter support
backend/src/v2/objectstore/config.go Adds HuggingFaceParams struct and parsing function for HuggingFace configuration
backend/src/v2/objectstore/object_store.go Implements token retrieval from Kubernetes Secrets and environment variable injection
backend/src/v2/component/importer_launcher.go Implements HuggingFace artifact download logic with URI parsing, query parameter handling, and Python script execution
backend/Dockerfile.launcher Changes base image from alpine:3.19 to python:3.12-slim and installs huggingface-hub==1.2.4

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +82 to +83
if token != "" {
os.Setenv("HF_TOKEN", token)
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using os.Setenv to set HF_TOKEN in a global environment variable (line 83) creates a side effect that persists beyond this function call and could affect other goroutines or operations in the same process. This is a potential concurrency issue in multi-threaded environments.

Instead of using os.Setenv, consider returning the token and having the caller set it in the command's Env field directly (as is done in lines 476 and 513 of importer_launcher.go). This approach is more explicit and avoids global state mutation.

Suggested change
if token != "" {
os.Setenv("HF_TOKEN", token)
if token == "" {
glog.Warning("Hugging Face provider selected but received an empty token")
} else {
// Token successfully retrieved. The caller is responsible for propagating it
// (for example, by setting HF_TOKEN in a specific command's environment)
// instead of relying on global environment mutation here.
glog.V(1).Info("Hugging Face token retrieved; caller must propagate it explicitly")

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +33
FROM python:3.12-slim

RUN pip3 install --no-cache-dir huggingface-hub==1.2.4 && \
useradd -M -s /usr/sbin/nologin appuser
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base Docker image was changed from alpine:3.19 to python:3.12-slim, which significantly increases the image size. The alpine:3.19 base image is approximately 7 MB, while python:3.12-slim is around 125-150 MB. This substantial size increase affects deployment times, storage requirements, and network transfer costs.

Consider using a multi-stage build where the Python dependencies are installed in one stage, and then only the necessary files are copied to a minimal runtime image. Alternatively, install Python in an Alpine-based image to maintain a smaller footprint.

Suggested change
FROM python:3.12-slim
RUN pip3 install --no-cache-dir huggingface-hub==1.2.4 && \
useradd -M -s /usr/sbin/nologin appuser
FROM alpine:3.19
RUN apk add --no-cache python3 py3-pip && \
pip3 install --no-cache-dir huggingface-hub==1.2.4 && \
adduser -D -H -s /sbin/nologin appuser

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +168
elif path.startswith(_HUGGINGFACE_LOCAL_MOUNT_PREFIX):
return (RemotePrefix.HUGGINGFACE.value +
path[len(_HUGGINGFACE_LOCAL_MOUNT_PREFIX):])
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new huggingface:// URI scheme support in convert_local_path_to_remote_path lacks test coverage. The existing test file (artifact_types_test.py) has parameterized tests for TestConvertLocalPathToRemotePath that cover gs://, minio://, s3://, and oci:// schemes, but the new huggingface:// scheme is not tested.

Add test cases for huggingface paths, such as:

  • '/huggingface/gpt2' -> 'huggingface://gpt2'
  • '/huggingface/meta-llama/Llama-2-7b' -> 'huggingface://meta-llama/Llama-2-7b'

Copilot uses AI. Check for mistakes.
Comment on lines +118 to +121
elif self.uri.startswith(RemotePrefix.HUGGINGFACE.value):
local_path = (
_HUGGINGFACE_LOCAL_MOUNT_PREFIX +
self.uri[len(RemotePrefix.HUGGINGFACE.value):])
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new huggingface:// URI scheme handling in the _get_path method lacks test coverage. While other URI schemes like gs://, s3://, minio://, and oci:// are covered by tests in artifact_types_test.py, there are no tests verifying that huggingface:// URIs are correctly converted to local paths.

Add test cases to verify the path generation for huggingface URIs, for example:

  • artifact.uri = 'huggingface://gpt2' should map to '/huggingface/gpt2'
  • artifact.uri = 'huggingface://meta-llama/Llama-2-7b/v1' should map to '/huggingface/meta-llama/Llama-2-7b/v1'

Copilot uses AI. Check for mistakes.
Comment on lines +77 to +85
case "huggingface":
token, err1 := getHuggingFaceToken(ctx, namespace, config.SessionInfo, k8sClient)
if err1 != nil {
return nil, err1
}
if token != "" {
os.Setenv("HF_TOKEN", token)
}
return nil, nil
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The huggingface case in OpenBucket returns (nil, nil) which is inconsistent with other providers that return a bucket object. While this may work for the current use case where HuggingFace downloads are handled specially in importer_launcher.go, returning nil for both the bucket and error could lead to nil pointer dereferences if the caller expects a valid bucket object.

Consider either returning a sentinel error indicating that HuggingFace doesn't use the bucket interface, or documenting this special behavior more clearly. The current implementation could be error-prone if future code tries to use this function for HuggingFace URIs without understanding this special case.

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +285
// GetHuggingFaceTokenFromSessionInfo extracts HF token from SessionInfo
func GetHuggingFaceTokenFromSessionInfo(ctx context.Context, k8sClient kubernetes.Interface, namespace string, sessionInfo *SessionInfo) (string, error) {
if sessionInfo == nil {
return "", nil
}
return getHuggingFaceToken(ctx, namespace, sessionInfo, k8sClient)
}
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name GetHuggingFaceTokenFromSessionInfo is exported (starts with capital letter) and documented as "extracts HF token from SessionInfo", but it's actually a wrapper around the private getHuggingFaceToken function that requires both a Kubernetes client and namespace. The name doesn't clearly indicate that it accesses Kubernetes secrets.

Consider renaming to something more descriptive like GetHuggingFaceTokenFromK8sSecret or documenting the function more thoroughly to indicate it retrieves the token from a Kubernetes Secret based on SessionInfo configuration.

Copilot uses AI. Check for mistakes.
Comment on lines +371 to +386
if queryStr != "" {
params := strings.Split(queryStr, "&")
for _, param := range params {
kv := strings.Split(param, "=")
if len(kv) == 2 {
switch kv[0] {
case "repo_type":
repoType = kv[1]
case "allow_patterns":
allowPatterns = kv[1]
case "ignore_patterns":
ignorePatterns = kv[1]
}
}
}
}
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query parameter parsing uses a simple string split approach (lines 372-385) which doesn't handle URL encoding. If users provide query parameters with special characters (e.g., "allow_patterns=.bin%2C.safetensors" for ".bin,.safetensors"), the percent-encoded values won't be decoded properly.

Consider using a proper URL query parser like url.ParseQuery or document that query parameter values must not be URL-encoded.

Copilot uses AI. Check for mistakes.
FROM python:3.12-slim

RUN pip3 install --no-cache-dir huggingface-hub==1.2.4 && \
useradd -M -s /usr/sbin/nologin appuser
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The useradd command uses different flags between Alpine and Debian-based images. In Alpine (removed code), "-S" creates a system user, and "-s /usr/sbin/nologin" sets the shell. In Debian (new code line 33), "-M" prevents home directory creation, which is correct. However, the old Alpine version didn't prevent home directory creation explicitly with "-H".

The new command is correct for the python:3.12-slim (Debian-based) image. Ensure this change is intentional and properly tested.

Copilot uses AI. Check for mistakes.
Siva-Sainath and others added 2 commits January 11, 2026 14:54
…ms]"

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Bandi Siva Sainath Reddy <[email protected]>
…supported by the installed version of huggingface-hub

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Bandi Siva Sainath Reddy <[email protected]>
RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -ldflags '-extldflags "-static"' -o /bin/launcher-v2 ./backend/src/v2/cmd/launcher-v2/*.go

FROM alpine:3.19
FROM python:3.12-slim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with copilot
Also 3.12 is not the officially supported python version even though it works. And we still need to support python 3.9 for backwards compatibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants