57 changes: 57 additions & 0 deletions proposals/new/proxy-cache-registry-head-calls.md
@@ -0,0 +1,57 @@
# Proposal: Cache proxy-cache registry HEAD calls

Author: Maxime Hubert / @mxm-tr

Discussion: [Harbor Community Meeting - April 16th](https://hackmd.io/CyQk5FdVQwWObMLVNqxW1w?both#April-16-2025)

[Original issue](https://github.com/goharbor/harbor/issues/21859)

## Abstract

Reduce the volume of HEAD requests sent to upstream registries by caching the results of proxy-cache ManifestExist calls.

## Background

When pulling many artifacts at the same time through a proxy-cache project, we can still trigger rate limiting on the upstream registries and get `429 Too Many Requests` errors.

This is in part caused by HEAD requests being sent for each artifact pull.

## Proposal

The solution could consist of caching the results of calls to [HeadManifest](https://github.com/goharbor/harbor/blob/main/src/controller/proxy/controller.go#L258).

Cache entries would stay valid for a short, fixed period of time, for example 10 seconds.

## Non-Goals

N/A

## Rationale

The cache lifetime could be made configurable through a parameter, but the current implementation already relies on some [hardcoded values](https://github.com/goharbor/harbor/blob/f8f1994c9ee97e41067870c4ed46b15eb21da3b6/src/controller/proxy/controller.go#L43). Setting a fixed, low value should be enough to avoid triggering rate limiting on upstream servers.

## Compatibility

N/A

## Implementation

1. Use a new cache key in the [proxy controller cache](https://github.com/goharbor/harbor/blob/bfc29904f96e17248a4e6204d12058c1d7d05ab8/src/controller/proxy/controller.go#L78), such as:

```
cache:manifestexists:<repo>:<ref>
```

2. Set its lifetime to a value that would prevent rate limiting from being triggered (10s?) in the [proxy-controller](https://github.com/goharbor/harbor/blob/bfc29904f96e17248a4e6204d12058c1d7d05ab8/src/controller/proxy/controller.go#L41-L48):

```golang
manifestExistsCacheInterval = 10 * time.Second
```

> @renmaosheng (Apr 21, 2025): Hi Maxime, with such an interval, there is a chance that the client will pull different artifacts for the same tag from the source repo. If your pipeline relies on this, you will get different test results depending on when you run the tests. Such ambiguity makes troubleshooting difficult. In my opinion, we should encourage users to reference images by sha digest instead of by tag to reduce the number of API calls.

> Author: Indeed, the best practice is to use the sha digest. Unfortunately, some big open source projects (Rancher, for example) use hundreds of images from various origins and only allow the user to specify repository proxy URLs. This proposal was made to ensure we don't send a HEAD request every time we pull an artifact.
>
> The risk of pulling different artifacts when using a tag does exist. Maybe this behavior could be configurable?

> Reply: Hi Maxime, I don't know how to make it configurable, but we should highlight to users the possibility of pulling different binaries at different times, thanks.

3. Before running [remote.ManifestExist](https://github.com/goharbor/harbor/blob/main/src/controller/proxy/controller.go#L258), run a cache fetch on the proxy controller cache.

If the cached entry has expired or the key is not found, run remote.ManifestExist and save the resulting boolean in the proxy controller cache.

## Open issues (if applicable)

https://github.com/goharbor/harbor/issues/21859