The timeout of git fetch in repo server correlates with unbounded growth of ephemeral storage use, up to tens of Gi #18831
Labels: bug/in-triage, bug, component:api, component:core, component:repo-server, component:server, type:bug
Checklist:
argocd version
Describe the bug
At some point there were quite a lot of logs like
The repo server's disk usage grew without bound, and even with a large ephemeral storage request and limit the pods were evicted rather quickly.
After increasing the exec timeout to 2m30s the timeouts were gone and ephemeral storage use was stable at ~2.3Gi instead of 50Gi+.
I'm fairly sure the two are related, since that was the only repo server change I made at the time.
It looks like partially fetched data is not cleaned up when a fetch times out.
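To illustrate the suspected failure mode, here is a minimal sketch of a fetch wrapper that removes partial data when the command times out or fails. This is not Argo CD code. The helper names `remove_partial_packs` and `fetch_with_cleanup` are hypothetical, and the sketch assumes the leftovers are git's `tmp_pack_*` files under `.git/objects/pack`, which this issue has not confirmed.

```python
import subprocess
from pathlib import Path


def remove_partial_packs(repo_dir):
    """Delete leftover tmp_pack_* files and return the number of bytes reclaimed.

    Assumption: an interrupted fetch leaves its partial download as
    .git/objects/pack/tmp_pack_* (not verified for the repo server).
    """
    pack_dir = Path(repo_dir) / ".git" / "objects" / "pack"
    reclaimed = 0
    if not pack_dir.is_dir():
        return reclaimed
    for path in pack_dir.glob("tmp_pack_*"):
        reclaimed += path.stat().st_size
        path.unlink()
    return reclaimed


def fetch_with_cleanup(repo_dir, timeout_seconds, cmd=("git", "fetch", "origin")):
    """Run the fetch command with a timeout; return True on success.

    On timeout or a non-zero exit code, partial packs are removed so that
    repeated failures do not accumulate on disk.
    """
    try:
        subprocess.run(list(cmd), cwd=repo_dir, check=True, timeout=timeout_seconds)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        remove_partial_packs(repo_dir)
        return False
```

Calling `remove_partial_packs` on a repo directory by itself also gives a quick way to check how many bytes of partial packs have accumulated there, with the caveat that it deletes them.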
To Reproduce
Use a large repository (multiple Gi) with frequent updates, so that git fetch hits the exec timeout on the repo server.
Expected behavior
Ephemeral storage usage stays bounded, even when fetches time out.
Screenshots
Version
Custom build from master + #18694 around the time of v2.12.0-rc1 release.
Logs