Resource requests should be set for ephemeral storage #95

Open
5 tasks
lindhe opened this issue Dec 5, 2023 · 1 comment

Comments

lindhe commented Dec 5, 2023

At least the web pod and the worker beat pod use emptyDir volumes, which consume ephemeral storage on the node the pod is scheduled on. Since we have not specified any ephemeral-storage requests or limits for the containers, the scheduler does not take ephemeral storage into account when placing the pod. We therefore risk that the pod gets evicted, crashes, or exhausts ephemeral storage on the node.

Currently, my pods get evicted with a warning when they are scheduled on a node with too little ephemeral storage available:

$ kubectl get events --field-selector involvedObject.name=worker-beat-7898d974fc-sb9xz
LAST SEEN   TYPE      REASON                   OBJECT                             MESSAGE
46m         Warning   FailedScheduling         pod/worker-beat-7898d974fc-sb9xz   0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod..
46m         Warning   FailedScheduling         pod/worker-beat-7898d974fc-sb9xz   0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod..
45m         Normal    Scheduled                pod/worker-beat-7898d974fc-sb9xz   Successfully assigned invenio-dev/worker-beat-7898d974fc-sb9xz to kth-prod-1-worker-7a7516d2-v8vbc
45m         Normal    SuccessfulAttachVolume   pod/worker-beat-7898d974fc-sb9xz   AttachVolume.Attach succeeded for volume "pvc-801c874c-37a9-4520-a0e8-c59606c9d09a"
45m         Normal    Pulling                  pod/worker-beat-7898d974fc-sb9xz   Pulling image "ghcr.io/inveniosoftware/demo-inveniordm/demo-inveniordm@sha256:2193abc2caec9bc599061d6a5874fd2d7d201f55d1673a545af0a0406690e8a4"
44m         Warning   Evicted                  pod/worker-beat-7898d974fc-sb9xz   The node was low on resource: ephemeral-storage. Threshold quantity: 994154920, available: 759960Ki.
44m         Normal    Pulled                   pod/worker-beat-7898d974fc-sb9xz   Successfully pulled image "ghcr.io/inveniosoftware/demo-inveniordm/demo-inveniordm@sha256:2193abc2caec9bc599061d6a5874fd2d7d201f55d1673a545af0a0406690e8a4" in 1m2.20910036s (1m2.209116986s including waiting)
44m         Normal    Created                  pod/worker-beat-7898d974fc-sb9xz   Created container worker-beat
44m         Normal    Started                  pod/worker-beat-7898d974fc-sb9xz   Started container worker-beat
44m         Normal    Killing                  pod/worker-beat-7898d974fc-sb9xz   Stopping container worker-beat
44m         Warning   ExceededGracePeriod      pod/worker-beat-7898d974fc-sb9xz   Container runtime did not kill the pod within specified grace period.

I suggest we add resource limits and requests for ephemeral-storage on all containers that use emptyDir. I can whip up a PR for it, but I need your help to identify a reasonable size to set as request and limit.
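
To make the proposal concrete, here is a minimal sketch of what such a setting could look like in a pod template. The container name `worker-beat` comes from the events above. The volume name `tmp`, the mount path and the `1Gi`/`2Gi` sizes are hypothetical placeholders, not measured or recommended values, since the right sizes are exactly what still needs to be identified.

```yaml
# Sketch of a pod template fragment. Sizes are placeholders, not recommendations.
spec:
  containers:
    - name: worker-beat            # container name as seen in the events above
      resources:
        requests:
          ephemeral-storage: 1Gi   # placeholder: considered by the scheduler when picking a node
        limits:
          ephemeral-storage: 2Gi   # placeholder: exceeding the limit makes the pod an eviction candidate
      volumeMounts:
        - name: tmp                # hypothetical volume name
          mountPath: /tmp          # hypothetical mount path
  volumes:
    - name: tmp
      emptyDir: {}                 # disk-backed emptyDir usage counts toward the pod's ephemeral storage
```

With a request in place, the scheduler should only place the pod on a node with that much allocatable ephemeral storage, instead of letting it land on a node that is already close to its eviction threshold.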

lindhe commented Dec 5, 2023

Here's another example of what it can look like when pods are evicted because they use more resources than are available:
image
