CreateVolume times out before task can complete, starts another in an infinite loop #3024

braunsonm · 2024-09-03T20:24:30Z

/kind bug

What happened:
When creating a volume from a snapshot that is ~2TB in size, I am seeing the CSI time out: it "gives up" on the current task in vSphere and starts another one to create the volume again.

This seems to happen about 30-35 minutes after the PVC is created in a Pending state. My task in vSphere does complete after 40 minutes, but by then the CSI has already created a new task, and the loop starts over until eventually almost all disk space in the datastore is used.

What you expected to happen:

The CSI should not create multiple tasks if the original task is still in progress, or it should at least have a configurable timeout.

How to reproduce it (as minimally and precisely as possible):

Create a PVC large enough that restoring it from a snapshot takes more than 30 minutes (2TB in my case), and take a snapshot of it.
Create a PVC from that snapshot.
Notice that while vSphere works on the task to create the container volume, the CnsVolumeOperationRequest gives up waiting and creates a new task after about 30 minutes.

Anything else we need to know?:

Is there any way to configure this timeout value? I'm not seeing a way to do so in the code right now.

Environment:

csi-vsphere version: v3.1.2
vsphere-cloud-controller-manager version: 1.28.0
Kubernetes version: 1.28.10
vSphere version: 7.0.3.01700
OS (e.g. from /etc/os-release): Ubuntu 22.04

braunsonm · 2024-09-03T21:05:16Z

Digging a bit in the logs, I can see this message coming from the MonitorCreateVolumeTask function: taskResult is empty for CreateVolume task: "task-xxxxxx", opID: "xxxxx"

This repeats a few times for the same task ID over 30 minutes and is then never output again, because a new task is created in the CnsVolumeOperationRequest. As mentioned, the task does complete eventually, but it takes slightly longer than whatever timeout makes the CSI give up waiting. This results in an infinite loop and orphaned FCDs in vSphere.

k8s-triage-robot · 2024-12-02T21:23:16Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

braunsonm · 2024-12-02T21:24:32Z

/remove-lifecycle stale

k8s-ci-robot added the kind/bug label Sep 3, 2024

k8s-ci-robot added the lifecycle/stale label Dec 2, 2024

k8s-ci-robot removed the lifecycle/stale label Dec 2, 2024
