
Dataloader problem during training #2064

Open
asdemirel33 opened this issue Jul 31, 2024 · 1 comment

Comments

@asdemirel33

We are using the YOLOv7 object detection model and want to train it on a custom dataset. Our GPU is an NVIDIA V100. The dataset contains 60k training images, 2k validation images, and 4k test images at a resolution of 640x640. We are using a batch size of 16 and 8 workers.

We are encountering a problem during training, which I believe is related to the dataloader. During training, GPU utilization sometimes drops to 0% and at other times increases to 50%-70%, resulting in very long iteration times. One epoch takes almost 50 minutes. I tried increasing the number of workers (16, 24), but it did not fix the issue.
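A quick way to confirm where the time goes is to measure how long the training loop blocks on the loader versus how long the GPU actually computes. The sketch below is generic PyTorch, not YOLOv7's own training code; `train_loader` and `model` are placeholders for the objects `train.py` builds, and the batch layout (`batch[0]` being a uint8 image tensor) is an assumption.

```python
import time
import torch

def profile_loader(train_loader, model, device="cuda", max_batches=50):
    """Rough split of time spent waiting on the dataloader vs. GPU compute.

    `train_loader`, `model`, and the batch layout are placeholders/assumptions,
    not YOLOv7's actual API.
    """
    model.to(device).eval()
    wait_s, gpu_s = 0.0, 0.0
    end = time.perf_counter()
    for i, batch in enumerate(train_loader):
        wait_s += time.perf_counter() - end          # time blocked waiting on the loader
        imgs = batch[0].to(device, non_blocking=True).float() / 255.0
        start = time.perf_counter()
        with torch.no_grad():
            model(imgs)                              # forward pass only, just for timing
        torch.cuda.synchronize()                     # let the GPU finish before stopping the clock
        gpu_s += time.perf_counter() - start
        end = time.perf_counter()
        if i + 1 >= max_batches:
            break
    print(f"loader wait: {wait_s:.1f}s  |  GPU compute: {gpu_s:.1f}s over {i + 1} batches")
```

If the loader wait dominates, the model and GPU are fine and the bottleneck is data loading (disk reads, JPEG decoding, or CPU-side augmentation).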

In summary, the dataloader cannot feed data to the GPU fast enough, so GPU utilization drops.
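For reference, these are the standard `torch.utils.data.DataLoader` knobs that usually help when the loader starves the GPU. This is only a sketch: YOLOv7 builds its own loader (in `utils/datasets.py`), so these arguments would have to be wired through that code, and `dataset` here is a placeholder. The keyword arguments themselves are standard PyTorch.

```python
from torch.utils.data import DataLoader

# Standard DataLoader settings that help when the loader starves the GPU.
# `dataset` is a placeholder; YOLOv7 constructs its own loader, so these
# options would need to be passed through its dataloader helper.
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=8,            # more workers only helps while CPU decode/augment is the bottleneck
    pin_memory=True,          # page-locked host memory speeds up host-to-GPU copies
    persistent_workers=True,  # keep workers alive across epochs instead of respawning them
    prefetch_factor=4,        # each worker keeps this many batches queued ahead of the GPU
)
```

If RAM allows, caching decoded images (YOLOv7's `train.py` has a `--cache-images` option, if your version exposes it) removes JPEG decoding from the per-batch cost. Slow storage or heavy CPU-side augmentation on a 60k-image dataset is a common cause of exactly this 0%-70% utilization pattern.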

@Ian-Work-AI

@asdemirel33 I have the same question too. Did you solve the problem?
