This document outlines the approach and steps taken to train a modified YOLOv9-c model for clothes detection and instance segmentation using the DeepFashion dataset. The modified model architecture includes an additional head for instance segmentation, enabling simultaneous prediction of bounding boxes, instance segmentation masks, categories, and confidence scores.
- Utilized Google Colab GPU for accelerated training.
- Installed necessary libraries and dependencies, including PyTorch, torchvision, and other required packages.
- Downloaded the DeepFashion dataset from the provided link.
- Created a small sample of the dataset containing 578 images.
- Included classes with more than 50 images, namely "Long Sleeve Top," "Short Sleeve Top," and "Shorts."
- Divided the dataset into train, validation, and test splits in a 7:2:1 ratio (a minimal split sketch is included after this list).
- Modified the YOLOv9-c model architecture to include an additional head for instance segmentation (an illustrative mask-head sketch follows this list).
- Initialized the model with pretrained weights from yolov9-c.pt, which were pretrained on the MS COCO dataset.
- Adapted the training script to accommodate the additional instance segmentation head.
- Implemented functionality to read hyperparameters and configurations from a single YAML file for flexibility.
- Initialized training with hyperparameters specified in the config YAML file.
- Utilized SGD optimization, cosine learning rate scheduling, and data augmentation techniques for efficient training (see the config and optimizer sketch after this list).
- Monitored training progress by logging key metrics such as loss and mAP.
- Evaluated the trained model on validation and test datasets to assess performance in detection and instance segmentation.
- Calculated performance metrics such as precision, recall, and F1 score using standard evaluation methodologies (a minimal metric sketch follows this list).
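The 7:2:1 split mentioned above can be reproduced with a plain random shuffle. The sketch below is illustrative only; the directory layout (`deepfashion_sample/images`) and the per-split list files are assumptions rather than the actual preprocessing script.

```python
import random
from pathlib import Path

# Minimal 7:2:1 train/val/test split sketch; paths are assumptions.
random.seed(42)

images = sorted(Path("deepfashion_sample/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.7 * n), int(0.2 * n)

splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}

# One file list per split, as commonly consumed by YOLO-style dataloaders.
for name, files in splits.items():
    Path(f"{name}.txt").write_text("\n".join(str(f) for f in files))
```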
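The instance segmentation head itself is not specified above. As an illustration only, most YOLO segmentation variants attach a YOLACT-style prototype-mask branch to one of the feature maps; a generic sketch of such a branch is shown below. The channel counts, layer sizes, and attachment point are assumptions, not the actual modification made to YOLOv9-c.

```python
import torch
import torch.nn as nn

class ProtoMaskHead(nn.Module):
    """Hypothetical YOLACT-style mask branch: a small conv stack producing
    prototype masks that are later combined with per-box mask coefficients."""

    def __init__(self, in_channels: int, mid_channels: int = 256, num_protos: int = 32):
        super().__init__()
        self.protos = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 3, padding=1),
            nn.SiLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1),
            nn.SiLU(inplace=True),
            nn.Conv2d(mid_channels, num_protos, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch, num_protos, 2H, 2W)
        return self.protos(x)

# Example: 32 prototype masks from a hypothetical 64-channel feature map.
head = ProtoMaskHead(in_channels=64)
print(head(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 32, 160, 160])
```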
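Reading all hyperparameters from a single YAML file and wiring them into SGD with a cosine schedule could look roughly like the sketch below. The key names and values (`lr0`, `momentum`, `weight_decay`, `epochs`) are illustrative assumptions, and the placeholder model stands in for the modified YOLOv9-c.

```python
import torch
import yaml
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical config; in the project these values live in one YAML file.
cfg = yaml.safe_load("""
lr0: 0.01
momentum: 0.937
weight_decay: 0.0005
epochs: 100
""")

model = torch.nn.Linear(10, 3)  # placeholder for the modified YOLOv9-c model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=cfg["lr0"],
    momentum=cfg["momentum"],
    weight_decay=cfg["weight_decay"],
)
# Cosine decay of the learning rate over the whole run.
scheduler = CosineAnnealingLR(optimizer, T_max=cfg["epochs"])

for epoch in range(cfg["epochs"]):
    # ... forward pass, combined box/segmentation/classification loss, backward, optimizer.step() ...
    scheduler.step()
```

Keeping every knob in one file makes it easy to rerun experiments with different settings without touching the training code.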
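Precision, recall, and F1 follow directly from true-positive, false-positive, and false-negative counts. A minimal sketch of the standard formulas is below; the example counts are illustrative and chosen only so the output roughly matches the precision and recall reported in the results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics from TP/FP/FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative counts only, not actual evaluation output.
p, r, f1 = precision_recall_f1(tp=428, fp=72, fn=94)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```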
- Box Loss: Measures how accurately the model predicts object bounding boxes. It remained roughly constant during training, indicating stable localization. The current box loss is approximately 0.2914.
- Segmentation Loss: Measures how well the model predicts object masks. It also stayed fairly flat, suggesting adequate but not improving mask quality. The current segmentation loss is approximately 0.07728.
- Classification Loss: Indicates how well the model predicts the category of each object. It held steady, showing consistent classification behavior. The current classification loss is approximately 0.2352.
- Instance Detection Performance: The model detects objects reasonably well, but there is room to improve, especially for certain categories such as long-sleeve tops. The current precision for instance detection is approximately 0.856, and the recall is approximately 0.82.
- Limited Training Data: Fine-tuning was performed on only 578 images, which is unlikely to capture the full diversity of clothing items. This small dataset may restrict the model's ability to generalize to unseen data.
- Model Complexity: The current architecture may not have enough capacity to capture fine-grained visual details; a larger or more expressive model may be needed to handle intricate patterns.
- Learning Parameters: Hyperparameters that control how the model learns, such as the learning rate and momentum, may not be optimal and could be tuned further.
- Training Duration: The model may simply need more training; longer schedules or additional data could improve convergence.
- Augment Training Data: Although the dataset is small, it can be augmented with transformations such as rotations, flips, and color adjustments to artificially increase its size and diversity (a minimal augmentation sketch is shown after this list).
- Experiment with Model Architectures: Try different architectures, or modify the existing one, to better suit the task of clothes detection and instance segmentation.
- Hyperparameter Tuning: Fine-tuning learning rates, batch sizes, and other hyperparameters could help optimize performance given the limited training data.
- Transfer Learning: Leveraging models pretrained on larger datasets and fine-tuning them on this dataset may help the model learn better representations.
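One way to apply the rotations, flips, and color adjustments suggested above is with torchvision transforms. This is a minimal sketch with illustrative parameter values, not the project's actual data pipeline.

```python
from torchvision import transforms

# Illustrative augmentation pipeline; parameter values are assumptions.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
])

# Applied per image during training, e.g. image_tensor = augment(pil_image)
```

Note that geometric transforms (flips, rotations) must also be applied to the corresponding boxes and masks, which YOLO-style dataloaders usually handle internally.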
By addressing these aspects, we aim to improve the model's performance and make it more effective at recognizing clothing in images, even with the limited training data.
By following this approach, a modified YOLOv9-c model was successfully trained for clothes detection and instance segmentation on the DeepFashion dataset. The use of pretrained weights, GPU acceleration, and comprehensive training procedures facilitated the development of a robust and effective model. Further analysis and optimization can be performed to enhance performance and applicability in real-world scenarios.
For any inquiries or further information, please contact:
- Name: Shreyas Dixit
- Email: [email protected]