Various problems when using MPS acceleration for YOLOv5 on Mac devices #13226

Open
1 of 2 tasks
xxxkkw opened this issue Jul 28, 2024 · 2 comments · May be fixed by #13483
Labels
bug Something isn't working

Comments

@xxxkkw

xxxkkw commented Jul 28, 2024

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training, Detection

Bug

Let me start with training. My machine runs macOS 14 on an M1 Max chip. If I run

python train.py --device mps

that is, training only on the official data, the following message appears during training:

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
  0%|          | 0/8 [00:00<?, ?it/s]/Users/xiongkaiwen/yolov5/train.py:414: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(amp):

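This does not affect training, though; the trained model is still saved normally at the end.
When I use this trained model to run detection on the images in the yolov5/data/images folder, the results are correct. But when I use the same model to run detection on a video, the output video is wrong, as shown here:
Screenshot 2024-07-28 21 04 31
This is a frame taken from the output video, and the boxes show similar errors throughout the whole video. If I switch to the CPU for detection, the results are correct:
Screenshot 2024-07-28 21 02 45
Moreover, if I run detection with the yolov5s.pt file in the yolov5 folder, the problem disappears and the results are correct. I find this very hard to understand, because I assumed that yolov5s.pt was trained on the same dataset as the one used by the command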

python train.py --device mps

So why does this error occur?

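Also, with my own dataset: I previously trained a model on a server using CUDA and confirmed that the model file works; video detection on the server is correct. On my machine, however, running video detection with this model produces results like the screenshot above. With MPS acceleration the results are garbled, while with the CPU they are as expected. I would like to know what is causing this.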

Environment

YOLOv5 🚀 v7.0-348-g6deb2d75 Python-3.11.9 torch-2.5.0.dev20240727 CPU
Apple M1 Max 32G
MacOS 14.5 (23F79)

Minimal Reproducible Example

I created a Python 3.11.9 virtual environment with Anaconda and then ran:

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

At this point, remove the existing PyTorch and install the latest nightly build:

pip uninstall torch
pip uninstall torchvision
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

My package versions at this point are:

Package            Version
------------------ ------------------
certifi            2024.7.4
charset-normalizer 3.3.2
contourpy          1.2.1
cycler             0.12.1
filelock           3.15.4
fonttools          4.53.1
fsspec             2024.6.1
gitdb              4.0.11
GitPython          3.1.43
idna               3.7
Jinja2             3.1.4
kiwisolver         1.4.5
MarkupSafe         2.1.5
matplotlib         3.9.1
mpmath             1.3.0
networkx           3.3
numpy              1.26.4
opencv-python      4.10.0.84
packaging          24.1
pandas             2.2.2
pillow             10.4.0
pip                24.1.2
psutil             6.0.0
py-cpuinfo         9.0.0
pyparsing          3.1.2
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.2rc1
requests           2.32.3
scipy              1.14.0
seaborn            0.13.2
setuptools         71.1.0
six                1.16.0
smmap              5.0.1
sympy              1.13.1
thop               0.1.1-2209072238
torch              2.5.0.dev20240727
torchaudio         2.4.0
torchvision        0.20.0.dev20240727
tqdm               4.66.4
typing_extensions  4.12.2
tzdata             2024.1
ultralytics        8.2.66
ultralytics-thop   2.0.0
urllib3            2.2.2
wheel              0.43.0

Following these steps should give you the same environment as mine, after which you can reproduce the problem with the steps above.

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@xxxkkw xxxkkw added the bug Something isn't working label Jul 28, 2024
Contributor

github-actions bot commented Jul 28, 2024

👋 Hello @xxxkkw, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@xxxkkw hello,

Thank you for providing a detailed description of the issue you're encountering with YOLOv5 on your Mac device using MPS acceleration. It seems like you've done a thorough job of setting up your environment and troubleshooting the problem. Let's address the issues step-by-step:

Deprecation Warning

The warning message you encountered:

FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.

This warning comes from PyTorch itself: recent releases deprecate `torch.cuda.amp.autocast(...)`, which YOLOv5's train.py still calls, in favor of `torch.amp.autocast('cuda', ...)`. It does not affect the training process, so you can safely ignore it for now; it will go away once the YOLOv5 code is updated to the new API.
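The longer-term fix belongs in the YOLOv5 code: replacing `with torch.cuda.amp.autocast(amp):` with `with torch.amp.autocast("cuda", enabled=amp):`, as the message itself suggests. If the message is merely distracting in the meantime, a minimal sketch is to filter just this one warning with the standard-library `warnings` module, assuming you call the helper near the top of train.py before training starts. The helper name `silence_autocast_deprecation` is hypothetical.

```python
import warnings

# warnings applies this pattern with re.match (start of the message), hence the leading ".*".
AUTOCAST_DEPRECATION = r".*torch\.cuda\.amp\.autocast\(args\.\.\.\)` is deprecated"


def silence_autocast_deprecation() -> None:
    """Ignore only the torch.cuda.amp.autocast FutureWarning; other warnings are unaffected."""
    warnings.filterwarnings("ignore", message=AUTOCAST_DEPRECATION, category=FutureWarning)


if __name__ == "__main__":
    silence_autocast_deprecation()
    # Simulate the message PyTorch prints; with the filter installed, nothing is shown.
    warnings.warn(
        "`torch.cuda.amp.autocast(args...)` is deprecated. "
        "Please use `torch.amp.autocast('cuda', args...)` instead.",
        FutureWarning,
    )
```

Filtering by message rather than by category keeps every other FutureWarning visible, so genuinely new deprecations are not hidden.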

Video Detection Issues with MPS

The issue with the detection results being incorrect when using MPS but correct when using CPU suggests a potential problem with the MPS backend in PyTorch. Here are a few steps you can take to troubleshoot and potentially resolve this issue:

  1. Verify with Latest Versions: Ensure you are using the latest versions of YOLOv5 and PyTorch. Sometimes, bugs are fixed in newer releases.

    git pull  # update YOLOv5 repo
    pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
  2. Test with Stable PyTorch: While nightly builds are great for accessing the latest features, they can sometimes introduce instability. Try using the latest stable release of PyTorch.

    pip uninstall torch torchvision torchaudio
    pip install torch torchvision torchaudio
  3. Check for Known Issues: Look through the PyTorch GitHub issues for any known problems with MPS on macOS. If you find a related issue, you can track its progress or contribute additional information.

  4. Fall Back to CPU: If MPS continues to cause issues, you may need to fall back to the CPU for inference on your Mac. It is slower, but it gives correct results.
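When switching between nightly and stable builds, it helps to confirm which PyTorch is actually imported and whether the MPS backend is usable. A minimal sketch using `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()`; the helper name `backend_report` is hypothetical, and the script still runs (reporting no PyTorch) if the import fails.

```python
import platform

try:
    import torch
except ImportError:  # still produce a report when PyTorch is missing
    torch = None


def backend_report() -> dict:
    """Collect the interpreter version, CPU architecture, PyTorch version and MPS status."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # typically "arm64" on Apple silicon
        "torch": None,
        "mps_built": False,  # PyTorch was compiled with MPS support
        "mps_available": False,  # and this machine/OS can actually use it
    }
    if torch is not None:
        report["torch"] = str(torch.__version__)
        mps = getattr(torch.backends, "mps", None)  # absent in very old PyTorch releases
        if mps is not None:
            report["mps_built"] = bool(mps.is_built())
            report["mps_available"] = bool(mps.is_available())
    return report


if __name__ == "__main__":
    for key, value in backend_report().items():
        print(f"{key:>13}: {value}")
```

Running it once under the nightly build and once under the stable build makes it easy to attach both environments to a bug report.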

Using Pretrained Models

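The fact that yolov5s.pt works correctly while your own model does not is partly explained by the data: the two were not trained on the same dataset. yolov5s.pt was trained on the full COCO dataset, whereas running `python train.py` with no other arguments fine-tunes yolov5s.pt on COCO128 (data/coco128.yaml), a 128-image subset of COCO. Make sure you compare models trained on the same data with the same settings.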

Example Code for CPU Inference

Here's an example of how you can run inference using the CPU to avoid MPS-related issues:

python detect.py --weights your_custom_model.pt --source your_video.mp4 --device cpu
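To check whether MPS (rather than the model) is responsible, one option is to run the same video through detect.py twice with `--save-txt --save-conf`, once with `--device cpu --name cpu_run` and once with `--device mps --name mps_run`, and compare the per-frame label files each run writes. The sketch below assumes the default output folders (runs/detect/<name>/labels/) and the YOLO txt format `class x_center y_center width height [conf]` with normalized coordinates; the run names and all helper functions are hypothetical. For each frame it greedily matches every CPU box to an unused same-class MPS box and reports the fraction reproduced at IoU >= 0.5.

```python
from pathlib import Path

# One YOLO-format label: (class, x_center, y_center, width, height), coords normalized to 0..1
Box = tuple[int, float, float, float, float]


def parse_labels(text: str) -> list[Box]:
    """Parse detect.py --save-txt output; an optional 6th column (confidence) is ignored."""
    boxes: list[Box] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        cls, x, y, w, h = parts[:5]
        boxes.append((int(cls), float(x), float(y), float(w), float(h)))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two center-format boxes (class is ignored here)."""
    ax1, ay1, ax2, ay2 = a[1] - a[3] / 2, a[2] - a[4] / 2, a[1] + a[3] / 2, a[2] + a[4] / 2
    bx1, by1, bx2, by2 = b[1] - b[3] / 2, b[2] - b[4] / 2, b[1] + b[3] / 2, b[2] + b[4] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[3] * a[4] + b[3] * b[4] - inter
    return inter / union if union > 0 else 0.0


def match_rate(reference: list[Box], candidate: list[Box], thresh: float = 0.5) -> float:
    """Fraction of reference boxes matched one-to-one by a same-class candidate at IoU >= thresh."""
    if not reference:
        return 1.0 if not candidate else 0.0  # extra boxes on an empty reference frame count as a miss
    unused = list(candidate)
    hits = 0
    for ref in reference:
        same_cls = [c for c in unused if c[0] == ref[0]]
        best = max(same_cls, key=lambda c: iou(ref, c), default=None)
        if best is not None and iou(ref, best) >= thresh:
            unused.remove(best)  # each candidate box may be matched only once
            hits += 1
    return hits / len(reference)


def compare_runs(cpu_dir: Path, mps_dir: Path, thresh: float = 0.5) -> dict[str, float]:
    """Score every label file found in either folder; a missing file means 'no detections'."""
    names = {p.name for p in cpu_dir.glob("*.txt")} | {p.name for p in mps_dir.glob("*.txt")}
    scores = {}
    for name in sorted(names):
        cpu_file, mps_file = cpu_dir / name, mps_dir / name
        cpu_boxes = parse_labels(cpu_file.read_text()) if cpu_file.exists() else []
        mps_boxes = parse_labels(mps_file.read_text()) if mps_file.exists() else []
        scores[name] = match_rate(cpu_boxes, mps_boxes, thresh)
    return scores


if __name__ == "__main__":
    # Hypothetical run names; adjust to the folders detect.py actually created.
    results = compare_runs(Path("runs/detect/cpu_run/labels"), Path("runs/detect/mps_run/labels"))
    bad = {name: s for name, s in results.items() if s < 1.0}
    print(f"{len(bad)} of {len(results)} frames differ between CPU and MPS")
    for name, score in bad.items():
        print(f"  {name}: {score:.2f} of CPU boxes reproduced on MPS")
```

Many frames scoring well below 1.0 would point at the MPS path; scores near 1.0 everywhere would suggest both backends produce the same detections.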

Reporting Bugs

If the issue persists, consider reporting it to the PyTorch team with detailed information about your setup and the problems you're encountering. This helps improve the MPS backend for everyone.
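When filing such a report, it helps to include a reproduction that does not depend on YOLOv5. A minimal sketch under that assumption: push the same input through an identical small layer stack on CPU and on MPS and report the largest absolute difference. The Conv2d/BatchNorm2d/SiLU stack mirrors the structure of a YOLOv5 `Conv` block, but the channel counts, the 640×640 input and the helper name `cpu_vs_mps_max_diff` are illustrative choices, not taken from this issue. A difference far above float32 rounding noise would be worth reporting; if the toy stack agrees, swapping in the model that misbehaves is the next step.

```python
from typing import Optional

try:
    import torch
    from torch import nn
except ImportError:  # lets the script exit cleanly where PyTorch is not installed
    torch = None
    nn = None


def cpu_vs_mps_max_diff(seed: int = 0) -> Optional[float]:
    """Max |CPU - MPS| over one forward pass, or None when MPS cannot be used."""
    if torch is None or not torch.backends.mps.is_available():
        return None
    torch.manual_seed(seed)  # same weights and input on every run
    # Toy stand-in for a YOLOv5 Conv block; replace with the model that misbehaves.
    block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.SiLU()).eval()
    x = torch.rand(1, 3, 640, 640)
    with torch.no_grad():
        reference = block(x)  # CPU result
        on_mps = block.to("mps")(x.to("mps")).cpu()  # same weights, moved to the GPU
    return (reference - on_mps).abs().max().item()


if __name__ == "__main__":
    diff = cpu_vs_mps_max_diff()
    print("MPS unavailable" if diff is None else f"max abs difference CPU vs MPS: {diff:.3e}")
```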

Thank you for your patience and for being a part of the YOLO community. If you have any further questions or need additional assistance, feel free to ask!
