Conversation

SolardiaX

No description provided.

@XprobeBot XprobeBot added the bug Something isn't working label May 26, 2025
@XprobeBot XprobeBot added this to the v1.x milestone May 26, 2025
@qinxuye qinxuye changed the title fix: QwenLM/Qwen2.5-VL#761 BUG: fix mps for QwenLM/Qwen2.5-VL May 27, 2025
@qinxuye qinxuye changed the title BUG: fix mps for QwenLM/Qwen2.5-VL BUG: fix mps for Qwen2.5-VL May 27, 2025
@@ -119,6 +119,16 @@ def load(self):
torch_dtype="float16",
**kwargs,
).eval()
elif device == "mps":
# MacOS special, https://github.com/QwenLM/Qwen2.5-VL/issues/777
Contributor

The issue described in the linked issue seems to have been fixed already. I checked the file: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/blob/main/preprocessor_config.json

It's already correct.

@qinxuye
Contributor

qinxuye commented May 27, 2025

What problem is this PR trying to address? The issue in the comment seems incorrect; it's not related to MPS at all. Maybe you linked the wrong issue?

@SolardiaX
Author

Running Qwen2.5-VL-7B on macOS may raise the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/core/worker.py", line 1127, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.11/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/core/model.py", line 477, in load
    await asyncio.to_thread(self._model.load)
    ^^^^^^^^^^^^^^^^^
  File "/Users/test/.local/share/uv/cpython-3.11.11-macos-aarch64-none/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
      ^^^^^^^^^^^^^^^^^
  File "/Users/test/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/model/llm/transformers/multimodal/core.py", line 61, in load
    self.load_multimodal_model()
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/model/llm/transformers/multimodal/qwen2_vl.py", line 119, in load_multimodal_model
    self._model = model_cls.from_pretrained(
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
    return func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4573, in from_pretrained
    ) = cls._load_pretrained_model(
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4990, in _load_pretrained_model
    caching_allocator_warmup(model_to_load, expanded_device_map, hf_quantizer)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 6063, in caching_allocator_warmup
    _ = torch.empty(byte_count // factor, dtype=torch.float16, device=device, requires_grad=False)
    ^^^^^^^^^^^^^^^^^
RuntimeError: [address=127.0.0.1:49477, pid=3316] Invalid buffer size: 30.89 GB

To fix it, we need to pass the options attn_implementation="eager" and low_cpu_mem_usage=True to model_cls.from_pretrained to lower the memory usage.

This issue is also mentioned in QwenLM/Qwen2.5-VL#760 and may affect all Qwen2.5-VL models.
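
For reference, a minimal sketch of how the MPS branch could apply these options. It is assembled from the hunk above and the from_pretrained call shown in the traceback; self.model_path, device_map=device, and the exact set of kwargs are assumptions, not the final implementation:

elif device == "mps":
    # MacOS special, https://github.com/QwenLM/Qwen2.5-VL/issues/777
    # Hedged sketch: per the PR description, eager attention plus
    # low_cpu_mem_usage keeps peak memory low enough to avoid the
    # "Invalid buffer size" RuntimeError raised during the
    # caching-allocator warmup on Apple Silicon.
    self._model = model_cls.from_pretrained(
        self.model_path,          # assumed attribute name
        torch_dtype="float16",
        attn_implementation="eager",
        low_cpu_mem_usage=True,
        device_map=device,        # assumed; places weights on MPS
        **kwargs,
    ).eval()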

@qinxuye qinxuye changed the title BUG: fix mps for Qwen2.5-VL ENH: optimize mps for Qwen2.5-VL Jul 21, 2025
@XprobeBot XprobeBot added enhancement New feature or request and removed bug Something isn't working labels Jul 21, 2025
@qinxuye qinxuye changed the title ENH: optimize mps for Qwen2.5-VL ENH: optimize MPS on Mac for Qwen2.5-VL Jul 21, 2025
@qinxuye
Contributor

qinxuye commented Jul 21, 2025

Sorry, I missed the comment you sent. This model has now been modified and moved into transformers/multimodal/qwen2_vl.py, so there is a conflict now; can you fix it accordingly?
