Conversation

SolardiaX

No description provided.

@XprobeBot XprobeBot added the bug Something isn't working label May 26, 2025
@XprobeBot XprobeBot added this to the v1.x milestone May 26, 2025
@qinxuye qinxuye changed the title fix: QwenLM/Qwen2.5-VL#761 BUG: fix mps for QwenLM/Qwen2.5-VL May 27, 2025
@qinxuye qinxuye changed the title BUG: fix mps for QwenLM/Qwen2.5-VL BUG: fix mps for Qwen2.5-VL May 27, 2025
@@ -119,6 +119,16 @@ def load(self):
torch_dtype="float16",
**kwargs,
).eval()
elif device == "mps":
# MacOS special, https://github.com/QwenLM/Qwen2.5-VL/issues/777
Contributor

The issue described in the linked issue seems to have been fixed already. I checked the file: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/blob/main/preprocessor_config.json

It's already correct.

@qinxuye
Contributor

qinxuye commented May 27, 2025

What problem is this PR trying to address? The issue in the comment seems incorrect; it's not related to MPS at all. Maybe you linked the wrong issue?

@SolardiaX
Author

Running Qwen2.5-VL-7B on macOS may raise the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/core/worker.py", line 1127, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.11/site-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/core/model.py", line 477, in load
    await asyncio.to_thread(self._model.load)
    ^^^^^^^^^^^^^^^^^
  File "/Users/test/.local/share/uv/cpython-3.11.11-macos-aarch64-none/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
      ^^^^^^^^^^^^^^^^^
  File "/Users/test/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/model/llm/transformers/multimodal/core.py", line 61, in load
    self.load_multimodal_model()
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xinference/model/llm/transformers/multimodal/qwen2_vl.py", line 119, in load_multimodal_model
    self._model = model_cls.from_pretrained(
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
    return func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4573, in from_pretrained
    ) = cls._load_pretrained_model(
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4990, in _load_pretrained_model
    caching_allocator_warmup(model_to_load, expanded_device_map, hf_quantizer)
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 6063, in caching_allocator_warmup
    _ = torch.empty(byte_count // factor, dtype=torch.float16, device=device, requires_grad=False)
    ^^^^^^^^^^^^^^^^^
RuntimeError: [address=127.0.0.1:49477, pid=3316] Invalid buffer size: 30.89 GB

To fix it, we need to pass the options attn_implementation="eager" and low_cpu_mem_usage=True to model_cls.from_pretrained to lower the memory usage.

This issue is also mentioned in QwenLM/Qwen2.5-VL#760 and may affect all Qwen2.5-VL models.
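
For reference, a minimal sketch of how the MPS branch could apply these options. It is assembled from the hunk above and the from_pretrained call shown in the traceback; self.model_path, device_map=device, and the exact set of kwargs are assumptions, not the final implementation:

elif device == "mps":
    # MacOS special, https://github.com/QwenLM/Qwen2.5-VL/issues/777
    # Hedged sketch: per the PR description, eager attention plus
    # low_cpu_mem_usage keeps peak memory low enough to avoid the
    # "Invalid buffer size" RuntimeError raised during the
    # caching-allocator warmup on Apple Silicon.
    self._model = model_cls.from_pretrained(
        self.model_path,          # assumed attribute name
        torch_dtype="float16",
        attn_implementation="eager",
        low_cpu_mem_usage=True,
        device_map=device,        # assumed; places weights on MPS
        **kwargs,
    ).eval()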

@qinxuye qinxuye changed the title BUG: fix mps for Qwen2.5-VL ENH: optimize mps for Qwen2.5-VL Jul 21, 2025
@XprobeBot XprobeBot added enhancement New feature or request and removed bug Something isn't working labels Jul 21, 2025
@qinxuye qinxuye changed the title ENH: optimize mps for Qwen2.5-VL ENH: optimize MPS on Mac for Qwen2.5-VL Jul 21, 2025
@qinxuye
Contributor

qinxuye commented Jul 21, 2025

Sorry, I missed the comment you sent. This model has now been modified and moved into transformers/multimodal/qwen2_vl.py, so there is a conflict now; can you fix it accordingly?
