XPU code update #4048

Draft · wants to merge 12 commits into develop
Conversation

@eunwoosh (Contributor) commented Oct 21, 2024

Summary

  • torch version update
    • IPEX has been integrated into torch since torch 2.4 (a dedicated index URL is needed when installing torch).
    • torch 2.4 supports the Max dGPU; torch 2.5 also supports ARC.
  • mixed precision
    • According to the documentation, both fp16 and bf16 mixed precision are supported.
    • However, there are problems when using the gradient scaler with fp16.
    • The torch gradient scaler unscales tensors in fp64 (fp32 -> fp64), but XPU doesn't support fp64 yet.
    • The existing precision plugin for XPU in OTX isn't necessary anymore and can be removed.
  • OTX code update
    • CUDA-oriented code (e.g. torch.cuda.amp) needs to be updated; see the sketch after this list.
    • ipex.optimize no longer exists, so the code for it in the XPU strategy can be removed.
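To illustrate the mixed-precision and torch.cuda.amp points above, here is a minimal sketch of one bf16 training step on XPU using the device-agnostic torch.autocast API instead of torch.cuda.amp. It assumes torch >= 2.5 with built-in XPU support; the model, data, and optimizer are placeholders rather than OTX code, and the install URL should be checked against the torch version in use.

# Assumed install from the dedicated XPU wheel index (verify for your torch version):
#   pip install torch --index-url https://download.pytorch.org/whl/xpu
import torch

device = torch.device("xpu")
model = torch.nn.Linear(8, 2).to(device)                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # placeholder optimizer

for _ in range(3):
    x = torch.randn(4, 8, device=device)                  # placeholder batch
    y = torch.randint(0, 2, (4,), device=device)
    optimizer.zero_grad()
    # device-agnostic autocast replaces torch.cuda.amp.autocast
    with torch.autocast(device_type="xpu", dtype=torch.bfloat16):
        loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()   # bf16 keeps fp32 dynamic range, so no GradScaler is needed
    optimizer.step()

With fp16, a GradScaler would normally be required, which is exactly where the fp64 unscale limitation described above becomes a problem; bf16 sidesteps it.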

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@github-actions bot added the DEPENDENCY, BUILD, and OTX 2.0 labels on Oct 21, 2024
@github-actions bot added the TEST label on Oct 22, 2024
@@ -1107,7 +1108,16 @@ def _build_trainer(self, **kwargs) -> None:
            self._cache.update(strategy="xpu_single")
            # add plugin for Automatic Mixed Precision on XPU
            if self._cache.args.get("precision", 32) == 16:
                self._cache.update(plugins=[MixedPrecisionXPUPlugin()])
                msg = "XPU doesn't support fp16 now, so bfp16 will be used instead."
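As a standalone, hedged sketch of the fp16-to-bf16 fallback this hunk appears to introduce (only the lines above are visible, so the helper name, the warning call, and the "bf16-mixed" precision string are assumptions, the latter following Lightning's precision naming):

import warnings

def resolve_xpu_precision(requested):
    """Hypothetical helper: map a requested trainer precision to one XPU can run."""
    if requested in (16, "16", "16-mixed"):
        warnings.warn("XPU doesn't support fp16 now, so bf16 will be used instead.", stacklevel=2)
        return "bf16-mixed"
    return requested

print(resolve_xpu_precision(16))   # -> "bf16-mixed", with a warning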
Contributor: The feedback from IPEX is that BF16 is preferable for computer vision.

Contributor: I think we can remove the warning then.

@@ -100,9 +100,6 @@ def test_perf(
        fxt_benchmark: Benchmark,
        fxt_accelerator: str,
    ):
        if fxt_model.name == "dino_v2" and fxt_accelerator == "xpu":
Contributor: Are these models supported now?
