Commit fe30e5c

tianleiwu, vraspar, tirupath-qti, qti-ashwshan, liqunfu authored
1.24.0 release cherry-pick round 1 (#27104)
### Description

This PR cherry-picks the following changes for the 1.24.0 release.

### Cherry-picked Commits

| Commit | Commit Title | Author |
|---|---|---|
| 744e7fe | Add type definitions, registration, utilities for INT2/UINT2 support (#26824) | vraspar |
| 530a1fb | [QNN EP] Add BFloat16 dtype support in QNN EP (#26987) | tirupath-qti |
| 8e050d1 | Implement new experimental lookup-based matrix multiplication method(TMAC) (#26695) | vraspar |
| 2d2ba6b | [MLAS/CPU EP] Improve performance of Silu activation path within the QuickGelu CPU kernel (#26753) | Hariharan Seshadri |
| 1c02b79 | [QNN EP] Add support for handling 0-dimension for Concat Op (#27000) | Ashwath Shankarnarayan |
| cc2b01b | Fix ClipQuantFusion crash when Clip has multiple input edges (#27016) | Edward Chen |
| bbd3850 | [QNN EP] Support quantized BatchNorm with per-channel DQ params on QNN HTP (#26959) | qti-yuduo |
| d8f0318 | Add API to get ep graph partitioning info (#26781) | Adrian Lizarraga |
| b912b18 | [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 - Follow up (#27007) | Preetha Veeramalai |
| ba11af4 | [QNN-EP] Add MatMulNBits translation for GPU (#26340) | quic-tirupath |
| c03c419 | [MLAS/NEON] Add dedicated kernel for depthwise convolution for ARM64 using NEON intrinsics (#26688) | Hariharan Seshadri |
| e7dfd69 | [QNN-EP] Support alternate Layernorm fusion pattern in QNN preprocess (#26060) | qti-mattsinc |
| 4013dc1 | Implement multithreading in qgemm_kleidi (#26301) | Melike Kaptan |
| 9f06181 | [CXX] Enable users to specify custom OrtSyncStream via RunOptions (#26988) | Dmitri Smirnov |
| cfccd64 | Added support for QMX kernels in MLAS (#26849) | qti-vaiskv |
| 29d9b2f | Tweak external resource importer handle structs (#27040) | Scott McKay |
| 9d108d0 | [QNN EP] Add QuickGELU operator support for QNN provider (#27034) | tirupath-qti |
| b35688f | Add INT2 and UINT2 support for QDQ, transpose and cast ops (#27022) | vraspar |
| 6d34aba | Introducing BF16 Pointwise NCHWc Convolution for Arm64 (#26838) | Rohanjames1997 |
| 36017ad | [EP ABI] Add CreateCustomOpDomains() API for plugin EP to register custom ops (#27050) | Chi Lo |
| 50a03e4 | Add a new pipeline for CUDA 13 nuget builds (#27023) | eserscor |
| a0d4439 | [EP ABI] Update Graph_GetGraphView() implementation (#26711) | Chi Lo |
| 34bb209 | [webgpu] Fix a bug for im2col (#27069) | Wenqin Yang |
| 46e8d45 | [QNN EP] Add FusedMatMul operator support (#27044) | tirupath-qti |
| 5e7e7a3 | Disable Float32_2Bits_Asymmetric_256x256 test (#27046) | vraspar |
| 39f966e | Fix Doxygen documentation build error in onnxruntime_c_api.h (#27083) | Nick Eubanks |
| 8a7a797 | Print tensor for new packed type of 2 bits (#27064) | Tianlei Wu |
| 01f40e6 | Fix GPU JAR testing on Linux (#27011) | eserscor |
| b6ed7f3 | Fix warning around ununsed code in QNN Android Emulator builds by clang (#27026) | Hariharan Seshadri |
| d7daa45 | Raise the timeout for the ios simulator job (#27045) | Hariharan Seshadri |
| 7e1d818 | upgrade emsdk to 4.0.23 (#27029) | Yulong Wang |
| 347b990 | Fix failing mainline build on Arm64 linux (#27101) | Rohanjames1997 |
| f481b17 | Add dedicated API to support extracting compatibility string from model metadata (#27015) | adrastogi |

---------

Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Signed-off-by: melkap01 <melike.kaptan@arm.com>
Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>
Co-authored-by: Ashwath Shankarnarayan <ashwshan@qti.qualcomm.com>
Co-authored-by: Liqun Fu <liqun.fu@microsoft.com>
Co-authored-by: carzh <wolfivyaura@gmail.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: carzh <carolinezhu@microsoft.com>
Co-authored-by: Vrajang Parikh <vrparikh@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yuduo Wu <yuduow@qti.qualcomm.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
Co-authored-by: Bartlomiej Filipek <bartlomiej.filipek@intel.com>
Co-authored-by: bopeng1234 <bo.peng@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Yaru Du <yaru.du@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Dvoretckii, Mikhail <mikhail.dvoretckii@intel.com>
Co-authored-by: Pallavi Gupta <pallavi.gupta@intel.com>
Co-authored-by: Jianhui Dai <jianhui.j.dai@intel.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Fei Chen <feich@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akupadhye <aupadhye@qti.qualcomm.com>
Co-authored-by: Wang Ning <ning4.wang@intel.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: Jie Chen <jie.a.chen@intel.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: quic-hungjuiw <quic_hungjuiw@quicinc.com>
Co-authored-by: Ian Hunter <ianfhunter@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com>
Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Nenad Banfic <46795300+nenad1002@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: n1harika <niharika.sathish@intel.com>
Co-authored-by: Ryan Metcalfe <ryan.metcalfe@intel.com>
Co-authored-by: Jaswanth Gannamaneni <jaswanth.gannamaneni@intel.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: liang <gxgaoliang@126.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Christopher Warrington <chwarr@microsoft.com>
Co-authored-by: Ishwar Raut <iraut@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: Xinpeng Dou <15529241576@163.com>
Co-authored-by: adrastogi <aditya.rastogi@microsoft.com>
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com>
Co-authored-by: Pradeep Sakhamoori <psakhamoori@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: mingyue <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: Susanta Bhattacharjee <susanta.bhattacharjee@intel.com>
Co-authored-by: Jozef Wludzik <jozef.wludzik@intel.com>
Co-authored-by: Rajeev Sekar <rajeevsekar21@gmail.com>
Co-authored-by: Mayuresh M Varerkar <mayuresh.m.varerkar@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Wenqin Yang <wenqin.yang@intel.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: chunghow-qti <chunghow@qti.qualcomm.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Jiawei Shao <jiawei.shao@intel.com>
Co-authored-by: czekun <chen.zekun@intel.com>
Co-authored-by: Jaskaran Singh Nagi <jaskaran.singh.nagi@intel.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com>
Co-authored-by: Melike Kaptan <melike.kaptan@arm.com>
Co-authored-by: Damien Dooley <damien.dooley@arm.com>
Co-authored-by: qti-vaiskv <vaiskv@qti.qualcomm.com>
Co-authored-by: Rohanjames1997 <rohan.james4@gmail.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: eserscor <247253654+eserscor@users.noreply.github.com>
Co-authored-by: Nick Eubanks <nieubank@microsoft.com>
Co-authored-by: adrastogi <8368026+adrastogi@users.noreply.github.com>
Co-authored-by: Rohanjames1997 <rohanjms@amazon.com>
1 parent 3874516 commit fe30e5c

189 files changed: +14410 -1658 lines changed

.github/workflows/mac.yml

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ jobs:
 matrix:
 target_arch: [x86_64, arm64]

-timeout-minutes: 90
+timeout-minutes: 120

 steps:
 - name: Checkout code

.gitmodules

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@
 [submodule "cmake/external/emsdk"]
 path = cmake/external/emsdk
 url = https://github.com/emscripten-core/emsdk.git
-branch = 4.0.21
+branch = 4.0.23

cmake/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ option(onnxruntime_USE_SVE "Build with SVE support in MLAS" OFF)
 option(onnxruntime_USE_ARM_NEON_NCHWC "Build with ARM Neon NCHWc kernels in MLAS" OFF)

 option(onnxruntime_USE_KLEIDIAI "Build with KleidiAI integration in MLAS" OFF)
+option(onnxruntime_USE_QMX_KLEIDIAI_COEXIST "Build with QMX and Arm KLEIDIAI libraries" OFF)
 option(onnxruntime_BUILD_UNIT_TESTS "Build ONNXRuntime unit tests" ON)
 option(onnxruntime_BUILD_CSHARP "Build C# library" OFF)
 option(onnxruntime_BUILD_OBJC "Build Objective-C library" OFF)
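
For readers who want to try the new option, a minimal sketch of an initial-cache file follows. The file name `qmx-kleidiai-cache.cmake` is hypothetical; the two option names and help strings are taken from the diff above. In the hunks shown on this page, the QMX option is only read inside KleidiAI-specific code paths, so the sketch assumes both options need to be enabled together.

```cmake
# qmx-kleidiai-cache.cmake: hypothetical initial-cache file, loaded with `cmake -C`.
# Assumption: the QMX option has no effect unless KleidiAI is enabled as well,
# because the hunks in this commit only read it inside KleidiAI setup code.
set(onnxruntime_USE_KLEIDIAI ON CACHE BOOL
    "Build with KleidiAI integration in MLAS")
set(onnxruntime_USE_QMX_KLEIDIAI_COEXIST ON CACHE BOOL
    "Build with QMX and Arm KLEIDIAI libraries")
```

Such a file could be passed at configure time with `cmake -C qmx-kleidiai-cache.cmake -S cmake -B build`, where the `build` directory name is likewise an assumption. Because `option()` leaves an already-cached value untouched, the `OFF` defaults in `cmake/CMakeLists.txt` would not override these entries.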

cmake/deps.txt

Lines changed: 4 additions & 1 deletion
@@ -56,5 +56,8 @@ extensions;https://github.com/microsoft/onnxruntime-extensions/archive/c24b7bab0
 directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
 cudnn_frontend;https://github.com/NVIDIA/cudnn-frontend/archive/refs/tags/v1.12.0.zip;7e733cfdc410d777b76122d64232499205589a96
 dawn;https://github.com/google/dawn/archive/13c1635a14574ebb7116b56a69f5519301417fda.zip;0aadd28fc385cf7d657d5fc70a352372d2d3c76a
-kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.15.0.tar.gz;62ccd24ab60bcef68766440fb42d79071ac2a5d2
+kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar.gz;6895e72b3d5cf1173358164cb3d64c9d7d33cc84
+# kleidiai-qmx is pinned to a specific commit as there are no tagged releases. When an appropriate tagged release becomes available,
+# this entry will be updated to use refs/tags/<version> instead of the raw commit hash.
+kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c
 duktape;https://github.com/svaarala/duktape/releases/download/v2.7.0/duktape-2.7.0.tar.xz;8200c8e417dbab7adcc12c4dbdef7651cfc55794

cmake/external/onnxruntime_external_deps.cmake

Lines changed: 6 additions & 0 deletions
@@ -845,6 +845,12 @@ if(onnxruntime_USE_KLEIDIAI)

 onnxruntime_fetchcontent_declare(kleidiai URL ${DEP_URL_kleidiai} URL_HASH SHA1=${DEP_SHA1_kleidiai} EXCLUDE_FROM_ALL)
 onnxruntime_fetchcontent_makeavailable(kleidiai)
+# Fetch Qualcomm's kleidiai library
+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+onnxruntime_fetchcontent_declare(kleidiai-qmx URL ${DEP_URL_kleidiai-qmx} URL_HASH SHA1=${DEP_SHA1_kleidiai-qmx}
+EXCLUDE_FROM_ALL)
+onnxruntime_fetchcontent_makeavailable(kleidiai-qmx)
+endif()
 endif()

 set(onnxruntime_LINK_DIRS)
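
The `DEP_URL_kleidiai-qmx` and `DEP_SHA1_kleidiai-qmx` variables used above correspond to the `name;URL;SHA1` entry added to `cmake/deps.txt`. How the real build derives them is not shown in this commit view; the sketch below only illustrates one way such an entry could be split into those two variables. The script name `parse_dep.cmake` and the local variable names are hypothetical.

```cmake
# parse_dep.cmake: hypothetical standalone script, run with `cmake -P parse_dep.cmake`.
# A deps.txt entry has the form name;URL;SHA1. CMake treats ';' as its list
# separator, so the entry is already a three-element list.
set(dep_line "kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c")

list(LENGTH dep_line field_count)
if(NOT field_count EQUAL 3)
  message(FATAL_ERROR "Expected name;URL;SHA1, got ${field_count} fields")
endif()

list(GET dep_line 0 dep_name)
list(GET dep_line 1 dep_url)
list(GET dep_line 2 dep_sha1)

# A SHA-1 digest is 40 hexadecimal characters; reject anything else early.
string(LENGTH "${dep_sha1}" sha1_length)
if(NOT sha1_length EQUAL 40 OR NOT dep_sha1 MATCHES "^[0-9a-f]+$")
  message(FATAL_ERROR "Malformed SHA1 for ${dep_name}: ${dep_sha1}")
endif()

# Publish the values under the naming scheme seen in the diff.
set(DEP_URL_${dep_name} "${dep_url}")
set(DEP_SHA1_${dep_name} "${dep_sha1}")

message(STATUS "DEP_URL_${dep_name} = ${DEP_URL_${dep_name}}")
message(STATUS "DEP_SHA1_${dep_name} = ${DEP_SHA1_${dep_name}}")
```

Pinning by archive URL plus SHA1 means a commit-hash archive such as the `kleidiai-qmx` entry is checked in the same way as a tagged release, which fits the note in `deps.txt` about switching to `refs/tags/<version>` later.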

cmake/onnxruntime_mlas.cmake

Lines changed: 26 additions & 3 deletions
@@ -45,6 +45,8 @@ onnxruntime_add_static_library(onnxruntime_mlas
 ${MLAS_SRC_DIR}/qdwconv_kernelsize.cpp
 ${MLAS_SRC_DIR}/qnbitgemm.h
 ${MLAS_SRC_DIR}/qnbitgemm.cpp
+${MLAS_SRC_DIR}/qlutgemm.h
+${MLAS_SRC_DIR}/qlutgemm.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_q8_block.h
 ${MLAS_SRC_DIR}/flashattn.cpp
 ${MLAS_SRC_DIR}/cast.cpp
@@ -113,6 +115,7 @@ function(setup_mlas_source_for_windows)
 ${MLAS_SRC_DIR}/eltwise_kernel_neon.cpp
 ${MLAS_SRC_DIR}/eltwise_kernel_neon_fp16.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_neon_int8_i8mm.cpp
+${MLAS_SRC_DIR}/sconv_nchw_kernel_neon.cpp
 )

 set(mlas_platform_preprocess_srcs
@@ -209,6 +212,8 @@ function(setup_mlas_source_for_windows)
 ${MLAS_SRC_DIR}/qgemm_kernel_sse.cpp
 ${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp
 ${MLAS_SRC_DIR}/intrinsics/avx512/quantize_avx512f.cpp
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.h
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx2.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx512.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx512vnni.cpp
@@ -284,6 +289,11 @@ function(setup_kleidiai)
 )
 target_link_libraries(onnxruntime_mlas PRIVATE kleidiai)
 list(APPEND onnxruntime_EXTERNAL_LIBRARIES kleidiai)
+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+target_link_libraries(onnxruntime_mlas PRIVATE kleidiai-qmx)
+target_compile_definitions(onnxruntime_mlas PRIVATE ENABLE_QMX_KERNELS=1)
+list(APPEND onnxruntime_EXTERNAL_LIBRARIES kleidiai-qmx)
+endif()
 set(onnxruntime_EXTERNAL_LIBRARIES ${onnxruntime_EXTERNAL_LIBRARIES} PARENT_SCOPE)

 # If KLEIDIAI_DEBUG is enabled that implies both DEBUG and KERNEL messages.
@@ -302,13 +312,21 @@ function(setup_kleidiai)
 RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
 FRAMEWORK DESTINATION ${CMAKE_INSTALL_BINDIR})
 endif()
+
+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+install(TARGETS kleidiai-qmx EXPORT ${PROJECT_NAME}Targets
+ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
+LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+FRAMEWORK DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
 endfunction()

 function (setup_arm_neon_nchwc)
 target_sources(onnxruntime_mlas PRIVATE
-${MLAS_SRC_DIR}/sconv.h
-${MLAS_SRC_DIR}/sconv_kernel_neon.cpp
-${MLAS_SRC_DIR}/spool_kernel_neon.cpp
+${MLAS_SRC_DIR}/sconv_nchwc_kernel_neon.h
+${MLAS_SRC_DIR}/sconv_nchwc_kernel_neon.cpp
+${MLAS_SRC_DIR}/spool_nchwc_kernel_neon.cpp
 )
 list(APPEND mlas_private_compile_definitions MLAS_USE_ARM_NEON_NCHWC)
 set(mlas_private_compile_definitions ${mlas_private_compile_definitions} PARENT_SCOPE)
@@ -460,6 +478,7 @@ else()
 ${MLAS_SRC_DIR}/eltwise_kernel_neon.h
 ${MLAS_SRC_DIR}/eltwise_kernel_neon.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_neon_int8_i8mm.cpp
+${MLAS_SRC_DIR}/sconv_nchw_kernel_neon.cpp
 )

 # Conditionally add the SVE implementation if compiler supports it
@@ -496,6 +515,7 @@ else()
 ${MLAS_SRC_DIR}/qgemm_kernel_smmla.cpp
 ${MLAS_SRC_DIR}/qgemm_kernel_ummla.cpp
 ${MLAS_SRC_DIR}/sbgemm_kernel_neon.cpp
+${MLAS_SRC_DIR}/sbconv_kernel_neon.cpp
 ${MLAS_SRC_DIR}/cast_kernel_neon.cpp
 ${MLAS_SRC_DIR}/hqnbitgemm_kernel_neon_fp16.cpp
 ${MLAS_SRC_DIR}/rotary_embedding_kernel_neon_fp16.cpp
@@ -511,6 +531,7 @@ else()
 set_source_files_properties(${MLAS_SRC_DIR}/dwconv.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
 set_source_files_properties(${MLAS_SRC_DIR}/pooling_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
 set_source_files_properties(${MLAS_SRC_DIR}/sbgemm_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+bf16 ")
+set_source_files_properties(${MLAS_SRC_DIR}/sbconv_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+bf16 ")
 set_source_files_properties(${MLAS_SRC_DIR}/cast_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
 set_source_files_properties(${MLAS_SRC_DIR}/hqnbitgemm_kernel_neon_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
 set_source_files_properties(${MLAS_SRC_DIR}/rotary_embedding_kernel_neon_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
@@ -693,6 +714,8 @@ else()
 ${MLAS_SRC_DIR}/intrinsics/avx2/qdwconv_avx2.cpp
 ${MLAS_SRC_DIR}/intrinsics/avx2/saturation_check_avx2.cpp
 ${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx2.cpp
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.h
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.cpp
 ${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.h
 ${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp
 ${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp
