
Enables Matmul and Gemm for float16 on CPU #5468

Triggered via pull request September 1, 2025 16:25
Status Success
Total duration 1h 24m 54s
Artifacts 1

linux_tensorrt_ci.yml

on: pull_request
Build Linux TensorRT x64 Release / build_test_pipeline: 49m 45s
Test Linux TensorRT x64 Release: 25m 48s

Annotations

8 warnings
Build Linux TensorRT x64 Release / build_test_pipeline
Wheel output directory /mnt/vss/_work/_temp/Release/dist does not exist.
Build Linux TensorRT x64 Release / build_test_pipeline
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/usr/lib/jvm/msopenjdk-17/bin:/opt/rh/gcc-toolset-12/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 -m pip install --user -r tools/ci_build/github/linux/python/requirements.txt
[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: pip install --upgrade pip
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --build
2025-09-01 16:46:40,806 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --build
2025-09-01 16:46:40,811 build [INFO] - Build started
2025-09-01 16:46:40,811 build [INFO] - Building targets for Release configuration
2025-09-01 16:46:40,811 build [INFO] - /usr/bin/cmake --build build/Release/Release --config Release -- -j16
2025-09-01 17:17:23,557 build [INFO] - /opt/python/cp310-cp310/bin/python3 /onnxruntime_src/setup.py bdist_wheel --nightly_build --wheel_name_suffix=gpu --cuda_version=12.2
/opt/python/cp310-cp310/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly. Instead, use pypa/build, pypa/installer or other standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
DEBUG:auditwheel.musllinux:musl libc not detected
DEBUG:auditwheel.libc:Falling back to GNU libc
INFO:auditwheel.main_repair:Repairing onnxruntime_gpu-1.23.0.dev20250901-cp310-cp310-linux_x86_64.whl
DEBUG:auditwheel.wheel_abi:processing: onnxruntime/capi/libonnxruntime.so.1.23.0
DEBUG:auditwheel.musllinux:musl libc not detected
DEBUG:auditwheel.libc:Falling back to GNU libc
DEBUG:auditwheel.lddtree:parse_ld_so_conf(//etc/ld.so.conf)
DEBUG:auditwheel.lddtree: glob: //etc/ld.so.conf.d/*.conf
DEBUG:auditwheel.lddtree: parse_ld_so_conf(//etc/ld.so.conf.d/00-manylinux.conf)
DEBUG:auditwheel.lddtree: parse_ld_so_conf(//etc/ld.so.conf.d/gds-12-2.conf)
DEBUG:auditwheel.lddtree: parse_ld_so_conf(//etc/ld.so.conf.d/nvidia.conf)
DEBUG:auditwheel.lddtree: parse_ld_so_conf(//etc/ld.so.conf.d/000_cuda.conf)
DEBUG:auditwheel.lddtree: parse_ld_so_conf(//etc/ld.so.conf.d/988_cuda-12.conf)
DEBUG:auditwheel.lddtree:linker ldpaths: {'conf': ['/usr/local/lib', '/usr/local/cuda-12.2/targets/x86_64-linux/lib', '/usr/local/cuda/targets/x86_64-linux/lib', '/usr/local/cuda-12/targets/x86_64-linux/lib', '/lib', '/lib64/', '/usr/lib', '/usr/lib64'], 'env': ['/opt/rh/gcc-toolset-12/root/usr/lib64', '/opt/rh/gcc-toolset-12/root/usr/lib', '/usr/local/lib64'], 'interp': []}
DEBUG:auditwheel.lddtree:lddtree(onnxruntime/capi/libonnxruntime.so.1.23.0)
DEBUG:auditwheel.lddtree: ldpaths[rpath] = []
DEBUG:auditwheel.lddtree: ldpaths[runpath] = ['/tmp/tmpvd9fkb_c/onnxruntime/capi']
DEBUG:a
Build Linux TensorRT x64 Release / build_test_pipeline
Wheel output directory /mnt/vss/_work/_temp/Release/dist does not exist.
Build Linux TensorRT x64 Release / build_test_pipeline
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/usr/lib/jvm/msopenjdk-17/bin:/opt/rh/gcc-toolset-12/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 -m pip install --user -r tools/ci_build/github/linux/python/requirements.txt
[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: pip install --upgrade pip
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --update
2025-09-01 16:42:15,755 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --update
2025-09-01 16:42:15,759 build [INFO] - Build started
2025-09-01 16:42:15,759 build [INFO] - Generating CMake build tree
2025-09-01 16:42:15,777 build [INFO] - /usr/bin/cmake /onnxruntime_src/cmake -Donnxruntime_ENABLE_EXTERNAL_CUSTOM_OP_SCHEMAS=OFF -Donnxruntime_RUN_ONNX_TESTS=ON -Donnxruntime_GENERATE_TEST_REPORTS=ON -DPython_EXECUTABLE=/opt/python/cp310-cp310/bin/python3 -Donnxruntime_USE_VCPKG=ON -Donnxruntime_USE_MIMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=ON -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_JAVA=ON -Donnxruntime_BUILD_NODEJS=OFF -Donnxruntime_BUILD_OBJC=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_BUILD_APPLE_FRAMEWORK=OFF -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_NNAPI_BUILTIN=OFF -Donnxruntime_USE_VSINPU=OFF -Donnxruntime_USE_RKNPU=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_VITISAI=OFF -Donnxruntime_USE_TENSORRT=ON -Donnxruntime_USE_NV=OFF -Donnxruntime_USE_TENSORRT_BUILTIN_PARSER=ON -Donnxruntime_USE_TENSORRT_INTERFACE=OFF -Donnxruntime_USE_CUDA_INTERFACE=OFF -Donnxruntime_USE_NV_INTERFACE=OFF -Donnxruntime_USE_OPENVINO_INTERFACE=OFF -Donnxruntime_USE_VITISAI_INTERFACE=OFF -Donnxruntime_USE_QNN_INTERFACE=OFF -Donnxruntime_USE_MIGRAPHX_INTERFACE=OFF -Donnxruntime_USE_MIGRAPHX=OFF -Donnxruntime_DISABLE_CONTRIB_OPS=OFF -Donnxruntime_DISABLE_ML_OPS=OFF -Donnxruntime_DISABLE_RTTI=OFF -Donnxruntime_DISABLE_EXCEPTIONS=OFF -Donnxruntime_MINIMAL_BUILD=OFF -Donnxruntime_EXTENDED_MINIMAL_BUILD=OFF -Donnxruntime_MINIMAL_BUILD_CUSTOM_OPS=OFF -Donnxruntime_REDUCED_OPS_BUILD=OFF -Donnxruntime_CLIENT_PACKAGE_BUILD=OFF -Donnxruntime_BUILD_MS_EXPERIMENTAL_OPS=OFF -Donnxruntime_ENABLE_LTO=OFF -Donnxruntime_USE_ACL=OFF -Donnxruntime_USE_ARMNN=OFF -Donnxruntime_ARMNN_RELU_USE_CPU=ON -Donnxruntime_ARMNN_BN_USE_CPU=ON -Donnxruntime_USE_JSEP=OFF -Donnxruntime_USE_WEBGPU=OFF -Donnxruntime_USE_EXTERNAL_DAWN=OFF -Donnxruntime_WGSL_TEMPLATE=static -Donnxruntime_ENABLE_NVTX_PROFILE=OFF -Donnxruntime_ENABLE_TRAINING=OFF -Donnxruntime_ENABLE_TRAINING_OPS=OFF -Donnxruntime_ENABLE_TRAINING_APIS=OFF -Donnxruntime_ENABLE_CPU_FP16_OPS=OFF -Donnxruntime_USE_NCCL=OFF -Donnxruntime_BUILD_BENCHMARKS=OFF -Donnxruntime_GCOV_COVERAGE=OFF -Donnxruntime_ENABLE_MEMORY_PROFILE=OFF -Donnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=OFF -Donnxruntime_USE_CUDA_NHWC_OPS=ON -Donnxruntime_BUILD_WEBASSEMBLY_STATIC_LIB=OFF -Donnxruntime_ENABLE_WEBASSEMBLY_EXCEPTION_CATCHING=ON -Donnxruntime_ENABLE_WEBASSEMBLY_API_EXCEPTION_CATCHING=OFF -Donnxruntime_ENABLE_
Build Linux TensorRT x64 Release / build_test_pipeline
stderr: WARNING! Your credentials are stored unencrypted in '/home/cloudtest/.docker/config.json'. Configure a credential helper to remove this warning. See https://docs.docker.com/go/credential-store/
Test Linux TensorRT x64 Release
Wheel output directory /mnt/vss/_work/_temp/Release/dist does not exist.
Test Linux TensorRT x64 Release
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/usr/lib/jvm/msopenjdk-17/bin:/opt/rh/gcc-toolset-12/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 -m pip install --user -r tools/ci_build/github/linux/python/requirements.txt
[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: pip install --upgrade pip
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --test
2025-09-01 17:34:57,237 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_tensorrt --use_binskim_compliant_compile_flags --build_wheel --cuda_version=12.2 --cuda_home=/usr/local/cuda-12.2 --cudnn_home=/usr/local/cuda-12.2 --use_tensorrt --tensorrt_home /usr --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --test
2025-09-01 17:34:57,241 build [INFO] - Build started
2025-09-01 17:34:57,241 build [DEBUG] - create symlink /data/models -> build/Release/models
2025-09-01 17:34:57,241 build [INFO] - Running tests for Release configuration
2025-09-01 17:34:57,241 build [INFO] - /usr/bin/ctest --build-config Release --verbose --timeout 10800
2025-09-01 17:45:49,389 build [INFO] - /opt/python/cp310-cp310/bin/python3 onnxruntime_test_python.py
......2025-09-01 17:45:58.198000265 [W:onnxruntime:, inference_session.cc:3527 SetTuningResults] Cannot find execution provider UnknownEP
2025-09-01 17:45:58.198193595 [W:onnxruntime:, inference_session.cc:3543 SetTuningResults] Failed to load TuningResults (index=0). Reason: tuning_context_impl.h:167 CheckMandatoryKeys key="ORT_VERSION" is not provided for validation.
2025-09-01 17:45:58.198286755 [W:onnxruntime:, inference_session.cc:3543 SetTuningResults] Failed to load TuningResults (index=0). Reason: tuning_context_impl.h:204 CheckKeysMatching Unmatched validator: "NOT_A_VALIDATOR_KEY" is provided, but onnxruntime is unable to consume it.
2025-09-01 17:45:58.198358965 [W:onnxruntime:, inference_session.cc:3543 SetTuningResults] Failed to load TuningResults (index=0). Reason: tuning_context_impl.h:213 ValidateOrtVersion onnxruntime version mismatch
.....Unsupported ONNX data type: STRING (8)
2025-09-01 17:46:06.745503685 [E:onnxruntime:Default, tensorrt_execution_provider.h:90 log] [2025-09-01 17:46:06 ERROR] In node -1 with name: and operator: (importInput): UNSUPPORTED_NODE: Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype) && "Failed to convert ONNX date type to TensorRT data type."
2025-09-01 17:46:06.745618145 [W:onnxruntime:Default, tensorrt_execution_provider.cc:2830 GetCapability] [TensorRT EP] No graph will run on TensorRT execution provider
.Unsupported ONNX data type: STRING (8)
2025-09-01 17:46:09.880614010 [E:onnxruntime:Default, tensorrt_execution_provider.h:90 log] [2025-09-01 17:46:09 ERROR] In node -1 with name: and operator: (importInput): UNSUPPORTED_NODE: Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype) && "Failed to convert ONNX date type to TensorRT data type."
2025-09-01 17:46:09.880740430 [W:onnxruntime:Default, tensorrt_execution_provider.cc:2830 GetCapability] [TensorRT EP] No graph will run on TensorRT execution provider
.Unsupported ONNX data type: STRING (8)
2025-09-01 17:46:
Test Linux TensorRT x64 Release
stderr: WARNING! Your credentials are stored unencrypted in '/home/cloudtest/.docker/config.json'. Configure a credential helper to remove this warning. See https://docs.docker.com/go/credential-store/

Artifacts

Produced during runtime
Name: build-output-x64-Release
Size: 1.24 GB
Digest: sha256:bae21ec8edf6f99c66ead6f52c5bdddb69c81d8bcc91d9318fef379964d0a345
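
The digest listed for build-output-x64-Release can be used to check a downloaded copy of the artifact. The following is a minimal sketch, not part of the workflow itself: it assumes the digest covers the downloaded archive as a whole, and the local file name `build-output-x64-Release.zip` and the helper names `sha256_of_file` and `matches_digest` are hypothetical.

```python
import hashlib
from pathlib import Path

# Digest copied from the Artifacts listing of this run.
EXPECTED_DIGEST = "sha256:bae21ec8edf6f99c66ead6f52c5bdddb69c81d8bcc91d9318fef379964d0a345"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks so a 1.24 GB archive is not loaded at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_digest(path, expected):
    """Compare a file against a digest string of the form 'sha256:<hex>'."""
    algorithm, _, hex_value = expected.partition(":")
    if algorithm != "sha256" or not hex_value:
        raise ValueError(f"unsupported digest format: {expected!r}")
    return sha256_of_file(path) == hex_value.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the artifact was saved.
    artifact = Path("build-output-x64-Release.zip")
    if artifact.exists():
        print("digest OK" if matches_digest(artifact, EXPECTED_DIGEST) else "digest MISMATCH")
    else:
        print(f"{artifact} not found; nothing to verify")
```

Reading in fixed-size chunks keeps memory use flat regardless of artifact size.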