Having issue installing FBGEMM-gpu on MacOS #2248

Open
justin8shan opened this issue Jan 4, 2024 · 8 comments

Comments

@justin8shan

Hello, I am having trouble installing FBGEMM-GPU for TorchRec.

Initially, I tried to install it using pip, but that was not successful.

(torchrec) ➜  torchrec git:(main) ✗ pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://download.pytorch.org/whl/cpu
Requirement already satisfied: pip in /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages (23.3.1)
ERROR: Could not find a version that satisfies the requirement install (from versions: none)
ERROR: No matching distribution found for install

Then I tried to build the package from source, following the instructions at https://pytorch.org/FBGEMM/general/BuildInstructions.html#fbgemm-gpu-docs-build-setup-tools-install to install the CPU-only variant of FBGEMM-GPU.
I executed the command below:

 python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="-13.6-${ARCH}"

I received a Clang error:

[SETUP.PY] Parsed Arguments: Namespace(verbose=False, package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=-13.6-x86_64']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.6.0-rc0, BRANCH: main, SHA: 441697c0481f82b9c328d39f70e4b34fdc890758
[SETUP.PY] Setting the full package version string: 0.6.0rc0.post26
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/fbgemm_gpu/docs/version.py
[SETUP.PY] Building the CPU-ONLY variant of FBGEMM_GPU ...
-13.6-x86_64
[1/52] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_forward_quantized_host_cpu.cpp
clang: error: unsupported option '-fopenmp'

Any advice on how to solve this issue?

macOS: Ventura
Processor: Intel Core i7

@excelle08

Hi, FBGEMM-GPU currently needs to be built with GCC and does not support Clang. Can you try installing gcc and g++ and running the setup again?
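The first failure above is `clang: error: unsupported option '-fopenmp'`, meaning the Xcode `c++` that CMake picked up rejected the OpenMP flag. A minimal sketch for checking whether a particular compiler accepts `-fopenmp` is to compile a tiny file with that flag and inspect the standard `_OPENMP` macro, which is defined only when OpenMP is enabled. The helper names `openmp_spec_date` and `report_openmp` are hypothetical and not part of FBGEMM.

```cpp
#include <cstdio>

// Returns the value of the standard _OPENMP macro (a yyyymm date naming the
// supported OpenMP spec), or 0 if this file was compiled without OpenMP.
constexpr long openmp_spec_date() {
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// Hypothetical helper: prints whether OpenMP was enabled for this build.
void report_openmp() {
  if (openmp_spec_date() != 0) {
    std::printf("OpenMP enabled, _OPENMP = %ld\n", openmp_spec_date());
  } else {
    std::printf("OpenMP not enabled (compiled without -fopenmp?)\n");
  }
}
```

Calling `report_openmp()` from a `main()` and building with, for example, `/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -fopenmp` should report OpenMP as enabled. Apple Clang, as in the log above, instead fails at the compile step with the same `unsupported option` error.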

@justin8shan
Author

I did. I installed GCC with Homebrew and exported CC/CXX, but I still have the same problem:

(base) ➜  gitrepo export CC=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13
(base) ➜  gitrepo export CXX=/usr/local/Cellar/gcc/13.2.0/bin/g++-13

@q10
Contributor

q10 commented Jan 5, 2024

Hi @justin8shan, we currently do not officially support FBGEMM_GPU-CPU on macOS, so there is no guarantee that the code will build. That being said, you might not be passing in CC and CXX correctly. Could you try:

 python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macosx-13.6-${ARCH}" \
    -DCMAKE_C_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/gcc-13" \
    -DCMAKE_CXX_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/g++-13"

and show us the full build logs?
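Before re-running the full build, it can help to confirm which compiler a given binary really is. The sketch below (hypothetical helper names `compiler_family` and `report_compiler`) reports the compiler family from predefined macros. Clang, including Apple Clang, also defines `__GNUC__` for compatibility, so `__clang__` has to be checked first.

```cpp
#include <cstdio>

// Hypothetical helper: names the compiler family that built this translation unit.
// Clang also defines __GNUC__ for compatibility, so test __clang__ before __GNUC__.
constexpr const char* compiler_family() {
#if defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "other";
#endif
}

// Prints the family plus the compiler's own version string when it provides one.
void report_compiler() {
#ifdef __VERSION__
  std::printf("%s: %s\n", compiler_family(), __VERSION__);
#else
  std::printf("%s\n", compiler_family());
#endif
}
```

Compiling this with whatever `$CXX` or `c++` resolves to in the build shell, and calling `report_compiler()` from a `main()`, shows which family that binary belongs to. Homebrew's `g++-13` should report `gcc`, while on macOS `/usr/bin/c++` and even `/usr/bin/gcc` are Apple Clang and report `clang`.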

@justin8shan
Author

justin8shan commented Jan 5, 2024

@q10 Thanks for the advice. Is there a plan to support local builds on macOS?

With your suggestion, the above error is gone, but I then got another error saying libtorch was missing. I downloaded libtorch-macos-latest.zip and copied it to /usr/local/Cellar/libtorch. After running the build again, I got the errors below. Can you help?
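The error repeated throughout the log, `no matching function for call to 'max(long int, long long int)'`, comes from `std::max(0L, dense_segment_value_data[i] * num_bins)` at `sparse_ops_cpu.cpp:1745`. `std::max` takes both arguments as a single deduced type `_Tp`. On macOS `int64_t` is `long long` (the log prints `int64_t = long long int`), while the literal `0L` is `long`, so deduction fails even though both types are 64 bits wide. On x86-64 Linux `int64_t` is `long`, which is presumably why the same line builds there. The sketch below shows one portable way to write such a clamp by spelling out the template argument; `clamped_bin_offset` is a hypothetical stand-in for the expression, not FBGEMM code, and not necessarily how upstream addressed it.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the failing expression in sparse_ops_cpu.cpp:
//   std::max(0L, dense_segment_value_data[i] * num_bins)
// The product is int64_t for both SegmentValueType = int and = int64_t, but 0L is
// `long`; on macOS int64_t is `long long`, so std::max cannot deduce one type.
template <typename SegmentValueType>
constexpr std::int64_t clamped_bin_offset(SegmentValueType segment_value,
                                          std::int64_t num_bins) {
  // Naming the template argument converts the literal 0 to int64_t, so both
  // arguments have the same type regardless of how int64_t is defined.
  return std::max<std::int64_t>(0, segment_value * num_bins);
}
```

Writing `static_cast<std::int64_t>(0)` as the first argument would work equally well; either form avoids depending on whether `int64_t` happens to be `long` or `long long` on a given platform.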

Here is the build log.

(torchrec) ➜  fbgemm_gpu git:(main) ✗ python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macosx-13.6-${ARCH}" \
  -DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13 \
    -DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13
['setup.py', 'bdist_wheel', '--package_name=fbgemm_gpu_cpu', '--package_variant=cpu', '--python-tag=py310', '--plat-name=macosx-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Parsed Arguments: Namespace(verbose=False, package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=macosx-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.6.0-rc0, BRANCH: main, SHA: 441697c0481f82b9c328d39f70e4b34fdc890758
[SETUP.PY] Setting the full package version string: 0.6.0rc0.post26
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/fbgemm_gpu/docs/version.py
[SETUP.PY] Building the CPU-ONLY variant of FBGEMM_GPU ...
macosx-13.6-x86_64
[4/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/algorithm:60,
                 from /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:9:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/algorithm:61:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1803:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1745:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1745 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = double; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = float; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::Half; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp: In instantiation of 'void fbgemm_gpu::_generic_histogram_binning_calibration_by_feature_cpu_kernel(int64_t, int64_t, int64_t, double, int64_t, double, const LogitType*, const SegmentValueType*, const double*, const double*, const double*, LogitType*, int64_t*) [with LogitType = c10::BFloat16; SegmentValueType = long long int; int64_t = long long int]':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1922:3:   required from here
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: error: no matching function for call to 'max(long int, long long int)'
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/sparse_ops/sparse_ops_cpu.cpp:1864:19: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
 1864 |         : std::max(0L, dense_segment_value_data[i] * num_bins);
      |           ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[5/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp: In function 'void {anonymous}::adjust_offset_cpu(index_t&, index_t&, int64_t, index_t*, index_t*)':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: error: no matching function for call to 'max(long int, const long long int&)'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/deque:62,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/core/Generator.h:4,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/CPUGeneratorImpl.h:3,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/Context.h:3,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/ATen.h:7,
                 from /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:9:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:257:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algobase.h:303:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/Cellar/gcc/13.2.0/include/c++/13/functional:67,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/C++17.h:7,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/string_view.h:4,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/StringUtil.h:6,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:5,
                 from /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/ATen/core/Generator.h:11:
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5795 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5795:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5805 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/local/Cellar/gcc/13.2.0/include/c++/13/bits/stl_algo.h:5805:5: note:   template argument deduction/substitution failed:
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/codegen/embedding_bounds_check_host_cpu.cpp:33:15: note:   mismatched types 'std::initializer_list<_Tp>' and 'long int'
   33 |       std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));
      |       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[6/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o
FAILED: CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o
/usr/local/Cellar/gcc/13.2.0/bin/g++-13 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_py_EXPORTS -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../include -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/asmjit/src -I/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/../third_party/cpuinfo/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include -isystem /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -std=c++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -fPIC -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o -MF CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o.d -o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o -c /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp: In function 'void TORCH_LIBRARY_FRAGMENT_init_fbgemm_2(torch::Library&)':
/Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp:1638:5: error: 'class torch::Library' has no member named 'impl_abstract_pystub'
 1638 |   m.impl_abstract_pystub(
      |     ^~~~~~~~~~~~~~~~~~~~
[9/35] Building CXX object CMakeFiles/fbgemm_gpu_py.dir/codegen/batch_index_select_dim0_cpu_host.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/setuptools_wrap.py", line 674, in setup
    cmkr.make(make_args, install_target=cmake_install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 697, in make
    self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 742, in make_impl
    raise SKBuildError(msg)

An error occurred while building with CMake.
  Command:
    /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
  Install target:
    install
  Source directory:
    /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu
  Working directory:
    /Users/xshan/pubrepo/fbgemm_v0.5.0/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.

@q10
Contributor

q10 commented Jan 5, 2024

@justin8shan There may be issues with gcc 12+ similar to this one. Could you try installing an older version of gcc and see whether the errors reproduce?

@justin8shan
Author

> @justin8shan There may be issues with gcc 12+ similar to this one, could you try installing a lower version of gcc and see if the errors reproduce?

Unfortunately, I tried every version from gcc 8 to 13 and got the same error.
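One plausible reason the gcc version makes no difference (an assumption, not something confirmed in this thread): the compiler notes above report `deduced conflicting types for parameter 'const _Tp' ('long int' and 'long long int')`. The literal `0L` is always `long`, while `int64_t` comes from the platform headers. With the macOS SDK (here `MacOSX14.2.sdk` via `-isysroot`) it is `long long`; on x86_64 Linux with glibc it is usually `long`, which is why `std::max(0L, int64_value)` compiles there. The minimal sketch below reports which type `std::int64_t` aliases on the current platform; `int64_alias_name` and `zero_long_matches_int64` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical diagnostic: report which builtin type std::int64_t aliases.
// The alias is chosen by the platform's C library headers (glibc, the macOS
// SDK, ...), so switching between gcc releases is not expected to change it.
constexpr const char* int64_alias_name() {
  if (std::is_same<std::int64_t, long>::value) {
    return "long";       // typical for x86_64 Linux with glibc
  }
  if (std::is_same<std::int64_t, long long>::value) {
    return "long long";  // what the macOS SDK headers use
  }
  return "other";
}

// True when std::max(0L, some_int64_value) can deduce a single type _Tp.
constexpr bool zero_long_matches_int64 =
    std::is_same<std::int64_t, long>::value;
```

On the configuration in this issue this would presumably return `"long long"`, matching the `'long int'` vs `'long long int'` conflict in the log.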

@q10
Contributor

q10 commented Jan 6, 2024

@justin8shan Could you share the exact steps you ran (e.g., the specific software versions installed) to build fbgemm_gpu on the Mac? We will use them as a reference and add them to our documentation.

The invokers file is actually autogenerated from a template file during the build process, so I imagine that something is still not quite correct in the build...

@justin8shan
Author

justin8shan commented Jan 6, 2024

@q10 Sorry, my earlier statement that I was able to build the package was wrong: I only got that far by removing the CMakeLists.txt file. When I build with the existing CMakeLists.txt, I still get errors similar to the ones above.

I manually fixed the mismatched-type error by forcing a type conversion, as follows:

Before:
      std::max(0L, std::min(static_cast<int64_t>(indices_start), num_indices));

After:
      std::max(0L, long(std::min(static_cast<int64_t>(indices_start), num_indices)));
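A possible alternative to the `long(...)` cast, sketched here as an assumption and not as the upstream FBGEMM fix: name the template argument explicitly so that both operands are converted to `int64_t`. Unlike casting the result to `long`, this does not rely on `long` being 64 bits wide (true on macOS and x86_64 Linux, but not on Windows). The helper `clamp_index_start` is hypothetical; it only mirrors the clamping expression from `adjust_offset_cpu`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the expression in adjust_offset_cpu:
// clamp indices_start into the range [0, num_indices].
// Writing std::max<int64_t> / std::min<int64_t> fixes _Tp to int64_t, so the
// literal 0 converts to int64_t and no deduction conflict can arise, whether
// int64_t is `long` (typical Linux) or `long long` (macOS).
template <typename index_t>
std::int64_t clamp_index_start(index_t indices_start, std::int64_t num_indices) {
  return std::max<std::int64_t>(
      0, std::min<std::int64_t>(static_cast<std::int64_t>(indices_start),
                                num_indices));
}
```

The same pattern would presumably also apply to the `std::max(0L, dense_segment_value_data[i] * num_bins)` call reported in `sparse_ops_cpu.cpp`.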

After rebuilding, I hit an error at the final compilation step. Do you know how to fix it?

(torchrec) ➜  fbgemm_gpu git:(v0.5.0-release) ✗ python setup.py bdist_wheel \
    --package_name="${package_name}" \
    --package_variant=cpu \
    --python-tag="${python_tag}" \
    --plat-name="macos-13.6-${ARCH}" \
    -DCMAKE_C_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/gcc-13" \
    -DCMAKE_CXX_COMPILER="/usr/local/Cellar/gcc/13.2.0/bin/g++-13"
['setup.py', 'bdist_wheel', '--package_name=fbgemm_gpu_cpu', '--package_variant=cpu', '--python-tag=py310', '--plat-name=macos-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Parsed Arguments: Namespace(package_variant='cpu', package_name='fbgemm_gpu_cpu', nvml_lib_path=None)
[SETUP.PY] Unknown Arguments: ['bdist_wheel', '--python-tag=py310', '--plat-name=macos-13.6-x86_64', '-DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/gcc-13', '-DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/13.2.0/bin/g++-13']
[SETUP.PY] Extracted the package name: 'fbgemm_gpu_cpu'
[SETUP.PY] Not building FBGEMM_GPU from Nova.
[SETUP.PY] Extracted the package variant+version: ''
[SETUP.PY] Generating the package version ...
[SETUP.PY] Package is for RELEASE: using git info for the versioning
[SETUP.PY] TAG: v0.5.0, BRANCH: v0.5.0-release, SHA: b6ed54a2ec9757a159d5a4aec7c8a9b16c16c222
[SETUP.PY] Setting the full package version string: 0.5.0
[SETUP.PY] Generating version file at: /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/fbgemm_gpu/_fbgemm_gpu_version.py
macos-13.6-x86_64
[0/1] Re-running CMake...
================================================================================
Building the CPU-only variant of FBGEMM-GPU
================================================================================

================================================================================
Default C++ compiler flags
(values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD):

 -D_GLIBCXX_USE_CXX11_ABI=0
================================================================================

================================================================================
The project is built using scikit-build
================================================================================

CMake Warning at /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:113 (find_package)


-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
[2/111] Generating gen_embedding_forward_quantized_unweighted_co...okup_approx_rowwise_adagrad_with_weight_decay.py, lookup_none.py
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_cuda.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_kernel_cta.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_weighted_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu
[Backward Split] [dense]: gen_embedding_backward_dense_split_cpu.cpp
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu
[Forward Quantized]: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp
[Forward Quantized]: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp
[Forward Split]: gen_embedding_forward_dense_weighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_weighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_split_unweighted_codegen_cuda.cu
[Forward Split]: gen_embedding_forward_dense_weighted_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_nobag_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_vbe_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_nobag_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_vbe_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_kernel.cu
[Forward Split]: gen_embedding_forward_split_weighted_v2_kernel.cu
[Forward Split]: gen_embedding_forward_split_unweighted_v2_kernel.cu
[Forward Split]: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu
[Forward Split]: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_cuda.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [adagrad]: gen_embedding_backward_split_adagrad.cpp
[Backward Split] [adagrad]: lookup_adagrad.py
[Backward Split] [adagrad]: gen_embedding_backward_adagrad_split_cpu.cpp
[Backward Split] [adagrad]: gen_embedding_backward_split_adagrad_cpu.cpp
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_cuda.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_weighted_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu
[Backward Split] [adam]: gen_embedding_backward_split_adam.cpp
[Backward Split] [adam]: lookup_adam.py
[Backward Split] [adam]: gen_embedding_backward_split_adam_cpu.cpp
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_cuda.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu
[Backward Split] [lamb]: gen_embedding_backward_split_lamb.cpp
[Backward Split] [lamb]: lookup_lamb.py
[Backward Split] [lamb]: gen_embedding_backward_split_lamb_cpu.cpp
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu
[Backward Split] [lars_sgd]: gen_embedding_backward_split_lars_sgd.cpp
[Backward Split] [lars_sgd]: lookup_lars_sgd.py
[Backward Split] [lars_sgd]: gen_embedding_backward_split_lars_sgd_cpu.cpp
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_split_partial_rowwise_adam.cpp
[Backward Split] [partial_rowwise_adam]: lookup_partial_rowwise_adam.py
[Backward Split] [partial_rowwise_adam]: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_split_partial_rowwise_lamb.cpp
[Backward Split] [partial_rowwise_lamb]: lookup_partial_rowwise_lamb.py
[Backward Split] [partial_rowwise_lamb]: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_split_rowwise_adagrad.cpp
[Backward Split] [rowwise_adagrad]: lookup_rowwise_adagrad.py
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp
[Backward Split] [rowwise_adagrad]: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp
[Backward Split] [approx_rowwise_adagrad]: gen_embedding_backward_split_approx_rowwise_adagrad.cpp
[Backward Split] [approx_rowwise_adagrad]: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp
[Backward Split] [rowwise_adagrad_with_weight_decay]: lookup_rowwise_adagrad_with_weight_decay.py
[Backward Split] [rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_cuda.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_cta.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_weighted_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_approx_rowwise_adagrad_with_weight_decay_split_unweighted_kernel_warp.cu
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: lookup_approx_rowwise_adagrad_with_weight_decay.py
[Backward Split] [approx_rowwise_adagrad_with_weight_decay]: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp
[Backward Split] [rowwise_adagrad_with_counter]: lookup_rowwise_adagrad_with_counter.py
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_rowwise_adagrad_with_counter_split_cpu.cpp
[Backward Split] [rowwise_adagrad_with_counter]: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp
[Backward Split] [approx_rowwise_adagrad_with_counter]: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp
[Backward Split] [approx_rowwise_adagrad_with_counter]: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_cuda.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_kernel_cta.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_weighted_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_unweighted_kernel_warp.cu
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp
[Backward Split] [rowwise_weighted_adagrad]: lookup_rowwise_weighted_adagrad.py
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_rowwise_weighted_adagrad_split_cpu.cpp
[Backward Split] [rowwise_weighted_adagrad]: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_cuda.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_weighted_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu
[Backward Split] [sgd]: gen_embedding_backward_split_sgd.cpp
[Backward Split] [sgd]: lookup_sgd.py
[Backward Split] [sgd]: gen_embedding_backward_sgd_split_cpu.cpp
[Backward Split] [sgd]: gen_embedding_backward_split_sgd_cpu.cpp
[Backward Split] [approx_sgd]: gen_embedding_backward_split_approx_sgd.cpp
[Backward Split] [approx_sgd]: gen_embedding_backward_split_approx_sgd_cpu.cpp
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_cuda.cu
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_kernel_cta.cu
[Backward Split] [none]: gen_embedding_backward_none_split_weighted_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_none_split_unweighted_kernel_warp.cu
[Backward Split] [none]: gen_embedding_backward_split_none.cpp
[Backward Split] [none]: lookup_none.py
[Backward Split] [none]: gen_embedding_backward_split_none_cpu.cpp
[110/111] Linking CXX shared module fbgemm_gpu_py.so
FAILED: fbgemm_gpu_py.so
: && /usr/local/Cellar/gcc/13.2.0/bin/g++-13 -D_GLIBCXX_USE_CXX11_ABI=0 -DNO_AVX512=1 -O3 -DNDEBUG -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -mmacosx-version-min=13.6 -bundle -Wl,-headerpad_max_install_names -s -o fbgemm_gpu_py.so CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/codeholder.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/emitterutils.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/operand.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonehash.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86compiler.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/third_party/asmjit/src/asmjit/x86/x86rapass.cpp.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/QuantUtils.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/RefImplementations.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/SparseAdagrad.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/Utils.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm_gpu_py.dir/Users/xshan/pubrepo/fbgemm/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_forward_quantized_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_backward_dense_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/embedding_bounds_check_host_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_autograd.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/input_combine_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/layout_transform_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/quantize_ops/quantize_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/quantize_ops/quantize_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/sparse_ops/sparse_ops_meta.cpp.o CMakeFiles/fbgemm_gpu_py.dir/src/embedding_inplace_update_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/codegen/batch_index_select_dim0_cpu_host.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_forward_quantized_weighted_codegen_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_dense_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_adam_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_lamb_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_lars_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp.o 
CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_none_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_sgd_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_adagrad_with_counter_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_rowwise_weighted_adagrad_split_cpu.cpp.o CMakeFiles/fbgemm_gpu_py.dir/gen_embedding_backward_sgd_split_cpu.cpp.o -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64 -Wl,-rpath,/lib/intel64_win -Wl,-rpath,/lib/win-x64 -Wl,-rpath,/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libc10.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libtorch.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libtorch_cpu.dylib  /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/torch/lib/libc10.dylib && :
ld: warning: -s is obsolete
ld: warning: search path '/lib/intel64' not found
ld: warning: search path '/lib/intel64_win' not found
ld: warning: search path '/lib/win-x64' not found
ld: Undefined symbols:
  _GOMP_barrier, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      ...
  _GOMP_parallel, referenced from:
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      ...
  __ZN3c1010Dispatcher17runRecordFunctionERN2at14RecordFunctionESt17reference_wrapperIKNS_14FunctionSchemaEENS_11DispatchKeyE, referenced from:
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_6SymIntEEEET_RKNS_19TypedOperatorHandleIFSE_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESH_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_8ArrayRefINS_6SymIntEEEdEEET_RKNS_19TypedOperatorHandleIFSG_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESJ_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_S5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_RKSt6vectorIS4_SaIS4_EES7_S7_EEET_RKNS_19TypedOperatorHandleIFSD_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESG_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_S7_S7_S7_EEET_RKNS_19TypedOperatorHandleIFS8_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESB_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_S5_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES9_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN3c1010Dispatcher17runRecordFunctionERN2at14RecordFunctionESt17reference_wrapperIKNS_14FunctionSchemaEENS_11DispatchKeyENS_8ArrayRefIKNS_6IValueEEE, referenced from:
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_6SymIntEEEET_RKNS_19TypedOperatorHandleIFSE_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESH_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EERKNS_8ArrayRefINS_6SymIntEEEdEEET_RKNS_19TypedOperatorHandleIFSG_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESJ_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_S5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_RKSt6vectorIS4_SaIS4_EES7_S7_EEET_RKNS_19TypedOperatorHandleIFSD_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESG_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_RKSt6vectorIS3_SaIS3_EES5_EEET_RKNS_19TypedOperatorHandleIFSB_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESE_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathISt5tupleIJN2at6TensorES4_EEJRKS4_S7_S7_S7_EEET_RKNS_19TypedOperatorHandleIFS8_DpT0_EEERNS3_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionESB_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJRKS3_S5_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES9_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN3c1010TensorImpl17set_autograd_metaESt10unique_ptrINS_21AutogradMetaInterfaceESt14default_deleteIS2_EE, referenced from:
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch8autograd13make_variableEN2at6TensorEbb in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c1022getCustomClassTypeImplERKSt10type_index, referenced from:
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI11TensorQueueNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_14tagged_capsuleI11TensorQueueEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI13AtomicCounterNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_14tagged_capsuleI13AtomicCounterEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_13intrusive_ptrI12PrunedMapCPUNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_14tagged_capsuleI12PrunedMapCPUEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_13intrusive_ptrI13AtomicCounterNS_6detail34intrusive_target_default_null_typeIS2_EEEEEENS_4Type24SingletonOrSharedTypePtrIS7_EEv in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c105ErrorC2ENS_14SourceLocationESs, referenced from:
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future14invokeCallbackISt8functionIFvRS1_EEEEvT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN10fbgemm_gpu28dense_to_jagged_forward_metaERKN2at6TensorERKSt6vectorIS1_SaIS1_EERKN3c108optionalINS9_6SymIntEEE.cold in jagged_tensor_ops_meta.cpp.o
  __ZN3c106detail12infer_schema20make_function_schemaEOSsS2_NS_8ArrayRefINS1_11ArgumentDefEEES5_, referenced from:
      __ZN5torch6class_I13AtomicCounterE12defineMethodINS_6detail10WrapMethodIMS1_FxvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodINS_6detail10WrapMethodIMS1_FN2at6TensorEvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodIZNS2_3defIJN2at6TensorEEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EES6_E_EEPNS_3jit8FunctionESsT_SsSF_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c106detail14torchCheckFailEPKcS2_jRKSs, referenced from:
      __ZN2at5emptyEN3c108ArrayRefIxEENS0_13TensorOptionsENS0_8optionalINS0_12MemoryFormatEEE in embedding_forward_split_cpu.cpp.o
      __ZN10fbgemm_gpu22report_embedding_errorIxEEviiiiPKT_S3_xb.constprop.0 in embedding_forward_split_cpu.cpp.o
      __ZNKR2at10TensorBase8accessorIiLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsExEEv in embedding_forward_split_cpu.cpp.o
      __ZNKR2at10TensorBase8accessorIxLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsExEEv in embedding_forward_split_cpu.cpp.o
      __ZZZZ35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_xENKUlvE_clEvENKUlvE0_clEvENKUlvE_clEv in embedding_forward_split_cpu.cpp.o
      __ZZZZ35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_xENKUlvE_clEvENKUlvE_clEvENKUlvE_clEv in embedding_forward_split_cpu.cpp.o
      __Z35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_x in embedding_forward_split_cpu.cpp.o
      __Z35split_embedding_codegen_forward_cpuN2at6TensorES0_S0_xS0_S0_S0_xS0_x in embedding_forward_split_cpu.cpp.o
      ...
  __ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs, referenced from:
      __ZNKR3c106IValue8toObjectEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013QualifiedNameC1ERKSs in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013QualifiedNameC1ERKSs in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK5torch6detail19TensorDataContainer11fill_tensorERN2at6TensorE in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK5torch6detail19TensorDataContainer11fill_tensorERN2at6TensorE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6detail32call_torchbind_method_from_stackIZNS_6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS3_NS7_6detail34intrusive_target_default_null_typeIS3_EEEEE_ENS6_UlSsE_EEERS4_OT_OT0_EUlNS7_14tagged_capsuleIS3_EEOSsE_Lb0EJLm0ELm1EEEENS7_4guts23infer_function_traits_t11return_typeERSI_RSt6vectorINS7_6IValueESaISV_EESt16integer_sequenceImJXspT1_EEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I12PrunedMapCPUE12defineMethodIZNSA_10def_pickleINL20PrunedMapCPURegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_ENSD_UlSsE_EEERSA_OT_OT0_EUlNS1_14tagged_capsuleIS9_EEOSsE_EEPNS7_3jit8FunctionESsSO_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN3c106detail8ListImplC1ESt6vectorINS_6IValueESaIS3_EENS_4Type24SingletonOrSharedTypePtrIS6_EE, referenced from:
      __ZN3c104ListINS_6SymIntEEC1Ev in jagged_tensor_ops_autograd.cpp.o
      __ZN3c104ListIxEC1Ev in jagged_tensor_ops_autograd.cpp.o
      __ZN3c104ListIN2at6TensorEEC1Ev in jagged_tensor_ops_autograd.cpp.o
  __ZN3c106ivalue14ConstantString6createESs, referenced from:
      __ZNK3c104DictISsN2at6TensorEE2atERKSs.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK3c104DictISsN2at6TensorEE6insertISsRKS2_EESt4pairINS_4impl12DictIteratorISsS2_N11ska_ordered8detailv317sherwood_v3_tableIS7_INS_6IValueESD_ESD_NS_6detail11DictKeyHashENSB_16KeyOrValueHasherISD_SE_SG_EENSF_14DictKeyEqualToENSB_18KeyOrValueEqualityISD_SE_SJ_EESaISE_ESaINSB_17sherwood_v3_entryISE_EEEE18templated_iteratorISE_EEEEbEOT_OT0_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK11TensorQueue9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I12PrunedMapCPUE12defineMethodINL20PrunedMapCPURegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_EEEPNS7_3jit8FunctionESsT_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_I13AtomicCounterE12defineMethodINL21AtomicCounterRegistryMUlRKNS1_13intrusive_ptrIS9_NS1_6detail34intrusive_target_default_null_typeIS9_EEEEE_EEEPNS7_3jit8FunctionESsT_SsSt16initializer_listINS7_3argEEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c108DictType3getESsNS_4Type24SingletonOrSharedTypePtrIS1_EES3_, referenced from:
      __ZN3c1014getTypePtrCopyINS_4DictISsN2at6TensorEEEEENS_4Type24SingletonOrSharedTypePtrIS5_EEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_4DictISsN2at6TensorEEEEENS_4Type24SingletonOrSharedTypePtrIS5_EEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c108ListType3getESsNS_4Type24SingletonOrSharedTypePtrIS1_EE, referenced from:
      __ZN3c1014getTypePtrCopyISt6vectorIN2at6TensorESaIS3_EEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1018getFakeTypePtrCopyISt6vectorIN2at6TensorESaIS3_EEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1014getTypePtrCopyISt6vectorIxSaIxEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in jagged_tensor_ops_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyISt6vectorIxSaIxEEEENS_4Type24SingletonOrSharedTypePtrIS4_EEv in jagged_tensor_ops_cpu.cpp.o
      __ZN3c1014getTypePtrCopyINS_8optionalISt6vectorIxSaIxEEEEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in sparse_ops_cpu.cpp.o
      __ZN3c1018getFakeTypePtrCopyINS_8optionalISt6vectorIxSaIxEEEEEENS_4Type24SingletonOrSharedTypePtrIS6_EEv in sparse_ops_cpu.cpp.o
  __ZN3c10lsERSoNS_10DeviceTypeE, referenced from:
      __ZN3c106detail12_str_wrapperIJPKcRKNS_10DeviceTypeES3_EE4callERKS3_S6_S9_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future13markCompletedENS_6IValueENS_8optionalISt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS6_EEEESaISA_EEEE in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_12OperatorNameE, referenced from:
      __ZN3c106detail12_str_wrapperIJPKcRKNS_12OperatorNameES3_EE4callERKS3_S6_S9_ in jagged_tensor_ops_autograd.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_ddxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJS3_S3_S3_S3_S3_S3_S3_EEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIN2at6TensorEJS3_S3_S3_xS3_S3_S3_xS3_xEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_dddxdxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_adagrad_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_S3_S3_S3_S3_S3_S3_dddxxxdxxxddxxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp.o
      __ZN3c1010Dispatcher27callWithDispatchKeySlowPathIvJN2at6TensorES3_S3_S3_S3_xS3_xS3_S3_xS3_bS3_S3_S3_dddxxEEET_RKNS_19TypedOperatorHandleIFS4_DpT0_EEERNS2_13StepCallbacksENS_14DispatchKeySetERKNS_14KernelFunctionES7_ in gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp.o
      ...
  __ZN3c10lsERSoRKNS_6DeviceE, referenced from:
      __ZN3c106ivalue6Future18formatSetOfDevicesERKSt6vectorINS_6DeviceESaIS3_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future18formatSetOfDevicesERKSt6vectorINS_6DeviceESaIS3_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106detail12_str_wrapperIJPKcRKNS_6DeviceES3_RKmS3_S6_EE4callERKS3_S6_SB_S8_SB_S6_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106detail12_str_wrapperIJPKcRKNS_6DeviceES3_RKmS3_S6_EE4callERKS3_S6_SB_S8_SB_S6_ in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c106ivalue6Future20getDevicesOfStoragesERKNS_4impl16VirtualGuardImplERKSt6vectorINS_18weak_intrusive_ptrINS_11StorageImplENS_6detail34intrusive_target_default_null_typeIS8_EEEESaISC_EE in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c1013intrusive_ptrINS_6ivalue6FutureENS_6detail34intrusive_target_default_null_typeIS2_EEE4makeIJNS_4Type24SingletonOrSharedTypePtrIS8_EEEEES6_DpOT_ in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_6IValueE, referenced from:
      __ZN3c10lsERSoRKNS_8ArgumentE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN3c10lsERSoRKNS_8ArgumentE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
  __ZN3c10lsERSoRKNS_6SymIntE, referenced from:
      __ZN3c10lsINS_6SymIntEEERSoS2_NS_8ArrayRefIT_EE in jagged_tensor_ops_meta.cpp.o
      __ZN3c10lsINS_6SymIntEEERSoS2_NS_8ArrayRefIT_EE in jagged_tensor_ops_meta.cpp.o
      __ZN10fbgemm_gpu44batched_dense_vec_jagged_2d_mul_forward_metaERKN2at6TensorES3_S3_ in jagged_tensor_ops_meta.cpp.o
      __ZN10fbgemm_gpu44batched_dense_vec_jagged_2d_mul_forward_metaERKN2at6TensorES3_S3_ in jagged_tensor_ops_meta.cpp.o
  __ZN5torch25registerCustomClassMethodESt10unique_ptrINS_3jit8FunctionESt14default_deleteIS2_EE, referenced from:
      __ZN5torch6class_I13AtomicCounterE12defineMethodINS_6detail10WrapMethodIMS1_FxvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodINS_6detail10WrapMethodIMS1_FN2at6TensorEvEEEEEPNS_3jit8FunctionESsT_SsSt16initializer_listINS_3argEE.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE12defineMethodIZNS2_3defIJN2at6TensorEEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EES6_E_EEPNS_3jit8FunctionESsT_SsSF_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE12defineMethodIZNS2_3defIJEEERS2_NS_6detail5typesIvJDpT_EEESsSt16initializer_listINS_3argEEEUlN3c1014tagged_capsuleIS1_EEE_EEPNS_3jit8FunctionESsT_SsSD_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN5torch3jit11parseSchemaERKSs, referenced from:
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_3ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      ...
  __ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_, referenced from:
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch7LibraryC1ENS0_4KindESsN3c108optionalINS2_11DispatchKeyEEEPKcj, referenced from:
      __ZN5torch6detail16TorchLibraryInitC1ENS_7Library4KindEPFvRS2_EPKcN3c108optionalINS9_11DispatchKeyEEES8_j in embedding_forward_split_cpu.cpp.o
      __Z41__static_initialization_and_destruction_0v in embedding_forward_quantized_host_cpu.cpp.o
      __GLOBAL__sub_I_permute_pooled_embedding_ops_split_cpu.cpp in permute_pooled_embedding_ops_split_cpu.cpp.o
      __GLOBAL__sub_I_jagged_tensor_ops_autograd.cpp in jagged_tensor_ops_autograd.cpp.o
      __GLOBAL__sub_I_jagged_tensor_ops_meta.cpp in jagged_tensor_ops_meta.cpp.o
      __GLOBAL__sub_I_input_combine_cpu.cpp in input_combine_cpu.cpp.o
      __GLOBAL__sub_I_quantize_ops_meta.cpp in quantize_ops_meta.cpp.o
      ...
  __ZN5torch8autograd13_wrap_outputsERKSt6vectorIN2at6TensorESaIS3_EERKSt13unordered_setIPN3c1010TensorImplESt4hashISB_ESt8equal_toISB_ESaISB_EESJ_NS9_8ArrayRefINS9_8optionalIS3_EEEERKSt10shared_ptrINS0_4NodeEESt8functionIFS5_S5_S5_EESJ_, referenced from:
      __ZN3c104impl28wrap_kernel_functor_unboxed_INS0_6detail24WrapFunctionIntoFunctor_INS_26CompileTimeFunctionPointerIFN2at6TensorES6_S6_S6_xxS6_xS6_S6_xNS_8optionalIS6_EES8_xEXadL_ZN12_GLOBAL__N_145split_embedding_codegen_lookup_dense_functionES6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEES6_NS_4guts8typelist8typelistIJS6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEEES9_E4callEPNS_14OperatorKernelENS_14DispatchKeySetES6_S6_S6_xxS6_xS6_S6_xS8_S8_x in embedding_backward_dense_host_cpu.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu30PermutePooledEmbsFunctionSplitIXadL_ZNS2_29permute_pooled_embs_split_cpuERKN2at6TensorES7_S7_S7_S7_EEEEE5applyIS8_JS7_S7_S7_S7_S7_EEENSt9enable_ifIXsrSt7is_sameIT_S8_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in permute_pooled_embedding_ops_split_cpu.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EERKN3c108ArrayRefINSG_6SymIntEEERKdEEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSQ_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSS_ in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu28jagged_dense_elementwise_addERKN2at6TensorERKSt6vectorIS1_SaIS1_EES3_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_128BatchedDenseVecJagged2DMulOpEE5applyIS4_JRKN2at6TensorESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN5torch8autograd15AutogradContext17save_for_backwardESt6vectorIN2at6TensorESaIS4_EE, referenced from:
      __ZN3c104impl28wrap_kernel_functor_unboxed_INS0_6detail24WrapFunctionIntoFunctor_INS_26CompileTimeFunctionPointerIFN2at6TensorES6_S6_S6_xxS6_xS6_S6_xNS_8optionalIS6_EES8_xEXadL_ZN12_GLOBAL__N_145split_embedding_codegen_lookup_dense_functionES6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEES6_NS_4guts8typelist8typelistIJS6_S6_S6_xxS6_xS6_S6_xS8_S8_xEEEEES9_E4callEPNS_14OperatorKernelENS_14DispatchKeySetES6_S6_S6_xxS6_xS6_S6_xS8_S8_x in embedding_backward_dense_host_cpu.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EEN3c108ArrayRefINSF_6SymIntEEEd in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_128BatchedDenseVecJagged2DMulOpEE5applyIS4_JRKN2at6TensorESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSD_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSF_ in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_115DenseToJaggedOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EERKN3c108optionalINSF_6SymIntEEE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_117JaggedJaggedBmmOpEE5applyIS4_JRKN2at6TensorESA_SA_RKxEEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSF_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSH_ in jagged_tensor_ops_autograd.cpp.o
      ...
  __ZN5torch9serialize12InputArchive4readERKSsRN2at6TensorEb, referenced from:
      __ZN12PrunedMapCPUC1ESs in embedding_forward_quantized_host_cpu.cpp.o
      __ZN12PrunedMapCPUC1ESs in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchive5writeERKSsRKN2at6TensorEb, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchive7save_toERSo, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZN5torch9serialize13OutputArchiveC1ESt10shared_ptrINS_3jit15CompilationUnitEE, referenced from:
      __ZNK12PrunedMapCPU9serializeEv in embedding_forward_quantized_host_cpu.cpp.o
  __ZNK3c104Type14isSubtypeOfExtERKS0_PSo, referenced from:
      __ZTVN3c1010SharedTypeE in jagged_tensor_ops_autograd.cpp.o
      __ZTVN3c1017SingleElementTypeILNS_8TypeKindE6ENS_8ListTypeEEE in jagged_tensor_ops_autograd.cpp.o
  __ZNK3c109ClassType9getMethodERKSs, referenced from:
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I11TensorQueueE10def_pickleINL19TensorQueueRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlNS5_4DictISsN2at6TensorEEEE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I13AtomicCounterE10def_pickleINL21AtomicCounterRegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE10def_pickleINL20PrunedMapCPURegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
      __ZN5torch6class_I12PrunedMapCPUE10def_pickleINL20PrunedMapCPURegistryMUlRKN3c1013intrusive_ptrIS1_NS5_6detail34intrusive_target_default_null_typeIS1_EEEEE_ENS4_UlSsE_EEERS2_OT_OT0_.isra.0 in embedding_forward_quantized_host_cpu.cpp.o
  __ZNR5torch7Library4_defEON3c1014FunctionSchemaEPNS1_12OperatorNameERKSt6vectorIN2at3TagESaIS8_EENS_17_RegisterOrVerifyE, referenced from:
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZN12_GLOBAL__N_1L36TORCH_LIBRARY_FRAGMENT_init_fbgemm_3ERN5torch7LibraryE in embedding_forward_split_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      __ZL36TORCH_LIBRARY_FRAGMENT_init_fbgemm_2RN5torch7LibraryE in embedding_forward_quantized_host_cpu.cpp.o
      ...
  ___emutls_v._ZN3c104impl26raw_local_dispatch_key_setE, referenced from:
      __ZNK3c1020DispatchKeyExtractor24getDispatchKeySetUnboxedIJRKN2at6TensorES5_S5_xEEENS_14DispatchKeySetEDpRKT_.isra.0 in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN10fbgemm_gpu12_GLOBAL__N_121JaggedToPaddedDenseOp7forwardEPN5torch8autograd15AutogradContextERKN2at6TensorERKSt6vectorIS7_SaIS7_EEN3c108ArrayRefINSF_6SymIntEEEd in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_115DenseToJaggedOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd8FunctionIN10fbgemm_gpu12_GLOBAL__N_133JaggedDenseDenseAddJaggedOutputOpEE5applyIS4_JRKN2at6TensorERKSt6vectorIS8_SaIS8_EESA_SA_EEENSt9enable_ifIXsrSt7is_sameIT_S4_E5valueEDTclsrSI_7forwardLDnEspcl7declvalIT0_EEEEE4typeEDpOSK_ in jagged_tensor_ops_autograd.cpp.o
      __ZN5torch8autograd7CppNodeIN10fbgemm_gpu12_GLOBAL__N_116JaggedDenseMulOpEE5applyEOSt6vectorIN2at6TensorESaIS8_EE in jagged_tensor_ops_autograd.cpp.o
      ...
  _omp_get_max_threads, referenced from:
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIfEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIdEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
      __ZN8internal7csr2cscIdEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS3_16DefaultPtrTraitsExEES8_RKNS4_IT_Lm1ES5_xEExPKix in embedding_forward_split_cpu.cpp.o
  _omp_get_num_threads, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IdLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      ...
  _omp_get_thread_num, referenced from:
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.1 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IfLb0EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.2 in embedding_forward_split_cpu.cpp.o
      __ZN8internal12_GLOBAL__N_117csr2csc_template_IdLb1EEEvRNS_27HyperCompressedSparseColumnEiRKN2at14TensorAccessorIxLm1ENS4_16DefaultPtrTraitsExEES9_RKNS5_IT_Lm1ES6_xEExPKix._omp_fn.0 in embedding_forward_split_cpu.cpp.o
      ...
  _omp_set_num_threads, referenced from:
      __ZN10fbgemm_gpu22jagged_softmax_forwardERKN2at6TensorES3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu23jagged_softmax_backwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu25jagged_jagged_bmm_forwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
      __ZN10fbgemm_gpu24jagged_dense_bmm_forwardERKN2at6TensorES3_S3_x in jagged_tensor_ops_cpu.cpp.o
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/setuptools_wrap.py", line 674, in setup
    cmkr.make(make_args, install_target=cmake_install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 697, in make
    self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
  File "/Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/skbuild/cmaker.py", line 742, in make_impl
    raise SKBuildError(msg)

An error occurred while building with CMake.
  Command:
    /Users/xshan/miniconda3/envs/torchrec/lib/python3.10/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
  Install target:
    install
  Source directory:
    /Users/xshan/pubrepo/fbgemm/fbgemm_gpu
  Working directory:
    /Users/xshan/pubrepo/fbgemm/fbgemm_gpu/_skbuild/macosx-13.6-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.
