Replies: 6 comments 8 replies
-
Hi @cdsousa, thanks for the suggestion! The closest we have to that is custom operators: MatX/examples/black_scholes.cu (line 56 in e713250). As you can see, it's more verbose than your suggestion. The reason is that the operator needs to follow the operator interface: https://nvidia.github.io/MatX/basics/concepts.html#operator. Your lambda would qualify as a substitute for `operator()`. However, I can see how this might be useful for very simple operators where it would simplify the code. We will look into this and get back to you.
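(For reference, a minimal sketch of a custom operator following the linked interface. It is loosely modeled on the Black-Scholes example; the `Scale` name is invented, and the details are an approximation, not the exact MatX API.)

```cpp
// Minimal custom-operator sketch: scales a 1D input operator by a constant.
template <typename I>
class Scale : public matx::BaseOp<Scale<I>> {
  I in_;
  float s_;
public:
  Scale(I in, float s) : in_(in), s_(s) {}

  // Element access: called once per output element.
  __host__ __device__ auto operator()(matx::index_t i) const { return in_(i) * s_; }

  // Shape interface the operator concept requires.
  static constexpr __host__ __device__ int32_t Rank() { return I::Rank(); }
  __host__ __device__ matx::index_t Size(int dim) const { return in_.Size(dim); }
};

// Usage: (out = Scale(in, 2.0f)).run();
```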
-
@cdsousa please check out #1072. It's slightly different from what you suggested: instead of indices it takes operators, and your function applies an operation across the different operators. I think this is a more general way to do what you suggested, with something like the snippet below. Note that you need extended lambda support (nvcc's `--extended-lambda` flag) enabled for CUDA to accept this.
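(The inline snippet here was not captured in the transcript; a minimal sketch of the intended usage, mirroring the working example in the reply below:)

```cpp
// Element-wise apply of a device lambda across MatX operators.
// Requires nvcc's --extended-lambda flag.
auto A = matx::make_tensor<float>({ 4 });
auto B = matx::make_tensor<float>({ 4 });
auto f = [] __device__ (auto a, auto b) { return a + b; };
(A = matx::apply(f, A, B)).run();
```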
-
Wow, that was fast 😄 Yes, more generic is better! Thanks

I've been trying it out, but I've hit an issue. This works:

```cpp
auto A = make_tensor<float>({ 4 });
A.SetVals({ 1.1, 2.2, 3.3, 4.4 });
auto f = [] __device__ (auto a, auto b) { return a + b; };
(A = apply(f, A, A)).run();
```

but this fails to compile:

```cpp
auto A = make_tensor<float>({ 4 });
A.SetVals({ 1.1, 2.2, 3.3, 4.4 });
auto f = [] __device__ (auto a) { return a; };
(A = apply(f, A)).run();
```

Compiler error:

```
[build] /usr/local/cuda-13.0/targets/x86_64-linux/include/cccl/cuda/std/__tuple_dir/tuple_size.h(71): error: incomplete type "cuda::std::__4::tuple_size> &>>" (aka "cuda::std::__4::tuple_size, cuda::std::__4::array, 1>>>") is not allowed
[build]   inline constexpr size_t tuple_size_v = tuple_size<_Tp>::value;
[build]   ^
[build]   detected during:
[build]     instantiation of "const size_t cuda::std::__4::tuple_size_v [with _Tp=matx::tensor_t, cuda::std::__4::array, 1>>]" at line 1366 of /usr/local/cuda-13.0/targets/x86_64-linux/include/cccl/cuda/std/detail/libcxx/include/tuple
[build]     instantiation of "decltype(auto) cuda::std::__4::apply(_Fn &&, _Tuple &&) [with _Fn=lambda [](auto)->auto &, _Tuple=matx::tensor_t, cuda::std::__4::array, 1>> &]" at line 74 of /home....
```

Also, as far as I understand, the
-
Hey @cliffburdick, thanks for #1072!

```cpp
using yuyv_t = cuda::std::tuple<uchar2, uchar2>;
using rgb_t = uchar3;

// `tensor_in` and `tensor_out` are `holoscan::Tensor`s of uint8 with shapes
// H,W,2 and H,W,3 (yuyv and rgb), respectively.
auto matx_tensor_in = matx::make_tensor<yuyv_t>(
    static_cast<yuyv_t*>(tensor_in->data()),
    { tensor_in->shape()[0], tensor_in->shape()[1] / 2 });
auto matx_tensor_out = matx::make_tensor<rgb_t>(
    static_cast<rgb_t*>(tensor_out->data()),
    { tensor_out->shape()[0], tensor_out->shape()[1] });

auto yuyv_to_rgb = [matx_tensor_in] __device__ (auto, auto i, auto j) {
  auto in = matx_tensor_in(i, j / 2);
  auto y = (j % 2) == 0 ? cuda::std::get<0>(in).x : cuda::std::get<1>(in).x;
  auto u = cuda::std::get<0>(in).y;
  auto v = cuda::std::get<1>(in).y;
  auto r = static_cast<uint8_t>(cuda::std::clamp(y + 1.402f * (v - 128), 0.0f, 255.0f));
  auto g = static_cast<uint8_t>(cuda::std::clamp(y - 0.344136f * (u - 128) - 0.714136f * (v - 128), 0.0f, 255.0f));
  auto b = static_cast<uint8_t>(cuda::std::clamp(y + 1.772f * (u - 128), 0.0f, 255.0f));
  return rgb_t{ r, g, b };
};

(matx_tensor_out = matx::apply(yuyv_to_rgb, matx_tensor_out, matx::index(0), matx::index(1))).run();
```
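(For context: each `yuyv_t` packs two horizontally adjacent pixels, with Y0/U in the first `uchar2` and Y1/V in the second. That is why the input view is W/2 elements wide and `j / 2` selects the containing macropixel.)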
-
Hi @cdsousa, a couple of updates:
Be on the lookout for #1 to merge, probably tomorrow.
-
This has been merged.
-
One thing I've been trying to do with MatX is populate a tensor with the return value of a function evaluated for each element.
Moreover, it would be even better if that functor could be called with the indices of each element.
Correct me if I'm wrong, but MatX has no means of doing that as of now.
Thrust can kind of do it, but it works only for 1D vectors allocated by Thrust.
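(For context, a minimal sketch of the Thrust approach mentioned above: `thrust::tabulate` fills a 1D Thrust-allocated vector from a functor of the linear index. Compiling the device lambda requires nvcc's `--extended-lambda` flag.)

```cpp
#include <thrust/device_vector.h>
#include <thrust/tabulate.h>

// Fill v[i] = f(i) for each linear index i of a 1D device vector.
thrust::device_vector<float> v(16);
thrust::tabulate(v.begin(), v.end(),
                 [] __device__ (auto i) { return static_cast<float>(i) * 0.5f; });
```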
I would like to be able to do something like this:
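(The snippet that originally followed appears to have been lost in extraction. Below is a hedged reconstruction of the shape of the request; `apply_indices` is a hypothetical name invented for illustration, not an actual MatX function:)

```cpp
// Hypothetical API (name and signature invented for illustration):
// evaluate a device lambda over each element's indices and assign the result.
auto A = matx::make_tensor<float>({ 4, 4 });
(A = matx::apply_indices([] __device__ (matx::index_t i, matx::index_t j) {
   return static_cast<float>(i * 4 + j);
 })).run();
```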