[QST] Adding a flag in Tensor Ref Class #2080
You just want to add a flag in device memory to tag whether the tensor is used for the first time? Just pass a pointer through params and change the value in the mainloop. There is no need to hack `TensorRef` to do simple things like this; that is hacky and ugly.
Hi @hwu36, I will explain my final goal; maybe then you will have a better idea of what I am trying to do, since I have seen you replied to me in another issue, #2067, and the two are related.

My use case is executing the same convolution multiple times, which is something you can do in example 16 by launching several iterations over the same convolution kernel. My idea is to check and modify the filters of a convolution, but only the first time that convolution is launched; that is what the flag `first_call` indicates. In the following iterations, the filters would already have been stored in GPU memory (this is related to my issue #1987).

So, to know whether an iteration is the first one, I thought of adding an extra parameter to the filter tensor so it can be saved in GPU memory (this is the `first_call` pointer). In addition, I plan to add another parameter to the `TensorRef` class, which will contain the modified values. This is my goal; I'm sorry if my previous question didn't make it clear. If I just pass a pointer through params without doing […].

Now that you perhaps have a clearer idea of my doubt, can you give any advice on how I should proceed? If I need to reserve GPU memory, it is likely that I need to add something to the host tensor and device_memory classes, but I have no clear idea of how CUTLASS manages all of this. Thank you for your help.
You can just create another […].
@hwu36 Yes, that seems like the easiest thing to do. However, I have a couple of questions:
In this case, apart from here, do I need to change any other declaration in the kernel hierarchy?
Even if this is a tricky thing to do, how should I proceed?
What is your question?
Hi, I want to define an extra parameter in the `TensorRef` class: in my case a flag, in the form of an integer pointer, to be accessed when the convolution is performed in https://github.com/NVIDIA/cutlass/blob/24f991e87930e1159f1f5a47e329d43bcfbd76b9/include/cutlass/conv/kernel/implicit_gemm_convolution.h:
The flag is to check whether the GPU uses the tensor. So I have also modified the constructors of the class (in my case, as it is only intended to be used on the GPU, I use cudaMalloc):
Finally, I have added a new function similar to data() to pass the pointer to the parameters of the convolution, in a similar way as is done in implicit_gemm_convolution.h:
As a result, I have also modified different parts of the convolution kernel, assigning in `Params` the pointer to a new value `first_call`: https://github.com/NVIDIA/cutlass/blob/24f991e87930e1159f1f5a47e329d43bcfbd76b9/include/cutlass/conv/kernel/implicit_gemm_convolution.h

Later, in the `operator()` function, since both `*ptr` and `*check` are non-`const` pointers in the `TensorRef` class, they can be accessed. Only `ptr_B` works fine; the program suddenly gets stuck when I access and modify `first_call`. I am executing example 16 to check the implementation.

This is the code where I modify the flag using the `first_call` parameter: `first_call` is modified, because I have printed it after this piece of code, but when it arrives at line 343 the process gets stuck. It uses the GPU; I have checked that, but I don't know why it stops there. I think it is perhaps some kind of memory-management problem due to how I reserve memory in the constructor of the `TensorRef` class. Maybe it is not the proper way of doing it, because I never call any free method on the new integer pointer.
Should I modify the host tensor and device_memory classes, which are the ones used to define tensors from the host, as described in example 16?
cutlass::HostTensor<ElementInputB, LayoutInputB> tensor_b(options.filter_size);
Any help would be appreciated.
Izan.