@Priya2698, I think it's time we start handling these K=1 cases in the matmul op (and similar for linear) as we discussed a while back. What do you think?
In RoPE, there's a small matmul that is currently sent to aten. For example, this is the first part of the Mistral forward RoPE module:
The matmul op producing `T19` becomes a segmentation boundary: that op alone is handled by aten, while the sections before and after it are handled by the other schedulers. This would make sense if the matmul op were compute-heavy, but that is unlikely in this particular case, as the dimensions are quite small.

This could be translated to just a sequence of pointwise ops:
Combined with #3645, the above section of the forward module would likely be fused into a single kernel with no segmentation.
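
As a rough illustration of why a K=1 matmul needs no real reduction, here is a minimal sketch in plain Python. It is not the nvFuser translation itself. The function names, the `[M, 1] x [1, N]` shapes, and the sample values are hypothetical and chosen only for this example. With K=1, each output element is a single product, so the matmul collapses to a broadcast followed by a pointwise multiply:

```python
def matmul(a, b):
    """Reference matmul of a [M, K] by b [K, N], both nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def k1_matmul_as_pointwise(a, b):
    """Same result when K == 1, using only a broadcast multiply.

    a is [M, 1] and b is [1, N]; out[i][j] = a[i][0] * b[0][j].
    """
    assert len(b) == 1 and all(len(row) == 1 for row in a), "requires K == 1"
    return [[a[i][0] * b[0][j] for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical stand-ins for the RoPE operands (values are made up).
inv_freq = [[1.0], [0.5], [0.25]]   # shape [3, 1]
positions = [[0.0, 1.0, 2.0, 3.0]]  # shape [1, 4]

assert matmul(inv_freq, positions) == k1_matmul_as_pointwise(inv_freq, positions)
```

Because the pointwise form has no reduction axis, it is the kind of op a pointwise scheduler could in principle fuse with its neighbors, which is the motivation stated above.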