wenqinI commented Jan 16, 2026

Description

This PR adds vec1 support for arbitrary input_channel in the im2col kernel, which could bring performance gains to more models.

For example, on the yolov8n_pose model there is a gain of about 7% for the whole model, and about 50% for the conv2d ops whose input_size is not a multiple of 4.

Motivation and Context

wenqinI commented Jan 16, 2026

@guschmue @fs-eire @qjia7 PTAL, thanks!

wenqinI marked this pull request as draft on January 16, 2026 08:46

// If the status of this condition is uncertain, the feature must be disabled.
const bool use_subgroup = false;
Im2ColMatMulProgram im2col_mm_program{has_bias, tile_m, tile_n, use_subgroup};
const uint32_t vec_size = channel_input % 4 == 0 ? 4 : 1;

How about extending it to 1, 2, or 4?
const uint32_t vec_size = GetMaxComponents(channel_input);
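
To make the difference between the two selection rules concrete, here is a minimal sketch. It assumes that `GetMaxComponents` returns the largest of 4, 2, or 1 that evenly divides its argument; its definition is not shown in this thread, so `MaxComponentsSketch` below is a hypothetical stand-in, not the real helper. `VecSizeFourOrOne` is likewise an illustrative name that wraps the expression from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Rule from the diff: vec4 when the channel count is a multiple of 4,
// otherwise fall back to scalar (vec1) loads.
constexpr std::uint32_t VecSizeFourOrOne(std::uint32_t channel_input) {
  return channel_input % 4 == 0 ? 4u : 1u;
}

// Hypothetical stand-in for the suggested helper (assumed behavior):
// pick the widest of 4, 2, 1 that divides the channel count.
constexpr std::uint32_t MaxComponentsSketch(std::uint32_t size) {
  return size % 4 == 0 ? 4u : (size % 2 == 0 ? 2u : 1u);
}

// The two rules agree on multiples of 4 and on odd counts...
static_assert(VecSizeFourOrOne(8) == MaxComponentsSketch(8), "both pick vec4");
static_assert(VecSizeFourOrOne(3) == MaxComponentsSketch(3), "both pick vec1");
// ...and differ only for even counts that are not multiples of 4.
static_assert(VecSizeFourOrOne(6) == 1, "diff rule falls back to vec1");
static_assert(MaxComponentsSketch(6) == 2, "suggested rule would pick vec2");
```

Under that assumption, the suggestion only changes behavior for channel counts such as 2, 6, or 10. Whether the im2col shader can consume a vec2 path is a separate question that this sketch does not address.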
