
[FEA] Has CUTLASS considered supporting zero-points and block-wise scaling in Hopper mixed-dtype grouped GEMM? #2261

@mengsoso


On the Hopper architecture, mixed_dtype_grouped_gemm currently supports only row-wise scaling. For AWQ-quantized weights, however, row-wise scaling alone still causes significant precision loss.
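To illustrate why row-wise scaling loses precision, here is a minimal pure-Python sketch of the quantization math. It is not CUTLASS code. It assumes symmetric int4 codes in [-8, 7], and the helper names (quantize_sym_int4, mean_abs_error) and sample weights are hypothetical. Python floats stand in for FP16.

```python
# Hypothetical reference math only (not CUTLASS code): compare one scale per
# output channel ("row-wise") with one scale per block of K elements.

def quantize_sym_int4(values):
    """Symmetric int4: a single scale for the slice, integer codes in [-8, 7]."""
    amax = max(abs(v) for v in values)
    scale = amax / 7 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def mean_abs_error(row, block_size):
    """Quantize `row` block by block; block_size == len(row) means row-wise."""
    total = 0.0
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        codes, scale = quantize_sym_int4(block)
        # Reconstruct each weight as code * scale and accumulate the error.
        total += sum(abs(v - c * scale) for v, c in zip(block, codes))
    return total / len(row)

# One output channel's weights along K: small values plus one outlier.
row = [0.01, -0.02, 0.03, 0.015, 0.02, -0.01, 0.025, 4.0]
row_wise_err = mean_abs_error(row, block_size=len(row))
block_wise_err = mean_abs_error(row, block_size=4)
print(f"row-wise mean abs error:   {row_wise_err:.4f}")
print(f"block-wise mean abs error: {block_wise_err:.4f}")
```

With a single row-wise scale, the outlier makes the scale so coarse that every other weight rounds to zero. With block-wise scales, only the block that contains the outlier loses its small weights.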


Will CUTLASS support AWQ-style zero-points and block-wise scaling (W4A16 / W4A8) for MoE models?
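For reference, the requested mode can be sketched as dequantize-then-multiply. The sketch assumes the common asymmetric convention w ≈ (q - z) * s, with unsigned 4-bit codes q in [0, 15] and one (scale s, zero-point z) pair per group along K. The group size is 4 for readability, while AWQ commonly uses 128. All helper names are hypothetical. This is only the math, not CUTLASS's kernel interface or memory layout, and Python floats stand in for FP16 activations.

```python
# Hypothetical reference math (not CUTLASS API): AWQ-style asymmetric int4
# weights with one (scale, zero_point) pair per group along K, used in a
# W4A16-style dot product of one weight row with FP16-like activations.

def quantize_group_asym(values):
    """Map a group onto unsigned 4-bit codes in [0, 15] via scale and zero-point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero = max(0, min(15, round(-lo / scale)))
    codes = [max(0, min(15, round(v / scale) + zero)) for v in values]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    # w ~= (q - z) * s
    return [(q - zero) * scale for q in codes]

def quantize_row(row, group_size):
    return [quantize_group_asym(row[i:i + group_size])
            for i in range(0, len(row), group_size)]

def w4a16_dot(groups, activations, group_size):
    """Dequantize each group with its own scale/zero-point, then accumulate."""
    acc = 0.0
    for g, (codes, scale, zero) in enumerate(groups):
        weights = dequantize_group(codes, scale, zero)
        acts = activations[g * group_size:(g + 1) * group_size]
        acc += sum(w * a for w, a in zip(weights, acts))
    return acc

row = [0.1, -0.3, 0.25, 0.05, 1.2, 0.9, -0.4, 0.6]
acts = [1.0, 0.5, -1.0, 2.0, 0.25, -0.5, 1.0, 0.75]
groups = quantize_row(row, group_size=4)
exact = sum(w * a for w, a in zip(row, acts))
approx = w4a16_dot(groups, acts, group_size=4)
print(f"exact: {exact:.4f}  dequantized: {approx:.4f}")
```

Because both groups here are skewed away from zero, the zero-point lets all 16 codes cover each group's actual [min, max] range instead of a symmetric range around zero.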

Thanks~
