GroupedGemm: FP8 per-tensor via cuBLAS #2452

@ptrendx

Description

Add support for GroupedGemm with FP8 per-tensor quantization using cuBLAS. Ensure that grouped operations are efficiently batched and fully compatible with device-supplied data buffers.
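
For reference, here is a minimal sketch of the result such a grouped GEMM would be expected to produce, assuming "per-tensor" means one scale factor per input tensor and that each group is an independent matrix product. It is plain Python and does not call cuBLAS. The function names (`quantize_per_tensor`, `grouped_gemm_fp8_reference`) are hypothetical and not part of any existing API, and rounding to the FP8 grid is deliberately left out, so only the scaling and dequantization logic is shown.

```python
# Reference-semantics sketch only: plain Python, no cuBLAS, no real FP8 storage.
# All names below are illustrative assumptions, not an existing library API.
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format


def matmul(a, b):
    """Naive (M x K) @ (K x N) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def quantize_per_tensor(t):
    """Scale a whole tensor by one factor so its amax maps to FP8_E4M3_MAX.

    Returns the scaled, clamped values and the inverse scale needed to
    dequantize. Rounding to representable FP8 values is omitted.
    """
    amax = max(abs(v) for row in t for v in row)
    scale = FP8_E4M3_MAX / amax if amax > 0 else 1.0
    q = [[max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v * scale)) for v in row] for row in t]
    return q, 1.0 / scale


def grouped_gemm_fp8_reference(a_list, b_list):
    """One independent GEMM per group, each input with its own per-tensor scale."""
    outputs = []
    for a, b in zip(a_list, b_list):
        qa, a_scale_inv = quantize_per_tensor(a)
        qb, b_scale_inv = quantize_per_tensor(b)
        acc = matmul(qa, qb)  # accumulate in higher precision
        # Dequantize with the product of the two per-tensor inverse scales.
        outputs.append([[v * a_scale_inv * b_scale_inv for v in row] for row in acc])
    return outputs
```

Groups may have different shapes. A reference like this could serve as a numerical baseline when testing a cuBLAS-backed implementation, with a tolerance that accounts for real FP8 rounding.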
