Refactor QBitsTensor subclasses #314

dacorvo · 2024-09-20T07:35:50Z

What does this PR do?

This is a purely internal refactoring to ease the introduction of Marlin int4 kernels.

Allowing an optimizer to be passed to the quantize_weight function created a circular dependency with the optimizers, which might in turn need to quantize weights. quantize_weight now requires the scale (and shift) to be provided by the caller.
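For illustration only, here is a minimal sketch of the decoupled call pattern, assuming a simple affine (scale and shift) scheme. All names below (`minmax_scale_shift`, `quantize_weight_sketch`, `dequantize_sketch`) are hypothetical and are not the library's API; the point is only that the quantization function takes precomputed parameters and never imports an optimizer.

```python
# Hypothetical sketch: none of these names belong to the library's actual API.
from typing import List, Tuple


def minmax_scale_shift(weight: List[float], bits: int) -> Tuple[float, float]:
    """Stand-in for an optimizer: derive affine parameters from the value range."""
    qmax = 2**bits - 1
    lo, hi = min(weight), max(weight)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        scale = 1.0  # constant weights: any non-zero scale works
    return scale, lo  # assumed convention: w ~= q * scale + shift


def quantize_weight_sketch(
    weight: List[float], scale: float, shift: float, bits: int
) -> List[int]:
    """Quantize with caller-supplied parameters; no optimizer is needed here."""
    qmax = 2**bits - 1
    return [min(qmax, max(0, round((w - shift) / scale))) for w in weight]


def dequantize_sketch(q: List[int], scale: float, shift: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [v * scale + shift for v in q]


# The caller runs the optimizer first, then passes its outputs explicitly.
weights = [-1.0, -0.3, 0.0, 0.7, 2.0]
scale, shift = minmax_scale_shift(weights, bits=4)
codes = quantize_weight_sketch(weights, scale, shift, bits=4)
restored = dequantize_sketch(codes, scale, shift)
```

With this split, an optimizer is free to call the quantization function while searching for parameters, without the quantization module depending on the optimizer in return.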

This aligns the organization of the QBitsTensor classes with the one used for QBytesTensor.

It also stops exporting the AWQ and TinyGemm classes at the top level.

dacorvo force-pushed the refactor_weights_qbits branch from 88faee0 to 5609f3e Compare September 20, 2024 07:49

dacorvo added 4 commits September 20, 2024 08:17

refactor(quantize_weight): require scale (and shift)

750f196

Allowing an optimizer to be passed to the quantize_weight function created a circular dependency with the optimizers, which might in turn need to quantize weights.

refactor(qtensor): introduce WeightQBitsTensor

3bfcf31

This aligns the organization of the QBitsTensor classes with the one used for QBytesTensor.

refactor(qbits): inline ops dispatch

f682849

refactor(qbits): remove subdirectory

d184901

Also avoid exporting AWQ and TinyGemm classes at the top level.

dacorvo force-pushed the refactor_weights_qbits branch 2 times, most recently from ac9e95c to c23f155 Compare September 20, 2024 08:25

refactor(marlin): prepare the introduction of int4 kernel

fe72fc4

dacorvo force-pushed the refactor_weights_qbits branch from c23f155 to fe72fc4 Compare September 20, 2024 08:26

dacorvo merged commit 2c49054 into main Sep 20, 2024
16 checks passed

dacorvo deleted the refactor_weights_qbits branch September 20, 2024 09:13
