Suggestion: make it possible to offload quantization to disk instead of RAM

Quantizing larger models is impossible when they do not fit in memory, so I have been thinking about building a way to offload the working data to disk. Relying on the OS swap makes absolutely everything grind to a halt, so the mechanism for this would have to be managed manually.

Methods like HQQ are great, but only when the model fits in RAM, and the same goes for all the other methods. Since this is a general problem, I wonder whether a general solution is possible. Just an idea for discussion.
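To make the idea a bit more concrete, here is a minimal sketch of what manual offloading could look like. It is not HQQ and not any library's actual API. It assumes a plain float32 weight matrix stored as a raw file, and uses simple per-row absmax int8 quantization as a stand-in for a real method. The function name `quantize_on_disk`, the file layout, and the `chunk_rows` parameter are all made up for illustration. Input and output live in `numpy.memmap` files, and only one block of rows is copied into RAM at a time.

```python
import os
import tempfile

import numpy as np


def quantize_on_disk(src_path, dst_path, scale_path, shape, chunk_rows=1024):
    """Quantize a float32 matrix stored on disk to int8, one row block at a time.

    Hypothetical helper: per-row absmax quantization stands in for a real method.
    """
    rows, _ = shape
    # Disk-backed arrays: nothing is loaded until a slice is accessed.
    src = np.memmap(src_path, dtype=np.float32, mode="r", shape=shape)
    dst = np.memmap(dst_path, dtype=np.int8, mode="w+", shape=shape)
    scales = np.memmap(scale_path, dtype=np.float32, mode="w+", shape=(rows,))

    for start in range(0, rows, chunk_rows):
        stop = min(start + chunk_rows, rows)
        # Copy just this block into RAM; the rest stays on disk.
        block = np.array(src[start:stop])
        absmax = np.abs(block).max(axis=1)
        # Avoid division by zero for all-zero rows.
        scale = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
        q = np.clip(np.rint(block / scale[:, None]), -127, 127).astype(np.int8)
        dst[start:stop] = q
        scales[start:stop] = scale

    # Make sure the results are written out to the files.
    dst.flush()
    scales.flush()
    return dst, scales


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    weights = np.random.default_rng(0).standard_normal((4096, 256)).astype(np.float32)
    src_file = os.path.join(tmp, "weights.bin")
    weights.tofile(src_file)
    q, s = quantize_on_disk(
        src_file,
        os.path.join(tmp, "weights.int8.bin"),
        os.path.join(tmp, "scales.bin"),
        weights.shape,
        chunk_rows=512,
    )
    print(q.shape, q.dtype, s.shape)
```

Peak memory here is bounded by `chunk_rows` rather than by the model size. The catch is that this only works directly for methods that can process rows or blocks independently. Anything that needs the whole tensor or calibration statistics at once would need a different chunking scheme. Note also that `memmap` still goes through the OS page cache, so explicit file reads may be a better fit if full manual control is the goal.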


suggestion, make quantization possible to offload to disk instead of ram #116
