Suggestion: make it possible to offload quantization to disk instead of RAM #116

@Nidvogr

Quantizing larger models is often impossible because of the high memory requirements, so I was thinking about building a mechanism that offloads memory to disk. Relying on the OS swap system brings absolutely everything to a halt, so the offloading would have to be managed manually.

Methods like HQQ are great, but only when the model fits in RAM, and the same goes for all the other methods. Since this is a general problem, I wonder whether it can be solved in a general way. Just an idea for discussion.
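To make the idea concrete, here is a rough sketch of what manually managed offloading could look like. It assumes the weights of one layer are stored on disk as a raw float32 file, and it uses a simple per-row absmax int8 quantizer as a stand-in (this is not HQQ). The function names, the file layout, and the `rows_per_chunk` parameter are all made up for illustration.

```python
import numpy as np

def quantize_rows_int8(block):
    """Symmetric per-row absmax quantization to int8 (a stand-in, not HQQ)."""
    scale = np.abs(block).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.rint(block / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def quantize_on_disk(src_path, dst_path, shape, rows_per_chunk=1024):
    """Quantize a float32 matrix stored in src_path, chunk by chunk.

    Both the input and the int8 output are memory-mapped files, so only
    one chunk of rows is held in RAM at a time. Hypothetical helper.
    """
    rows, _ = shape
    src = np.memmap(src_path, dtype=np.float32, mode="r", shape=shape)
    dst = np.memmap(dst_path, dtype=np.int8, mode="w+", shape=shape)
    scales = np.empty((rows, 1), dtype=np.float32)  # small, kept in RAM
    for start in range(0, rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, rows)
        block = np.array(src[start:stop])  # explicit copy of this chunk only
        q, s = quantize_rows_int8(block)
        dst[start:stop] = q
        scales[start:stop] = s
    dst.flush()  # make sure the quantized data reaches the file
    return scales
```

This only works so directly because the stand-in quantizer treats each row independently. A method that needs statistics over the whole matrix, or an iterative optimizer, would probably need several passes over the file, which is where a general solution gets harder.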
