Quantizing larger models is impractical on most machines because of the high memory requirements, so I've been thinking about building a method that offloads memory to disk. Relying on the OS swap system brings everything to a halt, so the offloading would have to be managed manually.
Methods like HQQ are great, but only when the model fits in RAM, and the same goes for all the other methods. Since this is a general problem, I wonder whether a general solution is possible, one that works regardless of the quantization method. Just an idea for discussion.
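
To make the idea a bit more concrete, here is a rough sketch of what manually managed offloading could look like. It assumes the weights are already stored as one raw float32 file per layer, and it loads, quantizes, and writes back a single layer at a time so that only that layer is resident in RAM. Everything here is hypothetical: the function names (`save_layer`, `load_layer`, `quantize_int8`, `quantize_from_disk`), the file layout, and the `.q8` suffix are made up for illustration, and the quantizer is a plain absmax int8 rounding used as a stand-in, not HQQ or any other real method.

```python
import os
import tempfile
from array import array

FLOAT_SIZE = array("f").itemsize  # bytes per stored float (4 on common platforms)


def save_layer(path, weights):
    """Write one layer's weights to its own file (assumed layout: raw float32)."""
    with open(path, "wb") as f:
        array("f", weights).tofile(f)


def load_layer(path):
    """Read a single layer back into RAM; nothing else is loaded."""
    count = os.path.getsize(path) // FLOAT_SIZE
    buf = array("f")
    with open(path, "rb") as f:
        buf.fromfile(f, count)
    return buf


def quantize_int8(weights):
    """Placeholder absmax quantizer (NOT HQQ): scale so the largest value maps to 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 or 1.0  # fall back to 1.0 for an all-zero layer
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def quantize_from_disk(layer_paths, out_dir):
    """Stream layers from disk one at a time, writing int8 results back to disk."""
    scales = {}
    for path in layer_paths:
        weights = load_layer(path)  # only this layer is held in memory
        quantized, scale = quantize_int8(weights)
        out_path = os.path.join(out_dir, os.path.basename(path) + ".q8")
        with open(out_path, "wb") as f:
            quantized.tofile(f)
        scales[out_path] = scale
        del weights, quantized  # drop references before loading the next layer
    return scales


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            p = os.path.join(tmp, f"layer{i}.f32")
            save_layer(p, [1.0 + i, -1.0, 0.0, 0.25])
            paths.append(p)
        for out_path, scale in quantize_from_disk(paths, tmp).items():
            print(os.path.basename(out_path), scale)
```

The general part would be `quantize_from_disk`: in principle any per-layer quantizer could be dropped in where `quantize_int8` is called, as long as it only needs one layer's weights at a time. Methods that need state across layers would need something more involved than this.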