LLM Backend Integration with EXL2 Quantization Format #4

Open
Egalitaristen opened this issue Mar 30, 2024 · 0 comments

Description:
We are integrating the EXL2 quantization format (the format used by the ExLlamaV2 inference library) into our LLM backend. EXL2 is known for fast inference at low bits per weight, which we aim to leverage to improve the system's overall performance. The project involves several key phases:

Comprehensive Understanding: Delve into EXL2 documentation and source material to fully grasp its parameters, capabilities, and the quantization process. This foundational knowledge will guide the integration strategy.
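
As a concrete anchor for this phase, the conversion workflow in the exllamav2 repository (which defines the EXL2 format) can be driven from a script roughly like the sketch below. The flags follow the exllamav2 README; all paths and the target bitrate are placeholder assumptions and should be checked against the version we adopt.

```python
# Sketch: quantizing an FP16 model to EXL2 via exllamav2's convert.py.
# Paths and the bitrate are placeholders; flags per the exllamav2 README.
import subprocess

subprocess.run([
    "python", "convert.py",
    "-i", "/models/llama2-7b-fp16",   # input: FP16 HF-format model directory
    "-o", "/tmp/exl2-work",           # working directory for intermediate state
    "-cf", "/models/llama2-7b-exl2",  # output directory for the quantized model
    "-b", "5.0",                      # target average bits per weight
], check=True)
```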

Example Code Evaluation: Study existing implementations of EXL2 to gather insights and best practices. This exploration will help identify common pitfalls and effective optimization techniques.
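
For orientation, a minimal generation loop based on the inference examples in the exllamav2 repository looks roughly like the following; the model path is a placeholder, and the exact API should be verified against the release we pin.

```python
# Minimal EXL2 inference sketch, adapted from the exllamav2 examples.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/llama2-7b-exl2"  # placeholder: an EXL2-quantized model
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # auto-split layers across available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, 128))
```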

Model Selection: Identify a high-performing model compatible with EXL2 that fits within the memory limitations of consumer GPUs. This model must also be available under an open-source license (e.g., MIT) to ensure its free usage within our project. The selection process will include research into various models, assessing their performance, compatibility with EXL2, and legal usage terms.
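
To keep that search grounded, weight memory can be estimated up front: EXL2 stores roughly params × bits-per-weight ÷ 8 bytes of weights. This ignores the KV cache, activations, and runtime overhead, so treat the numbers below as a lower bound rather than a guarantee.

```python
# Rough lower-bound VRAM estimate for EXL2 weights: params * bpw / 8 bytes.
# Excludes KV cache, activations, and framework overhead.
def weight_vram_gib(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for bpw in (2.5, 4.0, 5.0, 6.0):
    print(f"7B model @ {bpw} bpw ≈ {weight_vram_gib(7, bpw):.1f} GiB of weights")
# At 4.0 bpw a 7B model needs ≈ 3.3 GiB for weights, leaving headroom for the
# KV cache on a typical 8-12 GiB consumer GPU.
```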

Integration Plan Development: Based on the acquired knowledge and selected model, develop a detailed plan for integrating EXL2 into our backend. This plan will outline the technical steps, resources required, and a timeline for implementation.

Optimization and Testing: Implement the integration based on the developed plan, followed by rigorous testing to ensure that the backend not only supports EXL2 efficiently but also maintains or enhances model performance.
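
For the testing phase, a crude throughput check could wrap the generator from the inference sketch above; this is illustrative only, and any pass/fail thresholds are assumptions we would need to agree on.

```python
# Approximate decode throughput (tokens/sec) using the generator and settings
# from the inference sketch above.
import time

def benchmark(generator, settings, prompt: str, num_tokens: int = 256) -> float:
    start = time.perf_counter()
    generator.generate_simple(prompt, settings, num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed  # rough: assumes all num_tokens were generated

tps = benchmark(generator, settings, "Explain quantization in one paragraph.")
print(f"throughput ≈ {tps:.1f} tokens/sec")
```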

Contributions are welcome in all phases, especially in identifying suitable models, sharing knowledge on EXL2, and discussing potential challenges and solutions. This project is an opportunity to significantly enhance our backend's capabilities and ensure our technology remains at the cutting edge of LLM inference efficiency.

Egalitaristen converted this from a draft issue on Mar 30, 2024