Home
Welcome to TabbyAPI!
This wiki aims to provide a place for documenting various aspects of this project.
Not sure where to start? Check out the Getting Started page.
- What OS is supported?
  - Windows and Linux
- What samplers are enabled?
  - Read 3. Sampling
- How do I interface with the API?
  - The wiki is meant for user-facing documentation. OpenAPI documentation can be accessed by navigating to `/docs` on a local TabbyAPI install (ex. `http://localhost:5000/docs`).
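As a minimal sketch of what a request looks like from Python, assuming the default port and an OpenAI-style `/v1/completions` route (verify the exact routes and fields against the `/docs` page on your own install; the payload fields here are illustrative):

```python
import json

# Assumes a local TabbyAPI instance at its default address; adjust as needed.
BASE_URL = "http://localhost:5000"

# Interactive OpenAPI docs, as described above:
docs_url = f"{BASE_URL}/docs"

# A hypothetical OpenAI-style completion request body; check /docs on your
# install for the exact routes and fields your version exposes.
payload = json.dumps({
    "prompt": "Hello, Tabby!",
    "max_tokens": 32,
})

# To actually send it (requires a running server, plus an API key header
# if your install has authentication configured):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/v1/completions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Any HTTP client works the same way; the commented-out `urllib` call is left inert so the snippet runs without a live server.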
- What does TabbyAPI run?
  - TabbyAPI uses Exllamav2 as a powerful and fast backend for model inference, loading, etc. Therefore, the following types of models are supported:
    - Exl2 (Highly recommended)
    - GPTQ
    - FP16 (using Exllamav2's loader)
- Exllamav2 may error with the following exception:
  `ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.`
  - First, make sure the wheel matches your Python version and CUDA version. Also make sure you're running inside a venv or conda environment.
  - If those prerequisites are correct, the torch extension cache may need to be cleared due to a mismatched `exllamav2_ext` build.
  - On Windows: find the cache at `C:\Users\<User>\AppData\Local\torch_extensions\torch_extensions\Cache`, where `<User>` is your Windows username.
  - On Linux: find the cache at `~/.cache/torch_extensions`.
  - Look for any folders named `exllamav2_ext` in the python subdirectories and delete them.
  - Restart TabbyAPI and launching should work again.
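The cache-clearing step above can be sketched as a small helper. This is an illustrative sketch, not part of TabbyAPI itself; the function name is made up, and you point it at the cache directory for your OS from the steps above:

```python
import shutil
from pathlib import Path

def clear_exllamav2_ext_cache(cache_root: Path) -> list[Path]:
    """Delete every cached exllamav2_ext build folder under cache_root.

    cache_root is the torch extensions cache described above, e.g.
    ~/.cache/torch_extensions on Linux.
    """
    removed = []
    # Builds live under per-interpreter subdirectories (hence the recursive
    # search); materialize the generator before deleting anything.
    for folder in sorted(cache_root.rglob("exllamav2_ext")):
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder)
    return removed
```

For example, `clear_exllamav2_ext_cache(Path.home() / ".cache" / "torch_extensions")` on Linux; pass the Windows path above instead on Windows.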