Brian Dashore edited this page Dec 30, 2023 · 9 revisions

Welcome to TabbyAPI!

This wiki aims to provide a place for documenting various aspects of this project.

Not sure where to start? Check out the Getting Started page.

FAQ + Common Issues

  • What OSes are supported?

    • Windows and Linux
  • What samplers are enabled?

  • How do I interface with the API?

    • This wiki is meant for user-facing documentation. OpenAPI documentation can be accessed by navigating to /docs on a local TabbyAPI install (e.g. http://localhost:5000/docs)
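As a quick illustration of talking to the API from code, here is a minimal sketch that builds a request against an OpenAI-style completions endpoint. The endpoint path (`/v1/completions`) and payload fields shown are assumptions based on common OpenAI-compatible servers; check /docs on your own install for the exact schema.

```python
import json
import urllib.request

# Assumed base URL of a local TabbyAPI install (see the /docs note above).
BASE_URL = "http://localhost:5000"

def build_completion_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for an assumed OpenAI-style completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires a running TabbyAPI instance):
# with urllib.request.urlopen(build_completion_request("Hello")) as resp:
#     print(json.load(resp))
```

The request is left unsent here so the snippet works without a server; the /docs page on your install is the authoritative reference for paths, parameters, and authentication.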
  • What does TabbyAPI run?

    • TabbyAPI uses Exllamav2 as a powerful and fast backend for model inference, loading, etc. The following model types are therefore supported:

      • Exl2 (Highly recommended)

      • GPTQ

      • FP16 (using Exllamav2's loader)

  • Exllamav2 may fail with the following exception: ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.

    • First, check that the wheel matches your Python version and CUDA version. Also make sure you're in a venv or conda environment.

    • If those prerequisites are correct, the torch extension cache may need to be cleared due to a mismatched exllamav2_ext build.

      • On Windows: Find the cache at C:\Users\<User>\AppData\Local\torch_extensions\torch_extensions\Cache, where <User> is your Windows username

      • On Linux: Find the cache at ~/.cache/torch_extensions

      • Look for any folders named exllamav2_ext in the python subdirectories and delete them.

      • Restart TabbyAPI; launching should work again.
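The deletion steps above can be scripted. The following is a minimal sketch: the function name is illustrative, and the cache root you pass in should be the platform-specific path listed above (e.g. ~/.cache/torch_extensions on Linux).

```python
import shutil
from pathlib import Path

def clear_exllamav2_ext(cache_root: Path) -> list[Path]:
    """Delete every exllamav2_ext folder under the torch extension cache.

    cache_root: the torch_extensions cache path for your platform
    (see the Windows/Linux locations above). Returns the paths removed.
    """
    removed = []
    # Materialize the matches first so deletion doesn't disturb the scan.
    for ext_dir in list(cache_root.glob("**/exllamav2_ext")):
        if ext_dir.is_dir():
            shutil.rmtree(ext_dir)
            removed.append(ext_dir)
    return removed
```

After the folders are deleted, restart TabbyAPI so Exllamav2 rebuilds the extension on the next launch.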
