Use Nvidia optimum to speed up inference #111
Hello! For Nvidia-based workstations, would it be possible to use Nvidia Optimum pipelines instead of the default Hugging Face ones to speed up Whisper token generation? I have not tested it myself. Here is the referenced article reporting gains on LLaMA-based models: https://huggingface.co/blog/optimum-nvidia, along with the repository: https://github.com/huggingface/optimum-nvidia
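For reference, a minimal sketch of what such a swap might look like. The baseline uses the standard transformers pipeline for Whisper; the commented-out part assumes optimum-nvidia exposes a drop-in pipeline factory under optimum.nvidia.pipelines (as its README shows for text-generation). Whether that factory accepts the automatic-speech-recognition task and Whisper models is an assumption I have not verified:

```python
# Baseline: stock Hugging Face transformers pipeline for Whisper.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    device="cuda:0",
)
print(asr("sample.wav")["text"])

# Hypothetical swap: optimum-nvidia advertises a drop-in pipeline factory
# (shown for text-generation in its README). Support for the
# "automatic-speech-recognition" task / Whisper models is NOT verified here.
# from optimum.nvidia.pipelines import pipeline as trt_pipeline
# asr = trt_pipeline(
#     "automatic-speech-recognition",
#     model="openai/whisper-large-v3",
# )
# print(asr("sample.wav")["text"])
```

If the drop-in route is not supported for Whisper, the gains reported in the blog post may not transfer directly, so benchmarking on an actual workstation would be needed.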
Comments

Let's keep the conversation in the PR (when |