Motivation
Speculative decoding techniques such as Eagle are most effective in low-throughput scenarios. Would it make sense to shrink the draft tree, or to disable speculative decoding entirely, based on the current throughput?
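To make the idea concrete, here is a minimal sketch of what such a policy could look like. Everything in it is hypothetical: `DraftConfig`, `choose_draft_config`, the use of the number of running requests as the load signal, and the default thresholds are assumptions for illustration, not an existing API or tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftConfig:
    """Hypothetical per-step speculative decoding settings."""
    enabled: bool
    num_draft_tokens: int  # draft tree size; 0 when speculation is off


def choose_draft_config(
    running_requests: int,
    max_draft_tokens: int = 64,   # assumed draft tree size at zero load
    disable_above: int = 32,      # assumed load at which speculation is turned off
) -> DraftConfig:
    """Shrink the draft tree as load grows and disable it past a cutoff."""
    if running_requests < 0:
        raise ValueError("running_requests must be non-negative")
    if running_requests >= disable_above:
        return DraftConfig(enabled=False, num_draft_tokens=0)
    # Scale the draft budget linearly from max_draft_tokens down towards 1
    # as the number of running requests approaches the cutoff.
    remaining = 1.0 - running_requests / disable_above
    budget = max(1, int(max_draft_tokens * remaining))
    return DraftConfig(enabled=True, num_draft_tokens=budget)
```

A scheduler could call something like `choose_draft_config(len(running_batch))` before each decode step. The linear ramp is only one option; a step function or a policy driven by measured acceptance rate might work just as well.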
What do you think?
Related resources
No response