sampling: add Top-nσ sampler #11223

Open
wants to merge 15 commits into
base: master


VJHack
Contributor

@VJHack VJHack commented Jan 14, 2025

Top-nσ: Not All Logits Are You Need

https://arxiv.org/pdf/2411.07641
The authors of this paper propose a new sampling method known as Top-nσ. The main feature of this sampler is that "unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling". They discovered that logits naturally separate into a Gaussian-distributed noisy region and an informative region.

This PR implements the sampling method proposed in the paper. Here is the algorithm from the paper:
[Image: screenshot of the algorithm from the paper]
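
For readers without the screenshot, here is a minimal C++ sketch of the filtering step as I read it from the paper: scale the logits by the temperature, compute their maximum and standard deviation σ, and mask every logit below `max - n * σ` to negative infinity before the softmax. This is not the code from this PR. The names `top_n_sigma_filter` and `softmax` are hypothetical, and the use of the population standard deviation (dividing by N) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not the PR's implementation): keeps logits within
// n standard deviations of the maximum and masks the rest to -infinity.
std::vector<float> top_n_sigma_filter(std::vector<float> logits, float n, float temp = 1.0f) {
    assert(!logits.empty() && temp > 0.0f);

    // Temperature scaling comes first, matching the chain temp -> top-n-sigma.
    for (float & l : logits) {
        l /= temp;
    }

    const float max_l = *std::max_element(logits.begin(), logits.end());

    // Population standard deviation of the scaled logits (assumption: divide by N).
    double mean = 0.0;
    for (float l : logits) {
        mean += l;
    }
    mean /= static_cast<double>(logits.size());

    double var = 0.0;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = static_cast<float>(std::sqrt(var / static_cast<double>(logits.size())));

    // Everything below max - n * sigma is treated as noise and masked out.
    const float threshold = max_l - n * sigma;
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
    return logits;
}

// Numerically stable softmax; masked (-infinity) entries get probability 0.
std::vector<float> softmax(const std::vector<float> & logits) {
    const float max_l = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_l);
        sum += probs[i];
    }
    for (float & p : probs) {
        p /= sum;
    }
    return probs;
}
```

Dividing by the temperature scales both the maximum and σ by the same factor, so the set of surviving tokens does not change with temperature. For example, with logits `{10, 9, 1, 0, -1}` and `n = 1`, only the first two tokens survive at both `temp = 1` and `temp = 2`.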

Since the manipulation is done directly on the logits pre-softmax, I added it as a stand-alone sampler instead of chaining it with the common samplers. The changes only add support for llama-cli.
sampler chain: logits -> logit-bias -> temp -> top-n-sigma -> dist

I'm aware that this algorithm is still in its early phases, so we could tag this as a demo for now, but I'll leave that choice up to the maintainers.

resolves #11057

Relevant Links:
https://huggingface.co/papers/2411.07641
https://arxiv.org/pdf/2411.07641
https://github.com/Tomorrowdawn/top_nsigma
#11057

@github-actions github-actions bot added the testing Everything test related label Jan 14, 2025
@VJHack VJHack marked this pull request as ready for review January 14, 2025 01:12
@MaggotHATE
Contributor

Thank you for this implementation! Top-nσ is definitely special and needs a lot of testing.

I like the results so far, especially since high temperature is not a problem, as shown in the paper, and I'm going to test it more and see what its limitations are.

common/sampling.cpp (outdated, resolved)
@VJHack VJHack requested a review from slaren January 21, 2025 04:01
Labels
examples testing Everything test related
Development

Successfully merging this pull request may close these issues.

Feature Request: Top-nσ sampler
4 participants