What's the purpose of zero_token and the difference with initial_token? #165

Open
npuichigo opened this issue Dec 5, 2024 · 2 comments
Labels: question (Further information is requested)

@npuichigo
Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

Other / All

Question

The one place that uses zero_token overrides it right afterwards, so could anyone help explain when it is actually used? Should the delayed audio tokens be filled with initial_token or zero_token? And does that apply to the input or the output? Thanks.

def _get_initial_token(self) -> torch.Tensor:
    # Returns the initial token that will be fed to the model to predict the very first timestep.
    # The output shape will be [B, K, 1].
    device = next(iter(self.parameters())).device
    zero = torch.full(
        [1, 1, 1], self.zero_token_id, device=device, dtype=torch.long
    )
    special = torch.full_like(zero, self.initial_token_id)
    text_special = torch.full_like(zero, self.text_initial_token_id)
    audio_token = special
    text_token = text_special
    audio_token = audio_token.expand(-1, self.num_audio_codebooks, -1)
    token = torch.cat([text_token, audio_token], dim=1)
    return token
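
For reference, zero_token_id never actually appears in the returned tensor, since zero is only used as a template for full_like. A standalone sketch of what the function builds, with hypothetical ids (32000 for text, 2048 for audio, 8 audio codebooks):

import torch

# Hypothetical values, just for illustration.
text_initial_token_id = 32000
initial_token_id = 2048
num_audio_codebooks = 8

text_token = torch.full([1, 1, 1], text_initial_token_id, dtype=torch.long)
audio_token = torch.full([1, 1, 1], initial_token_id, dtype=torch.long)
audio_token = audio_token.expand(-1, num_audio_codebooks, -1)
token = torch.cat([text_token, audio_token], dim=1)
print(token.shape)     # torch.Size([1, 9, 1]) -> [B, 1 text + K audio, 1]
print(token[:, :, 0])  # first entry is the text token, the rest are audio tokens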

npuichigo added the question (Further information is requested) label on Dec 5, 2024
@LaurentMazare
Member

The delayed audio should be filled with the initial tokens (in the released version of moshi: 32000 for text, 2048 for audio). zero is actually only used as the template for some full_like calls in this _get_initial_token function - most likely this is because we extracted this code from a larger codebase and didn't take the time to clean this bit up.
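
A minimal sketch (not the actual moshi implementation) of what "filling the delayed audio with the initial tokens" means: each codebook is shifted right by its delay and the resulting gap is filled with the initial audio token. The per-codebook delays below are made up for illustration; 2048 is the audio initial token from the released checkpoints.

import torch

AUDIO_INITIAL_TOKEN = 2048  # value from the released moshi checkpoints

def fill_delayed_tokens(audio_codes: torch.Tensor, delays: list[int]) -> torch.Tensor:
    """Shift each codebook right by its delay and fill the gap with the
    initial audio token. audio_codes has shape [B, K, T]."""
    B, K, T = audio_codes.shape
    assert len(delays) == K
    out = torch.full_like(audio_codes, AUDIO_INITIAL_TOKEN)
    for k, d in enumerate(delays):
        if d < T:
            out[:, k, d:] = audio_codes[:, k, : T - d]
    return out

# Example: 2 codebooks, 5 timesteps, hypothetical delays of 0 and 2.
codes = torch.arange(10).reshape(1, 2, 5)
print(fill_delayed_tokens(codes, delays=[0, 2]))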

@npuichigo
Author

Thanks for the reply. I just thought zero_token worked together with zero_idx in ScaledEmbedding to let Helium mask out zero codes, so it could still generate text after the pre-training stage.
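
To make the behaviour I had in mind concrete, here is a rough sketch (not the actual ScaledEmbedding implementation): any position holding the sentinel zero_idx is mapped to an all-zero embedding instead of a learned vector, so those positions contribute nothing to the model input.

import torch
from torch import nn

class ZeroableEmbedding(nn.Module):
    """Embedding where indices equal to zero_idx produce an all-zero vector."""

    def __init__(self, num_embeddings: int, dim: int, zero_idx: int = -1):
        super().__init__()
        self.embed = nn.Embedding(num_embeddings, dim)
        self.zero_idx = zero_idx

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        is_zero = indices == self.zero_idx
        # Clamp sentinel indices to a valid id before the lookup, then
        # zero out the corresponding embeddings.
        safe = indices.clamp_min(0)
        out = self.embed(safe)
        return out.masked_fill(is_zero.unsqueeze(-1), 0.0)

# Example: keep the audio initial token, mask the remaining slots.
emb = ZeroableEmbedding(2049, 8)
tokens = torch.tensor([[2048, -1, -1]])
print(emb(tokens).norm(dim=-1))  # masked slots have zero norm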
