What's the purpose of zero_token and the difference with initial_token? #165

Open
npuichigo opened this issue Dec 5, 2024 · 2 comments
Labels: question (Further information is requested)

@npuichigo
Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

Other / All

Question

The one place that uses zero_token overrides it right afterwards, so could anyone help explain when it is actually used? Should the delayed audio tokens be filled with initial_token or zero_token? And does that apply to the input or the output? Thanks.

def _get_initial_token(self) -> torch.Tensor:
    # Returns the initial token that will be fed to the model to predict the very first timestep.
    # The output shape will be [B, K, 1].
    device = next(iter(self.parameters())).device
    zero = torch.full(
        [1, 1, 1], self.zero_token_id, device=device, dtype=torch.long
    )
    special = torch.full_like(zero, self.initial_token_id)
    text_special = torch.full_like(zero, self.text_initial_token_id)
    audio_token = special
    text_token = text_special
    audio_token = audio_token.expand(-1, self.num_audio_codebooks, -1)
    token = torch.cat([text_token, audio_token], dim=1)
    return token
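
For reference, zero_token_id never actually appears in the returned tensor, since zero is only used as a template for full_like. A standalone sketch of what the function builds, with hypothetical ids (32000 for text, 2048 for audio, 8 audio codebooks):

import torch

# Hypothetical values, just for illustration.
text_initial_token_id = 32000
initial_token_id = 2048
num_audio_codebooks = 8

text_token = torch.full([1, 1, 1], text_initial_token_id, dtype=torch.long)
audio_token = torch.full([1, 1, 1], initial_token_id, dtype=torch.long)
audio_token = audio_token.expand(-1, num_audio_codebooks, -1)
token = torch.cat([text_token, audio_token], dim=1)
print(token.shape)     # torch.Size([1, 9, 1]) -> [B, 1 text + K audio, 1]
print(token[:, :, 0])  # first entry is the text token, the rest are audio tokens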

npuichigo added the question (Further information is requested) label on Dec 5, 2024
@LaurentMazare
Member

The delayed audio should be filled with the initial tokens (in the released version of moshi: 32000 for text, 2048 for audio). zero is actually only used as the template for some full_like calls in this _get_initial_token function - most likely this is because we extracted this code from a larger codebase and didn't take the time to clean this bit up.
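
A minimal sketch (not the actual moshi implementation) of what "filling the delayed audio with the initial tokens" means: each codebook is shifted right by its delay and the resulting gap is filled with the initial audio token. The per-codebook delays below are made up for illustration; 2048 is the audio initial token from the released checkpoints.

import torch

AUDIO_INITIAL_TOKEN = 2048  # value from the released moshi checkpoints

def fill_delayed_tokens(audio_codes: torch.Tensor, delays: list[int]) -> torch.Tensor:
    """Shift each codebook right by its delay and fill the gap with the
    initial audio token. audio_codes has shape [B, K, T]."""
    B, K, T = audio_codes.shape
    assert len(delays) == K
    out = torch.full_like(audio_codes, AUDIO_INITIAL_TOKEN)
    for k, d in enumerate(delays):
        if d < T:
            out[:, k, d:] = audio_codes[:, k, : T - d]
    return out

# Example: 2 codebooks, 5 timesteps, hypothetical delays of 0 and 2.
codes = torch.arange(10).reshape(1, 2, 5)
print(fill_delayed_tokens(codes, delays=[0, 2]))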

@npuichigo
Author

Thanks for the reply. I just thought zero_token worked together with zero_idx in ScaledEmbedding to let Helium mask out zero codes, so it could still generate text after the pre-training stage.
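
To make the behaviour I had in mind concrete, here is a rough sketch (not the actual ScaledEmbedding implementation): any position holding the sentinel zero_idx is mapped to an all-zero embedding instead of a learned vector, so those positions contribute nothing to the model input.

import torch
from torch import nn

class ZeroableEmbedding(nn.Module):
    """Embedding where indices equal to zero_idx produce an all-zero vector."""

    def __init__(self, num_embeddings: int, dim: int, zero_idx: int = -1):
        super().__init__()
        self.embed = nn.Embedding(num_embeddings, dim)
        self.zero_idx = zero_idx

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        is_zero = indices == self.zero_idx
        # Clamp sentinel indices to a valid id before the lookup, then
        # zero out the corresponding embeddings.
        safe = indices.clamp_min(0)
        out = self.embed(safe)
        return out.masked_fill(is_zero.unsqueeze(-1), 0.0)

# Example: keep the audio initial token, mask the remaining slots.
emb = ZeroableEmbedding(2049, 8)
tokens = torch.tensor([[2048, -1, -1]])
print(emb(tokens).norm(dim=-1))  # masked slots have zero norm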
