Commit 236c791

Merge branch 'hn-fix-tokenizer' into 'main'

Fix TikTokenizer decoding case

See merge request ADLR/megatron-lm!1827

2 parents 054c196 + 4ec593d commit 236c791

1 file changed: +2 -0 lines changed

megatron/inference/text_generation/tokenization.py

@@ -35,6 +35,8 @@ def detokenize_generations(tokens_gpu_tensor,
                                            'HuggingFaceTokenizer',
                                            'Llama2Tokenizer']:
                     word = tokenizer.decoder[token]
+                elif args.tokenizer_type == 'TikTokenizer':
+                    word = tokenizer.detokenize([token])
                 elif args.tokenizer_type in ['Llama3Tokenizer', 'MistralTokenizer']:
                     word = tokenizer.decode([token])
                 elif args.tokenizer_type == 'NullTokenizer':
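
The hunk adds a dedicated branch so that a single token id is turned into text via `tokenizer.detokenize([token])` when the tokenizer type is `'TikTokenizer'`. A minimal, self-contained sketch of that per-token dispatch follows. The three stub classes and the helper `token_to_word` are hypothetical stand-ins invented for illustration; they are not Megatron-LM's tokenizer classes, and only the branch conditions and method names visible in the diff are taken from the source. The `'NullTokenizer'` branch is cut off in the hunk, so it is omitted here.

```python
# Hypothetical stubs: each exposes only the one decoding entry point that the
# corresponding branch in the diff relies on.
class VocabTableTokenizer:
    """Exposes a `decoder` mapping from token id to text piece."""

    def __init__(self, pieces):
        self.decoder = dict(enumerate(pieces))


class DetokenizeTokenizer:
    """Exposes `detokenize(ids)`, as the new TikTokenizer branch assumes."""

    def __init__(self, pieces):
        self._pieces = list(pieces)

    def detokenize(self, ids):
        return "".join(self._pieces[i] for i in ids)


class DecodeTokenizer:
    """Exposes `decode(ids)`, as the Llama3/Mistral branch assumes."""

    def __init__(self, pieces):
        self._pieces = list(pieces)

    def decode(self, ids):
        return "".join(self._pieces[i] for i in ids)


def token_to_word(tokenizer, tokenizer_type, token):
    """Map one token id to text, mirroring the branches visible in the hunk."""
    # Only the two type names visible in the hunk are listed; the real list
    # starts above the hunk and may contain more entries.
    if tokenizer_type in ['HuggingFaceTokenizer', 'Llama2Tokenizer']:
        return tokenizer.decoder[token]
    elif tokenizer_type == 'TikTokenizer':
        # The branch added by this commit: decode a one-element id list.
        return tokenizer.detokenize([token])
    elif tokenizer_type in ['Llama3Tokenizer', 'MistralTokenizer']:
        return tokenizer.decode([token])
    # Illustration-only fallback; the real function's remaining branches
    # are not shown in the diff.
    raise ValueError(f"unhandled tokenizer type: {tokenizer_type}")


pieces = ["Hello", ",", " world"]
word = token_to_word(DetokenizeTokenizer(pieces), 'TikTokenizer', 2)
```

In this sketch the explicit `'TikTokenizer'` check is what routes the call to `detokenize`. Without it, the stub would reach the fallback, which is one plausible reading of why the original code needed the extra case.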
