
Adding special tokens in text2text generation task #165

Open
techthiyanes opened this issue Dec 17, 2022 · 4 comments
@techthiyanes
Hi Team,

Could anyone please enable displaying the special tokens in the output of seq2seq models?
Currently, seq2seq models served through the Inference API return generated text without special tokens. The special tokens are added as part of the tokenizer. How can we pass tokenizer arguments such as add_special_tokens=True during Inference API calls? These parameters are not currently accepted when generating text on the decoder side.
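For context, here is a minimal toy sketch of the two knobs being discussed. The vocabulary and functions below are invented for illustration and are not the transformers implementation; real tokenizers behave analogously with `add_special_tokens` at encode time and `skip_special_tokens` at decode time:

```python
# Toy tokenizer sketch (illustrative only, not the transformers library).
VOCAB = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
INV_VOCAB = {i: t for t, i in VOCAB.items()}
SPECIAL_IDS = {0, 1}  # <s> and </s> are registered as special tokens

def encode(text, add_special_tokens=True):
    ids = [VOCAB[w] for w in text.split()]
    if add_special_tokens:
        # Wrap the sequence in the special BOS/EOS tokens.
        ids = [VOCAB["<s>"]] + ids + [VOCAB["</s>"]]
    return ids

def decode(ids, skip_special_tokens=True):
    if skip_special_tokens:
        # This filtering step is what hides special tokens in the output.
        ids = [i for i in ids if i not in SPECIAL_IDS]
    return " ".join(INV_VOCAB[i] for i in ids)

ids = encode("hello world")
print(decode(ids))                             # hello world
print(decode(ids, skip_special_tokens=False))  # <s> hello world </s>
```

The Inference API effectively always takes the first decoding path, which is what this issue is asking to make configurable.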

@Narsil
Contributor

Narsil commented Dec 19, 2022

Hi, why do you want that option?

Sorry, but we try to limit the number of parameters available (for simplicity).

This is also not available in the transformers pipeline (which this API is derived from).

Could you maybe start an issue in transformers for that support, documenting as much as possible why, and in what context, you need this option?
If we enable it in transformers, it will instantly become available in the API (albeit not necessarily documented).

Cheers.

@techthiyanes
Author

Hi, thanks for your response. Some seq2seq models have special tokens defined as part of config.json. When tokenizing an input phrase, we have the option to pass add_special_tokens=True to the tokenizer, and those special tokens are then displayed in the beam/greedy search output as long as we enable it. However, we don't have an option to enable this parameter in Inference API calls; the only parameters I can pass are the ones related to the generate method, such as do_sample, num_beams, and so on. Let me know if you need any further details.
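To illustrate the limitation being described, here is a sketch of an Inference API request body (the endpoint URL and parameter names follow the hosted Inference API docs as I understand them; nothing is sent over the network here, and this should be treated as an assumption-laden sketch, not a definitive reference):

```python
import json

# Illustrative endpoint; not called in this sketch.
API_URL = "https://api-inference.huggingface.co/models/Babelscape/rebel-large"

payload = {
    "inputs": "Punta Cana is a resort town in the Dominican Republic.",
    "parameters": {
        # Accepted: options forwarded to the model's generate() method.
        "num_beams": 3,
        "do_sample": False,
        # NOT accepted (the point of this issue): tokenizer/decoding
        # options such as add_special_tokens or skip_special_tokens.
    },
}
body = json.dumps(payload)
print(body)
```

Only generation parameters pass through; there is no field that reaches the tokenizer's encode/decode arguments.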

@Narsil
Contributor

Narsil commented Dec 21, 2022

Special tokens are meant to be non-readable; if you want readable tokens, couldn't you use regular added tokens instead?

(tokenizer.add_tokens vs tokenizer.add_special_tokens, IIRC)

Special tokens are special mostly because they are not shown. Tokens like [CLS] and [EOS] are generally not very interesting to read and do not correspond to what a model is actually saying, and that's why they are not displayed, right?
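The distinction can be sketched with a toy class (class and behavior invented for illustration; `tokenizer.add_tokens` and `tokenizer.add_special_tokens` in transformers behave analogously): tokens registered as *special* are dropped when decoding with skip_special_tokens=True, while regular added tokens always survive decoding:

```python
class ToyTokenizer:
    """Illustrative toy, not the transformers tokenizer."""

    def __init__(self):
        self.vocab = {}
        self.special_ids = set()

    def add_tokens(self, tokens):
        # Regular added tokens: always shown in decoded output.
        for t in tokens:
            self.vocab.setdefault(t, len(self.vocab))

    def add_special_tokens(self, tokens):
        # Special tokens: hidden when skip_special_tokens=True.
        for t in tokens:
            self.vocab.setdefault(t, len(self.vocab))
            self.special_ids.add(self.vocab[t])

    def decode(self, ids, skip_special_tokens=True):
        inv = {i: t for t, i in self.vocab.items()}
        if skip_special_tokens:
            ids = [i for i in ids if i not in self.special_ids]
        return " ".join(inv[i] for i in ids)

tok = ToyTokenizer()
tok.add_special_tokens(["[CLS]", "[EOS]"])   # ids 0, 1
tok.add_tokens(["<entity>", "hello"])        # ids 2, 3
ids = [0, 3, 2, 1]  # [CLS] hello <entity> [EOS]
print(tok.decode(ids))                             # hello <entity>
print(tok.decode(ids, skip_special_tokens=False))  # [CLS] hello <entity> [EOS]
```

A marker added with add_tokens would therefore show up in the API output today, without any new parameter.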

@techthiyanes
Author

Thanks for your response.

Some models require special tokens to be displayed, because those special tokens help us do meaningful post-processing. For example, seq2seq entity-extraction models have special tokens added to them, and based on those special tokens the user can extract the entity results.
Example model: https://huggingface.co/Babelscape/rebel-large
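To make the use case concrete: the linked model card describes post-processing built around marker tokens such as `<triplet>`, `<subj>`, and `<obj>` (linearized roughly as `<triplet> head <subj> tail <obj> relation`). The simplified sketch below (my own parser, not the model card's code) only works if those tokens survive decoding:

```python
import re

def extract_triplets(text):
    # Simplified sketch of the kind of post-processing described on the
    # rebel-large model card; the generated string is only parseable if
    # the marker tokens <triplet>, <subj>, <obj> are kept in the output.
    triplets = []
    for chunk in text.split("<triplet>"):
        chunk = chunk.strip()
        if not chunk:
            continue
        m = re.match(r"(.*?)<subj>(.*?)<obj>(.*)", chunk)
        if m:
            head, tail, relation = (s.strip() for s in m.groups())
            triplets.append({"head": head, "type": relation, "tail": tail})
    return triplets

generated = "<triplet> Punta Cana <subj> Dominican Republic <obj> country"
print(extract_triplets(generated))
# [{'head': 'Punta Cana', 'type': 'country', 'tail': 'Dominican Republic'}]
```

If the API strips the special tokens before returning the text, this parsing step has nothing to anchor on.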

Thanks
