
Adding special tokens in text2text generation task #165

Open
techthiyanes opened this issue Dec 17, 2022 · 4 comments
@techthiyanes
Hi Team,

Could anyone please enable displaying the special tokens in the output of seq2seq models?
Currently, seq2seq models served through the Inference API return generated text without special tokens. The special tokens are added as part of the tokenizer. How can we pass tokenizer arguments such as add_special_tokens=True during Inference API calls? These parameters are not currently accepted when generating text on the decoder side.
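For context, here is a minimal toy sketch of the two knobs being discussed. The vocabulary and functions below are invented for illustration and are not the transformers implementation; real tokenizers behave analogously with `add_special_tokens` at encode time and `skip_special_tokens` at decode time:

```python
# Toy tokenizer sketch (illustrative only, not the transformers library).
VOCAB = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
INV_VOCAB = {i: t for t, i in VOCAB.items()}
SPECIAL_IDS = {0, 1}  # <s> and </s> are registered as special tokens

def encode(text, add_special_tokens=True):
    ids = [VOCAB[w] for w in text.split()]
    if add_special_tokens:
        # Wrap the sequence in the special BOS/EOS tokens.
        ids = [VOCAB["<s>"]] + ids + [VOCAB["</s>"]]
    return ids

def decode(ids, skip_special_tokens=True):
    if skip_special_tokens:
        # This filtering step is what hides special tokens in the output.
        ids = [i for i in ids if i not in SPECIAL_IDS]
    return " ".join(INV_VOCAB[i] for i in ids)

ids = encode("hello world")
print(decode(ids))                             # hello world
print(decode(ids, skip_special_tokens=False))  # <s> hello world </s>
```

The Inference API effectively always takes the first decoding path, which is what this issue is asking to make configurable.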

@Narsil
Contributor

Narsil commented Dec 19, 2022

Hi, why do you want that option?

Sorry, but we try to limit the number of parameters available (for simplicity).

This is also not available in the transformers pipeline (which this API is derived from).

Could you maybe start an issue in transformers for that support, documenting as much as possible why, and in what context, you need this option?
If we enable it in transformers, it will instantly become available in the API (albeit not necessarily documented).

Cheers.

@techthiyanes
Author

Hi, thanks for your response. Some seq2seq models have special tokens defined as part of config.json. When tokenizing an input phrase, we have the option to pass add_special_tokens=True to the tokenizer, and those special tokens are then displayed in the beam/greedy search output as long as we enable it. However, we don't have an option to enable this parameter in Inference API calls; the only parameters I can pass are the ones related to the generate method, such as do_sample, num_beams, and so on. Let me know if you need any further details.
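To illustrate the limitation being described, here is a sketch of an Inference API request body (the endpoint URL and parameter names follow the hosted Inference API docs as I understand them; nothing is sent over the network here, and this should be treated as an assumption-laden sketch, not a definitive reference):

```python
import json

# Illustrative endpoint; not called in this sketch.
API_URL = "https://api-inference.huggingface.co/models/Babelscape/rebel-large"

payload = {
    "inputs": "Punta Cana is a resort town in the Dominican Republic.",
    "parameters": {
        # Accepted: options forwarded to the model's generate() method.
        "num_beams": 3,
        "do_sample": False,
        # NOT accepted (the point of this issue): tokenizer/decoding
        # options such as add_special_tokens or skip_special_tokens.
    },
}
body = json.dumps(payload)
print(body)
```

Only generation parameters pass through; there is no field that reaches the tokenizer's encode/decode arguments.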

@Narsil
Contributor

Narsil commented Dec 21, 2022

Special tokens are meant to be non-readable; if you want readable tokens, couldn't you use regular added tokens instead?

(tokenizer.add_tokens vs tokenizer.add_special_tokens, IIRC)

Special tokens are special mostly because they are not shown. Tokens like [CLS] and [EOS] are generally not very interesting to read and do not correspond to what a model is actually saying, and that's why they are not displayed, right?
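The distinction can be sketched with a toy class (class and behavior invented for illustration; `tokenizer.add_tokens` and `tokenizer.add_special_tokens` in transformers behave analogously): tokens registered as *special* are dropped when decoding with skip_special_tokens=True, while regular added tokens always survive decoding:

```python
class ToyTokenizer:
    """Illustrative toy, not the transformers tokenizer."""

    def __init__(self):
        self.vocab = {}
        self.special_ids = set()

    def add_tokens(self, tokens):
        # Regular added tokens: always shown in decoded output.
        for t in tokens:
            self.vocab.setdefault(t, len(self.vocab))

    def add_special_tokens(self, tokens):
        # Special tokens: hidden when skip_special_tokens=True.
        for t in tokens:
            self.vocab.setdefault(t, len(self.vocab))
            self.special_ids.add(self.vocab[t])

    def decode(self, ids, skip_special_tokens=True):
        inv = {i: t for t, i in self.vocab.items()}
        if skip_special_tokens:
            ids = [i for i in ids if i not in self.special_ids]
        return " ".join(inv[i] for i in ids)

tok = ToyTokenizer()
tok.add_special_tokens(["[CLS]", "[EOS]"])   # ids 0, 1
tok.add_tokens(["<entity>", "hello"])        # ids 2, 3
ids = [0, 3, 2, 1]  # [CLS] hello <entity> [EOS]
print(tok.decode(ids))                             # hello <entity>
print(tok.decode(ids, skip_special_tokens=False))  # [CLS] hello <entity> [EOS]
```

A marker added with add_tokens would therefore show up in the API output today, without any new parameter.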

@techthiyanes
Author

Thanks for your response.

Some models require special tokens to be displayed, because those special tokens help us do meaningful post-processing. For example, seq2seq entity-extraction models have special tokens added to them, and based on those special tokens the user can extract the entity results.
Example model: https://huggingface.co/Babelscape/rebel-large
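To make the use case concrete: the linked model card describes post-processing built around marker tokens such as `<triplet>`, `<subj>`, and `<obj>` (linearized roughly as `<triplet> head <subj> tail <obj> relation`). The simplified sketch below (my own parser, not the model card's code) only works if those tokens survive decoding:

```python
import re

def extract_triplets(text):
    # Simplified sketch of the kind of post-processing described on the
    # rebel-large model card; the generated string is only parseable if
    # the marker tokens <triplet>, <subj>, <obj> are kept in the output.
    triplets = []
    for chunk in text.split("<triplet>"):
        chunk = chunk.strip()
        if not chunk:
            continue
        m = re.match(r"(.*?)<subj>(.*?)<obj>(.*)", chunk)
        if m:
            head, tail, relation = (s.strip() for s in m.groups())
            triplets.append({"head": head, "type": relation, "tail": tail})
    return triplets

generated = "<triplet> Punta Cana <subj> Dominican Republic <obj> country"
print(extract_triplets(generated))
# [{'head': 'Punta Cana', 'type': 'country', 'tail': 'Dominican Republic'}]
```

If the API strips the special tokens before returning the text, this parsing step has nothing to anchor on.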

Thanks
