Review TextClassification task #1073
base: develop
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1073/
CodSpeed Performance Report: Merging #1073 will improve performance by ×7.2.
Hi @sdiazlor! The thing with the |
Hi @sdiazlor, I think you should also update the prompt template to make this work smoothly.
'"label"'
if self.n == 1
else "[" + ", ".join([f'"label_{i}"' for i in range(self.n)]) + "]"
"[" + ", ".join([f'"label_{i}"' for i in range(random.randint(1, 3))]) + "]"
Why `random.randint(1, 3)`?
Yes, I was unsure about the best approach, as I didn't want to imply a fixed number of labels. That's why I randomized it.
I think we need a more fundamental change to this implementation on the level of the prompt template. We ideally want to have an arbitrary number of labels based on a potential set, where we should allow for 0 to n labels without it forcefully setting a fixed number.
@plaguss perhaps we don't know enough about the context of the paper/implementation, but when would it be useful to set a fixed number? Normally, in a multi-label textcat setting, you would allow a variable number of labels rather than enforcing an exact count, because forcing a count leads to mislabelling.
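To make the two options under discussion concrete, here is a minimal sketch, not the code in this PR. The helper names `label_format_hint` and `validate_labels` are hypothetical, and using `n=None` to mean "any number of labels from the available set" is an assumption made only for illustration. The fixed-`n` branch mirrors the expression quoted in the diff above.

```python
from typing import List, Optional


def label_format_hint(n: Optional[int]) -> str:
    """Build the example output format shown in the prompt (hypothetical helper)."""
    if n is None:
        # Assumption: an open-ended list signals that any number of labels is fine.
        return '["label_0", "label_1", ...]'
    if n == 1:
        return '"label"'
    # Fixed number of labels, as in the original expression from the diff.
    return "[" + ", ".join(f'"label_{i}"' for i in range(n)) + "]"


def validate_labels(
    predicted: List[str], available: List[str], n: Optional[int] = None
) -> bool:
    """Check a prediction against the label set (hypothetical helper).

    With n=None, any subset of `available` (including the empty list) passes.
    With an integer n, exactly n distinct labels are required.
    """
    allowed = set(available)
    if any(label not in allowed for label in predicted):
        return False
    if len(set(predicted)) != len(predicted):
        return False  # duplicate labels
    return n is None or len(predicted) == n
```

With this shape, `label_format_hint(None)` avoids suggesting a specific count to the model, while `validate_labels` still rejects labels outside the allowed set.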
@plaguss Thanks! I read the paper, and it was more focused on structured generation than on textcat, so I guess we can more or less modify the task, right? So, would it be possible to optionally select between
This is the task implementing the |
Totally. As long as both options are available and work, it's perfect.
for more information, see https://pre-commit.ci