Adds Baseline workflow + fixes #363

Open
wants to merge 5 commits into
base: main
Conversation

hynky1999
Collaborator

@hynky1999 hynky1999 commented Oct 15, 2024

What does it do

Adds a baseline model that reliably computes baseline scores for multichoice/generative tasks.
The DummyModel is unreliable for this because it is random-based, so its score can be ±1 off from the true baseline.

Implementation comments

Unfortunately, it's not possible to just use a model, as the requests carry no info about the type of task or about gold_index.

Nits:
Fixes tasks listing/inspect

Comment on lines +32 to +42
"""
Compute baselines for given tasks.

It has been tested with generative and accuracy tasks, but may not work correctly for other task types.

The baseline is computed as follows:
- For multiple-choice tasks: It assumes random guessing, so the score is n_correct/number_of_choices.
- For other metrics: It assigns a score of 0, which may not be appropriate for all task types.

Note:
This baseline computation may not be suitable for all task types and should be used with caution.
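
The rule described in this docstring can be illustrated with a minimal sketch. This is not the PR's implementation: the function names and the `(n_correct, n_choices)` sample format are hypothetical, chosen only to show the arithmetic, and the simulation assumes the gold choices occupy the first `n_correct` indices. It also shows why a random-guessing model only approximates the value that the analytic formula gives exactly.

```python
import random
from fractions import Fraction


def multichoice_baseline(n_correct: int, n_choices: int) -> Fraction:
    """Expected accuracy of uniform random guessing on one sample."""
    if n_choices <= 0 or not 0 <= n_correct <= n_choices:
        raise ValueError("need 0 <= n_correct <= n_choices and n_choices > 0")
    return Fraction(n_correct, n_choices)


def generative_baseline() -> Fraction:
    """Non-multichoice metrics get a score of 0, as the docstring states."""
    return Fraction(0)


def task_baseline(samples: list[tuple[int, int]]) -> Fraction:
    """Mean expected accuracy over (n_correct, n_choices) pairs (hypothetical format)."""
    if not samples:
        raise ValueError("samples must not be empty")
    scores = [multichoice_baseline(c, n) for c, n in samples]
    return sum(scores, Fraction(0)) / len(scores)


def simulated_random_accuracy(samples: list[tuple[int, int]], seed: int) -> float:
    """Accuracy of one random-guessing run; varies with the seed."""
    rng = random.Random(seed)
    hits = 0
    for n_correct, n_choices in samples:
        # Assumption: gold choices are indices 0 .. n_correct - 1.
        if rng.randrange(n_choices) < n_correct:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    samples = [(1, 4)] * 1000
    print("analytic baseline:", float(task_baseline(samples)))
    for seed in range(3):
        print("random run, seed", seed, ":", simulated_random_accuracy(samples, seed))
```

The analytic value is the same on every run, while each simulated run lands somewhere near it, which is the noise the PR description attributes to the DummyModel.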

Member

It would be way better to update the dummy than to add a new main script, which will be confusing for users.

Collaborator Author

@hynky1999 hynky1999 Oct 15, 2024

But I said it's not possible, I tried.

Unfortunately, it's not possible to just use a model, as the requests carry no info about the type of task or about gold_index.

So there is no way to leverage the dummy for a proper computation. If you have an idea about how to do it, let me know.

Member

Let me go through the code again tmr morning, something seemed obvious an hour ago

Member

Went through things again, I agree, my ideas are way more costly than this.
We should remove the dummy model then as it will always be more costly too.
Wdyt @NathanHB ?

Member

I'm fine with removing the dummy model, it was only used for fast debugging of task-related issues. Does this baseline implementation allow for debugging the logging of tasks?

Collaborator Author

I would keep the dummy model, it's useful when someone just needs to pass through the pipeline workflow. @NathanHB What do you mean by logging of tasks?

Member

I meant loading, and I agree it's needed for quick debugging, we should keep it then.

@hynky1999 hynky1999 changed the title from "Adds Baseline model + fixes" to "Adds Baseline workflow + fixes" on Oct 16, 2024
3 participants