Adds Baseline workflow + fixes #363

hynky1999 · 2024-10-15T15:37:14Z

What does it do

Adds a baseline model that reliably computes baseline scores for multiple-choice/generative tasks.
The DummyModel is somewhat unreliable because it is random-based, so its baseline can be ±1 off from the true baseline.
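A minimal sketch of why a random-based model is only approximately right. It assumes a multiple-choice task with exactly one gold answer per question and accuracy as the metric. The function names are hypothetical and not lighteval APIs. The exact random-guessing baseline is the mean of 1/n_choices over documents. A model that actually guesses at random only estimates that value, and the estimate moves with the seed.

```python
import random


def expected_baseline(n_choices_per_doc: list[int]) -> float:
    """Exact expected accuracy of uniform random guessing (one gold answer per doc)."""
    return sum(1 / n for n in n_choices_per_doc) / len(n_choices_per_doc)


def sampled_baseline(n_choices_per_doc: list[int], gold_per_doc: list[int], seed: int) -> float:
    """Accuracy of a model that picks one choice uniformly at random per doc."""
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(n) == gold
        for n, gold in zip(n_choices_per_doc, gold_per_doc)
    )
    return hits / len(n_choices_per_doc)


n_choices = [4] * 1000  # 1000 hypothetical 4-way questions
gold = [0] * 1000
print(expected_baseline(n_choices))          # 0.25, identical on every run
print(sampled_baseline(n_choices, gold, 0))  # close to 0.25, but depends on the seed
```

The deterministic version needs the number of choices (and, in general, the gold answers) for each document. That is exactly the information discussed under the implementation comments.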

Implementation comments

Unfortunately, it's not possible to just use a model, because the requests carry no information about either the task type or the gold_index.

Nits:
Fixes tasks listing/inspect

clefourrier · 2024-10-15T16:07:13Z

src/lighteval/main_baseline.py

+ """
+ Compute baselines for given tasks.
+
+ It has been tested with generative and accuracy tasks, but may not work correctly for other task types.
+
+ The baseline is computed as follows:
+ - For multiple-choice tasks: It assumes random guessing, so the score is n_correct/number_of_choices.
+ - For other metrics: It assigns a score of 0, which may not be appropriate for all task types.
+
+ Note:
+ This baseline computation may not be suitable for all task types and should be used with caution.
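
The rule in this docstring can be sketched at the document level. The number of choices and the gold_index come from the task's documents rather than from the requests. The following is a minimal sketch, not the code in `main_baseline.py`. The `Doc` dataclass is a hypothetical stand-in with only `choices` and `gold_index`. The boolean `is_multichoice` is also hypothetical and replaces however the script actually detects the task type. "n_correct" is read here as the number of gold answers in a document.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Doc:
    """Hypothetical stand-in for a task document (only the fields the baseline needs)."""
    choices: List[str]
    gold_index: Union[int, List[int]]


def doc_baseline(doc: Doc, is_multichoice: bool) -> float:
    if not is_multichoice:
        # Rule from the docstring: other metrics get a score of 0.
        return 0.0
    golds = [doc.gold_index] if isinstance(doc.gold_index, int) else doc.gold_index
    # Random guessing picks each choice with probability 1 / len(choices),
    # so the expected score is n_correct / number_of_choices.
    return len(golds) / len(doc.choices)


def task_baseline(docs: List[Doc], is_multichoice: bool) -> float:
    """Average the per-doc baselines over the whole task."""
    return sum(doc_baseline(d, is_multichoice) for d in docs) / len(docs)


docs = [Doc(["A", "B", "C", "D"], 2), Doc(["yes", "no"], [0])]
print(task_baseline(docs, is_multichoice=True))   # (0.25 + 0.5) / 2 = 0.375
print(task_baseline(docs, is_multichoice=False))  # 0.0
```

Because the result is a closed-form expectation rather than a sample, it does not fluctuate between runs. The trade-off is the one the docstring warns about: a score of 0 for every non-multiple-choice metric may not be appropriate for all task types.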


It would be way better to update the dummy model than to add a new main script, which will be confusing for users.

hynky1999 · 2024-10-15

But as I said, it's not possible; I tried.

Unfortunately, it's not possible to just use a model, because the requests carry no information about either the task type or the gold_index.

So there is no way to leverage the dummy model for a proper computation. If you have an idea of how to do it, let me know.

clefourrier · 2024-10-15

Let me go through the code again tomorrow morning; something seemed obvious an hour ago.

clefourrier · 2024-10-16

Went through things again and I agree: my ideas are way more costly than this.
We should remove the dummy model then, as it will always be more costly too.
Wdyt, @NathanHB?

NathanHB · 2024-10-16

I'm fine with removing the dummy model; it was only used for fast debugging of task-related issues. Does this baseline implementation allow for debugging the logging of tasks?

hynky1999 · 2024-10-16

I would keep the dummy model; it's useful when someone just needs to pass through the pipeline workflow. @NathanHB What do you mean by logging of tasks?

NathanHB · 2024-10-16

I meant loading, and I agree it's needed for quick debugging, so we should keep it.

hynky1999 added 2 commits October 15, 2024 17:30

add baseline + fix tasks arg

49c80ca

comments :)

364f524

hynky1999 requested a review from clefourrier October 15, 2024 15:37

different model name so that the naming is consistent with normal models

d3b52ea

clefourrier requested changes Oct 15, 2024


NathanHB added 2 commits October 16, 2024 14:38

Merge branch 'main' into baseline_model

f08849b

Merge branch 'main' into baseline_model

40f8491

hynky1999 changed the title ~~Adds Baseline model + fixes~~ Adds Baseline workflow + fixes Oct 16, 2024
