Feature request: Grouped output metrics & exportable result files for benchmarks
Summary:
After using the Inspect AI framework while implementing the NIAH benchmark (as part of ASET), I would like to propose two feature enhancements based on my usage. Specifically, I think it would be advantageous to have (i) support for grouping performance metrics by certain parameters (e.g., categorical variables) and (ii) a more flexible output file export mechanism.
Disclaimer: I'm aware that one or both of these features may already exist in the framework and I might not have come across them yet. If so, I can update the NIAH benchmark to incorporate them and will withdraw this request. If the functionality exists but isn't documented, it would be great to have it covered in the examples.
Current Behaviour:
Inspect AI currently outputs a single metric for all samples, without the ability to group the results by categorical parameters (e.g., experiment configurations).
In the NIAH benchmark, I worked around this limitation by using a wrapper function to pass additional meta-information. Custom metric functions were then used to generate subset scores based on these parameters.
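To make the workaround concrete, here is a minimal sketch of the kind of custom metric I mean. It assumes the grouping parameter (a hypothetical `context_length` field here) has been attached to each `Score` via `Score.metadata` by the scorer/wrapper; exact details such as `as_float()` or dict-valued metric results may differ across Inspect AI versions:

```python
from collections import defaultdict

from inspect_ai.scorer import Metric, Score, metric


@metric
def grouped_accuracy(group_key: str = "context_length") -> Metric:
    """Mean score reported per value of a metadata field (hypothetical example)."""

    def compute(scores: list[Score]) -> dict[str, float]:
        # Bucket scores by the grouping parameter stored in Score.metadata
        # (the scorer/wrapper is responsible for attaching it to each sample).
        groups: dict[str, list[float]] = defaultdict(list)
        for score in scores:
            group = str((score.metadata or {}).get(group_key, "unknown"))
            # as_float() assumes numeric/boolean score values
            groups[group].append(score.as_float())
        # One entry per group, e.g. {"1000": 0.92, "32000": 0.71}
        return {group: sum(vals) / len(vals) for group, vals in groups.items()}

    return compute
```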
I think it would be beneficial to have a more streamlined approach to producing grouped performance metrics by experimental parameter directly in the framework.
Additionally, it would be useful to be able to generate output files containing a summary of performance across these grouped metrics.
Proposed Changes:
Grouped Output Metrics:
Extend the functionality of Inspect AI to support grouped output metrics, where performance scores can be categorised or grouped based on specific experimental parameters.
This would help in evaluating benchmarks across different configurations and make it easier to analyse and compare results.
Exportable Output File:
It's my understanding that it's currently only possible to extract certain information from logs. It would be helpful to provide an explicit feature to save the output metrics as a standalone, tabular file.
The ability to define the desired output file location and format during the evaluation run would greatly improve usability, especially when dealing with large sets of benchmark results.
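To illustrate the kind of export I have in mind, here is a rough post-processing sketch built on `list_eval_logs` / `read_eval_log`. I'm assuming the log objects expose `results.scores` with per-metric values; attribute names are taken from my reading of the current log schema and may differ between versions:

```python
import csv

from inspect_ai.log import list_eval_logs, read_eval_log

# Collect the headline metrics from each eval log and flatten them into rows.
rows = []
for log_info in list_eval_logs("./logs"):
    log = read_eval_log(log_info, header_only=True)
    if log.results is None:
        continue
    for score in log.results.scores:
        for metric_name, metric in score.metrics.items():
            rows.append(
                {
                    "task": log.eval.task,
                    "scorer": score.name,
                    "metric": metric_name,
                    "value": metric.value,
                }
            )

# Write a standalone tabular summary (the kind of file the feature would produce).
with open("benchmark_metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["task", "scorer", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
```

Having something like this available as a built-in option (with a configurable path and format at eval time) is essentially what I'm proposing.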
Thanks for these suggestions! I'm planning on doing some work on improving our scoring support in the next couple of weeks - I'll put these on the list of items to work through!
That's great, thank you very much! If you have any questions or there's any way I can help, just let me know.