[Evaluator] Add component #1375
base: main
28290b4 to 96c8a1e
    "ai-redis-message-store": "src/chat/src/Bridge/Redis",
    "ai-surreal-db-message-store": "src/chat/src/Bridge/SurrealDb",
    "ai-evaluator": {
        "prefixes": [{ "from": "src/evaluator", "to": "", "excludes": ["src/Bridge"] }]
Suggested change:

    - "prefixes": [{ "from": "src/evaluator", "to": "", "excludes": ["src/Bridge"] }]
    + "prefixes": [{ "from": "src/evaluator", "to": "" }]

Not sure the exclusion is needed, as we don't have a `Bridge` folder.
Oops, wrong copy/paste 😅
Does this need to be a dedicated component? If not, where would it fit best? Platform or Agent?
Evaluator component
Open to debate, and I don't have a strong opinion on this one. The first version was a sub-directory of … If it needs to be moved to an existing one, I would say …
Could it be integrated with PHPUnit? I think it essentially provides testing utilities for the Platform component, so ideally it should live in the Platform component itself.
Like …
First of all: yes, we need this! I wonder if it's too early, though... To me this is rather a standalone component. The integration with PHPUnit is interesting, but I'm not sure we want to do it; I might want to use evaluations outside my test suite as well 🤔 For me it's too early, since it feels like putting something on top of Agent/Chat before they are stable enough, and I would rather focus on that first. That shouldn't keep you from exploring the field here, of course. It's just another thing in parallel, and to be honest I'm a bit worried about too many moving parts. What's your blueprint for tackling this? Do you use some kind of reference?
This PR (a POC for now) introduces the `Evaluator` component. This component is mainly used to evaluate and score the output of platforms / agents. I moved it to a new directory, mirroring the `Validator` component; the goal is to allow adding new scorers without impacting the platforms (even though platforms can still define their own scorers).

The PR is built around the following concepts:

- an `Evaluator` that receives `Scorers` and computes the final score
- `ScorerInterface` implementations that define a score using the `score` method (could be improved, I'm not locked on the name)
- an `AbstractScorer` used to attach a reason to a score (mostly used for "LLM as judge" scorers)
- `AiBundle` integration via the Profiler and a subscriber (no configuration for now)
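To make the concepts above more concrete, here is a minimal sketch of how they could fit together. It is not the code from this PR: only the names `Evaluator`, `ScorerInterface`, `AbstractScorer` and the `score` method come from the description. The `Score` value object, the `withReason()` helper, the two example scorers, all method signatures and the mean-based aggregation are hypothetical assumptions for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: names mirror the PR description, but the signatures,
// the Score value object and the averaging strategy are assumptions.

final class Score
{
    public function __construct(
        public readonly float $value,          // assumed range: 0.0 to 1.0
        public readonly ?string $reason = null, // optional explanation (e.g. from an LLM judge)
    ) {
    }
}

interface ScorerInterface
{
    // Assumed signature: score one output against an expected value.
    public function score(string $output, string $expected): Score;
}

abstract class AbstractScorer implements ScorerInterface
{
    // Shared helper for scorers that attach a reason to their score.
    protected function withReason(float $value, string $reason): Score
    {
        return new Score($value, $reason);
    }
}

// Hypothetical scorer: 1.0 only if the trimmed output equals the expected value.
final class ExactMatchScorer extends AbstractScorer
{
    public function score(string $output, string $expected): Score
    {
        $match = trim($output) === trim($expected);

        return $this->withReason($match ? 1.0 : 0.0, $match ? 'Exact match' : 'Output differs from expected value');
    }
}

// Hypothetical scorer: 1.0 if the expected value appears anywhere (case-insensitive).
final class ContainsScorer extends AbstractScorer
{
    public function score(string $output, string $expected): Score
    {
        $found = str_contains(strtolower($output), strtolower($expected));

        return $this->withReason($found ? 1.0 : 0.0, $found ? 'Expected value found' : 'Expected value missing');
    }
}

final class Evaluator
{
    /** @param list<ScorerInterface> $scorers */
    public function __construct(private readonly array $scorers)
    {
    }

    // Assumed aggregation: the final score is the mean of all scorer values.
    public function evaluate(string $output, string $expected): float
    {
        if ([] === $this->scorers) {
            throw new \LogicException('At least one scorer is required.');
        }

        $total = 0.0;
        foreach ($this->scorers as $scorer) {
            $total += $scorer->score($output, $expected)->value;
        }

        return $total / \count($this->scorers);
    }
}

$evaluator = new Evaluator([new ExactMatchScorer(), new ContainsScorer()]);
$result = $evaluator->evaluate('The capital of France is Paris.', 'Paris');
```

With these two scorers, the exact-match check fails and the substring check passes, so the averaged score is 0.5. Averaging is only one option; a weighted average or taking the minimum score would be equally plausible ways for an `Evaluator` to compute the final score.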