@Guikingone
Contributor

| Q | A |
| --- | --- |
| Bug fix? | no |
| New feature? | yes |
| Docs? | yes |
| Issues | -- |
| License | MIT |

This PR (a POC for now) introduces the Evaluator component, which is mainly used to evaluate and score the output of platforms / agents. I moved it to a new directory, mirroring the Validator component; the goal is to allow adding new scorers without impacting the platforms (even though they can define their own scorers).

The PR is built around the following concepts:

  • An Evaluator that receives Scorers and computes the final score
  • ScorerInterface implementations that define a score via a score method (could be improved, I'm not locked on the name)
  • An AbstractScorer used to define a reason for a score (mostly used for "LLM as judge" scorers)
  • An implementation in AiBundle via the Profiler and a subscriber (no configuration for now).
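
To make the concepts above concrete, here is a minimal sketch of how the pieces might fit together. Only the names Evaluator, ScorerInterface, AbstractScorer and the score method come from the description; the signatures, the `getReason()` accessor, the `ContainsKeywordScorer` class, the `evaluate()` method and the averaging strategy are assumptions made for illustration and may not match the code in this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: signatures and behavior are assumed, not taken from the PR.

interface ScorerInterface
{
    /** Returns a score for the given output, assumed here to lie in [0.0, 1.0]. */
    public function score(string $output): float;
}

abstract class AbstractScorer implements ScorerInterface
{
    // Human-readable justification for the last score (useful for "LLM as judge" scorers).
    protected ?string $reason = null;

    public function getReason(): ?string
    {
        return $this->reason;
    }
}

// Assumed example scorer: checks whether the output contains an expected keyword.
final class ContainsKeywordScorer extends AbstractScorer
{
    public function __construct(private readonly string $keyword)
    {
    }

    public function score(string $output): float
    {
        $found = str_contains(strtolower($output), strtolower($this->keyword));
        $this->reason = sprintf('Keyword "%s" %s in the output.', $this->keyword, $found ? 'found' : 'not found');

        return $found ? 1.0 : 0.0;
    }
}

final class Evaluator
{
    /** @param iterable<ScorerInterface> $scorers */
    public function __construct(private readonly iterable $scorers)
    {
    }

    // Assumption: the final score is the arithmetic mean of all scorer results.
    public function evaluate(string $output): float
    {
        $scores = [];
        foreach ($this->scorers as $scorer) {
            $scores[] = $scorer->score($output);
        }

        return [] === $scores ? 0.0 : array_sum($scores) / count($scores);
    }
}

$evaluator = new Evaluator([
    new ContainsKeywordScorer('Paris'),
    new ContainsKeywordScorer('France'),
]);

// One scorer matches and one does not, so the mean is 0.5.
echo $evaluator->evaluate('The capital is Paris.'), PHP_EOL;
```

Because the scorers are injected, a platform could ship its own scorer without the Evaluator changing; the sketch requires PHP 8.1 for readonly properties.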

"ai-redis-message-store": "src/chat/src/Bridge/Redis",
"ai-surreal-db-message-store": "src/chat/src/Bridge/SurrealDb",
"ai-evaluator": {
"prefixes": [{ "from": "src/evaluator", "to": "", "excludes": ["src/Bridge"] }]
Contributor

Suggested change
- "prefixes": [{ "from": "src/evaluator", "to": "", "excludes": ["src/Bridge"] }]
+ "prefixes": [{ "from": "src/evaluator", "to": "" }]

Not sure about the "excludes" entry, as we don't have a Bridge folder.

Contributor Author

Oops, wrong copy/paste 😅

@OskarStark
Contributor

Does this need to be a dedicated component? If not, where would it fit best? Platform or Agent?

@OskarStark changed the title from "[Core] Introduce the Evaluator component" to "[Evaluator] Add component" on Jan 12, 2026
@Guikingone
Contributor Author

Does this need to be a dedicated component?

Open to debate, and I don't have a strong opinion on this one. The first version was a sub-directory of Platform, but since PRs tend to stay "simple" and, once complexity is added, we tend to move the code out as a "bridge", I preferred to make it a new component to keep the Platform one clean.

If it needs to be moved to an existing one, I would say Platform, as we don't depend on Agent.

@aszenz
Contributor

aszenz commented Jan 12, 2026

Does this need to be a dedicated component?

Open to debate, and I don't have a strong opinion on this one. The first version was a sub-directory of Platform, but since PRs tend to stay "simple" and, once complexity is added, we tend to move the code out as a "bridge", I preferred to make it a new component to keep the Platform one clean.

If it needs to be moved to an existing one, I would say Platform, as we don't depend on Agent.

Can it be integrated into PHPUnit? I think it kind of provides testing utilities for the Platform component, so it should ideally live in the Platform component itself.

@Guikingone
Contributor Author

Can it be integrated into PHPUnit

Like Mailer / Messenger assertions? Of course 🙂

@chr-hertel
Member

First of all: yes we need this! i wonder if it's too early tho ...

To me this is rather a standalone component, yes, but the integration aspect with PHPUnit is interesting, but not sure we want to do this - i might want to use evaluations not only in my test suite 🤔

For me it's too early since it feels like putting something on top of agent/chat without them being stable enough - and i would rather focus on that first - that shouldn't keep you from exploring the field here ofc. it's just another thing in parallel and i'm a bit worried about too many moving parts tbh.

what's your blueprint for tackling this? do you use some kind of reference?
i recall discussing this briefly with @tgalopin in Amsterdam - was it dspy or pydantic? or sth else?
