Hey, thanks for raising this! Would running LLM-as-a-judge at the individual span level work for you, or do you need to evaluate a sequence of spans?
-
+1 on having some way to traverse the span tree and access inner spans more easily (e.g. via DFS/BFS). Currently we have to reconstruct the span tree from the flat list of observations, which is feasible but feels like extra work.
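For anyone hitting the same thing in the meantime, here's a minimal sketch of that reconstruction step in plain Python. It assumes each observation is a dict with an `id` and a `parentObservationId` (`None` for top-level spans); those field names are an assumption, so check them against what your client actually returns. Nothing here calls a tracing API, and `build_span_tree`, `dfs`, and `bfs` are hypothetical helpers, not part of any library.

```python
from collections import deque
from typing import Any, Dict, Iterator, List


def build_span_tree(observations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy each observation, attach a 'children' list, and return the roots.

    Assumes (hypothetically) that every observation has an 'id' and a
    'parentObservationId' that is None for top-level spans.
    """
    nodes = {obs["id"]: {**obs, "children": []} for obs in observations}
    roots: List[Dict[str, Any]] = []
    for node in nodes.values():
        parent_id = node.get("parentObservationId")
        if parent_id is not None and parent_id in nodes:
            nodes[parent_id]["children"].append(node)
        else:
            # No parent, or the parent is missing from this list: treat as a root.
            roots.append(node)
    return roots


def dfs(roots: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield spans depth-first (pre-order), keeping sibling order."""
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node["children"]))


def bfs(roots: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield spans level by level."""
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node["children"])
```

Children keep the order of the input list, so sort the observations by start time first if you need execution order. Observations whose parent isn't in the list are surfaced as roots rather than silently dropped.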
-
Describe the feature or potential improvement
Goal
For integration evals in a multi-agent setup (input -> agent_a -> agent_b -> output), a single input-output call produces one trace containing multiple spans. I want a way to use this trace to evaluate each agent's behaviour. The information required for each span within the trace is its function_name, input, and output.
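Assuming the spans for a trace are already available as a flat list of dicts, a minimal sketch of pulling out the per-agent information (function_name, input, output) might look like this. The `name`, `input`, `output`, and `startTime` keys, and the idea that each agent's span is named after the agent, are assumptions for illustration; `extract_agent_records` is a hypothetical helper. Each resulting record could then be handed to a per-agent evaluator such as an LLM judge.

```python
from typing import Any, Dict, Iterable, List


def extract_agent_records(
    observations: Iterable[Dict[str, Any]],
    agent_names: Iterable[str],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group {function_name, input, output} records by agent span name.

    Assumes (hypothetically) each observation has 'name', 'input', 'output'
    and an ISO-8601 'startTime' string. Sorting on the raw string matches
    chronological order only if all timestamps share the same format and zone.
    """
    wanted = set(agent_names)
    records: Dict[str, List[Dict[str, Any]]] = {name: [] for name in wanted}
    for obs in sorted(observations, key=lambda o: o.get("startTime", "")):
        name = obs.get("name")
        if name in wanted:
            records[name].append(
                {
                    "function_name": name,
                    "input": obs.get("input"),
                    "output": obs.get("output"),
                }
            )
    return records
```

Spans that don't match an agent name (e.g. raw LLM calls) are skipped here; if agents are identified some other way, only the `name in wanted` check needs to change.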
Problem
In a multi-step agent setup, the trace contains multiple spans, each corresponding to an inner call to the LLM provider. We currently have no way to reconstruct the span tree, or to fetch spans by span_id, in order to evaluate the inner spans.
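Until something like this exists natively, a client-side index over the fetched observations can stand in for "fetch by span_id". This is a sketch assuming each observation dict has an `id` and a `parentObservationId` (`None` for top-level spans); `SpanIndex` and its methods are hypothetical names, not an existing API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class SpanIndex:
    """Look up spans by id and list their inner spans (hypothetical helper)."""

    def __init__(self, observations: List[Dict[str, Any]]) -> None:
        self._by_id: Dict[str, Dict[str, Any]] = {}
        # Maps parent id (None for top-level spans) to its direct children.
        self._children: Dict[Optional[str], List[Dict[str, Any]]] = defaultdict(list)
        for obs in observations:
            self._by_id[obs["id"]] = obs
            self._children[obs.get("parentObservationId")].append(obs)

    def get(self, span_id: str) -> Dict[str, Any]:
        """Return the span with this id; raises KeyError if it is unknown."""
        return self._by_id[span_id]

    def inner_spans(self, span_id: str) -> List[Dict[str, Any]]:
        """Return all descendants of span_id in depth-first (pre-order) order."""
        result: List[Dict[str, Any]] = []
        stack = list(reversed(self._children.get(span_id, [])))
        while stack:
            node = stack.pop()
            result.append(node)
            stack.extend(reversed(self._children.get(node["id"], [])))
        return result
```

With this, evaluating agent_b in isolation is `index.inner_spans(agent_b_span_id)`, where `agent_b_span_id` is whatever id the tracing backend assigned to that span.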
Happy to contribute, though I don't have a good proposed solution yet. I'm also open to other approaches for achieving end-to-end evaluation.
Additional information
No response