Feature request: Return both `summary` & `eval_results` from `Evaluation.evaluate` #3038

TeoZosa · 2024-11-21T03:52:43Z

For this code:

Lines 493 to 510 in 3eea8df

    
    @weave.op()
    async def evaluate(self, model: Union[Callable, Model]) -> dict:
        # The need for this pattern is quite unfortunate and highlights a gap in our
        # data model. As a user, I just want to pass a list of data `eval_rows` to
        # summarize. Under the hood, Weave should choose the appropriate storage
        # format (in this case `Table`) and serialize it that way. Right now, it is
        # just a huge list of dicts. The fact that "as a user" I need to construct
        # `weave.Table` at all is a leaky abstraction. Moreover, the need to
        # construct `EvaluationResults` just so that tracing and the UI works is
        # also bad. In the near-term, this will at least solve the problem of
        # breaking summarization with big datasets, but this is not the correct
        # long-term solution.
        eval_results = await self.get_eval_results(model)
        summary = await self.summarize(eval_results)

        print("Evaluation summary", summary)

        return summary

Only `summary` is returned. For our use case, we also want `eval_results` for downstream rendering and storage, so we can present user-friendly results to non-technical stakeholders (the Weave UI is information overload for those folks). Would this change make sense?

As a workaround, I can call `get_eval_results` and `summarize` separately, but then I lose eval tracking in Evaluations, since only `Evaluation.evaluate` Calls are picked up.
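A minimal sketch of that workaround, assuming only the two method names visible in the snippet above (`get_eval_results` and `summarize`). `StubEvaluation` and `evaluate_with_rows` are hypothetical stand-ins so the example runs without Weave; with a real `Evaluation` the two calls would presumably have the same shape, but no `Evaluation.evaluate` Call would be recorded.

```python
import asyncio
from typing import Callable


class StubEvaluation:
    """Hypothetical stand-in for weave.Evaluation (not the real class)."""

    def __init__(self, dataset: list[dict]) -> None:
        self.dataset = dataset

    async def get_eval_results(self, model: Callable[[str], str]) -> list[dict]:
        # One row per example: the model output plus an exact-match score.
        rows = []
        for example in self.dataset:
            output = model(example["question"])
            rows.append({"output": output, "correct": output == example["expected"]})
        return rows

    async def summarize(self, eval_results: list[dict]) -> dict:
        # Aggregate the per-row scores into a single summary dict.
        n_correct = sum(row["correct"] for row in eval_results)
        return {"accuracy": n_correct / len(eval_results)}


async def evaluate_with_rows(evaluation, model) -> tuple[dict, list[dict]]:
    # Workaround: run the two steps directly instead of evaluate(),
    # so both the summary and the per-row results are available.
    eval_results = await evaluation.get_eval_results(model)
    summary = await evaluation.summarize(eval_results)
    return summary, eval_results


dataset = [
    {"question": "2+2", "expected": "4"},
    {"question": "3+3", "expected": "6"},
]
summary, eval_results = asyncio.run(
    evaluate_with_rows(StubEvaluation(dataset), lambda question: "4")
)
```

Here `eval_results` can be handed to whatever renders the stakeholder-facing report, while `summary` stays the same dict that `evaluate` would have returned.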


jwlee64 · 2024-11-21T17:32:54Z

Hi @TeoZosa, I can raise this to the team tomorrow.

TeoZosa · 2024-11-22T03:43:28Z

@jwlee64 sounds good, thanks for the prompt reply 👍

FWIW for feedback, I'm working around the issue by vendoring the code with this change:

-   @weave.op()
+   @weave.op(postprocess_output=lambda output: output[0])
    async def evaluate(self, model: Union[Callable, Model]) -> dict:
        eval_results = await self.get_eval_results(model)
        summary = await self.summarize(eval_results)

        print("Evaluation summary", summary)

-        return summary
+        return summary, eval_results
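To illustrate the idea behind this patch: the sketch below assumes that `postprocess_output` only transforms the value that gets logged, while the caller still receives the raw return value. `traced_op` and `LOGGED_OUTPUTS` are hypothetical stand-ins written for this example, not Weave's implementation.

```python
import asyncio
import functools
from typing import Any, Callable

LOGGED_OUTPUTS: list[Any] = []  # hypothetical stand-in for the trace store


def traced_op(postprocess_output: Callable[[Any], Any] = lambda out: out):
    """Hypothetical stand-in for weave.op: logs a postprocessed output."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            # Only the logged value is transformed; the caller gets `result`.
            LOGGED_OUTPUTS.append(postprocess_output(result))
            return result

        return wrapper

    return decorator


@traced_op(postprocess_output=lambda output: output[0])
async def evaluate() -> tuple[dict, list[dict]]:
    # Toy values standing in for the real evaluation steps.
    eval_results = [{"output": "4", "correct": True}]
    summary = {"accuracy": 1.0}
    return summary, eval_results


summary, eval_results = asyncio.run(evaluate())
```

Under that assumption, the logged output is still just the summary (so anything reading the tracked output sees the same shape as before), and the calling code gets the per-row results as the second tuple element.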

jwlee64 · 2024-11-22T17:53:01Z

Hi @TeoZosa, we are going to have someone tackle this in the next week or so, or at minimum document a better way to get the `eval_results` with the current API.

TeoZosa · 2024-11-23T04:26:15Z

Got it. Thanks for the update @jwlee64, keep me posted! 🙏

jwlee64 added the `enhancement` (New feature or request) label on Nov 21, 2024
