Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(llmobs): support joining custom evaluations via tags #11535

Merged
merged 27 commits into from
Jan 10, 2025

Conversation

lievan
Copy link
Contributor

@lievan lievan commented Nov 25, 2024

This PR implements LLMObs.submit_evaluation_for method, which gives users two options for joining custom evaluations

  • by tag via the span_with_tag argument, which accepts a tuple containing a tag key/value pair
  • by span via the span argument, which accepts a dictionary containing span_id and trace_id keys

There are also a couple behavior differences between submit_evaluation_for and submit_evaluation. In the new method, we

  • throw whenever a required argument is the wrong value or type
  • remove metadata argument
  • move the warning log for missing api key to the eval metric writer's periodic method

Other changes:

Eval metric writer

Update the eval metric writer to write to the v2 eval metric endpoint. The main difference with this endpoint is that it accepts a join_with field that holds joining information instead of a top-level trace and span id fields.

Deprecate submit_evaluation

Deprecates submit_evaluation. I've set the removal version to be 3.0.0.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Copy link
Contributor

github-actions bot commented Nov 25, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/submit-evaluation-for-01096d803d969e3e.yaml          @DataDog/apm-python
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_categorical_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_metric_bad_api_key.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_multiple_events.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_timed_events.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_evaluator_runner.send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_evaluators.emits_traces_and_evaluations_on_exit.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_evaluators.test_ragas_faithfulness_emits_traces.yaml  @DataDog/ml-observability
tests/llmobs/test_llmobs_eval_metric_writer.py                          @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability

@lievan lievan changed the title feat(llmobs): Support joining custom evaluations via tags feat(llmobs): support joining custom evaluations via tags Nov 25, 2024
@pr-commenter
Copy link

pr-commenter bot commented Nov 25, 2024

Benchmarks

Benchmark execution time: 2025-01-10 16:05:25

Comparing candidate commit f87ef58 in PR branch evan.li/submit-evaluation-for with baseline commit 5e68823 in branch main.

Found 0 performance improvements and 5 performance regressions! Performance is the same for 381 metrics, 2 unstable metrics.

scenario:iast_aspects-ospathbasename_aspect

  • 🟥 execution_time [+288.074ns; +343.375ns] or [+8.582%; +10.230%]

scenario:iast_aspects-ospathdirname_aspect

  • 🟥 execution_time [+482.741ns; +556.923ns] or [+13.192%; +15.219%]

scenario:iast_aspects-ospathjoin_aspect

  • 🟥 execution_time [+416.537ns; +481.153ns] or [+8.033%; +9.279%]

scenario:iast_aspects-ospathsplitext_aspect

  • 🟥 execution_time [+288.461ns; +345.134ns] or [+7.967%; +9.532%]

scenario:iast_aspects-replace_aspect

  • 🟥 execution_time [+473.459ns; +547.883ns] or [+7.070%; +8.181%]

@lievan lievan marked this pull request as ready for review November 25, 2024 20:24
@lievan lievan requested review from a team as code owners November 25, 2024 20:24
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
@lievan lievan changed the title feat(llmobs): support joining custom evaluations via tags feat(llmobs): support joining custom evaluations via tags Nov 26, 2024
Copy link
Contributor

@Yun-Kim Yun-Kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first quick pass, will do another pass on Monday!

ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
@christophe-papazian christophe-papazian removed their request for review December 3, 2024 13:42
Copy link
Contributor

@Yun-Kim Yun-Kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments/suggestions, but approving as none are super unblocking!

ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Show resolved Hide resolved
ddtrace/llmobs/_llmobs.py Outdated Show resolved Hide resolved
@datadog-dd-trace-py-rkomorn
Copy link

datadog-dd-trace-py-rkomorn bot commented Jan 7, 2025

Datadog Report

Branch report: evan.li/submit-evaluation-for
Commit report: 86d204f
Test service: dd-trace-py

✅ 0 Failed, 87 Passed, 1468 Skipped, 3m 46.07s Total duration (35m 21.75s time saved)

@lievan lievan enabled auto-merge (squash) January 9, 2025 15:35
lievan added 2 commits January 10, 2025 10:24
@lievan lievan merged commit 1b223aa into main Jan 10, 2025
264 of 265 checks passed
@lievan lievan deleted the evan.li/submit-evaluation-for branch January 10, 2025 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants