chore(llmobs): consolidate internally to one experiment class #16568
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 643e608e76
ddtrace/llmobs/_experiment.py (Outdated)

```python
raise RuntimeError(
    "Cannot use SyncExperiment.run() from an async context. "
    "Use 'await exp.run()' with LLMObs.async_experiment() instead."
```
Allow sync experiment run when an event loop is active
SyncExperiment.run() now unconditionally raises in any thread with a running event loop, which breaks previously working sync call paths in environments like Jupyter notebooks and async services that invoke shared synchronous helpers. Before this refactor, LLMObs.experiment(...).run() used the synchronous implementation and did not hard-fail in these contexts, so this introduces a user-visible regression rather than a pure internal refactor.
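The failure mode and a possible fallback can be sketched as follows; `_run_async` and `run_sync` here are hypothetical stand-ins for the internal async implementation and the sync wrapper, not the actual dd-trace-py code:

```python
import asyncio
import threading


async def _run_async():
    # Stand-in for the internal async Experiment.run() (hypothetical).
    await asyncio.sleep(0)
    return "done"


def run_sync():
    """Run the async implementation from sync code, even under a live loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain scripts): asyncio.run() is safe here.
        return asyncio.run(_run_async())
    # A loop is already running (e.g. Jupyter): asyncio.run() would raise
    # RuntimeError, so drive the coroutine on a fresh loop in a worker
    # thread instead of hard-failing.
    result = {}

    def _worker():
        result["value"] = asyncio.run(_run_async())

    t = threading.Thread(target=_worker)
    t.start()
    t.join()
    return result["value"]


print(run_sync())
```

This is the general shape of the Jupyter-compatible fallback discussed later in this thread: detect the running loop and fall back to a worker thread rather than raising unconditionally.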
```diff
     summary_evaluators: Optional[Sequence[SummaryEvaluatorType]] = None,
     runs: Optional[int] = 1,
-) -> Experiment:
+) -> SyncExperiment:
```
is this a breaking change for existing users? i guess you're saying in the release notes it's not, but why :)
it has a run method that's the same as before
i guess customers might import the experiment type? not sure
I was operating under the assumption the Experiment class itself is internal and people are meant to only get to it through the factory method (since it's that way in the docs). We have some validation in the factory that you don't get if you go direct.
ddtrace/llmobs/_experiment.py (Outdated)

```python
        "Cannot use SyncExperiment.run() from an async context. "
        "Use 'await exp.run()' with LLMObs.async_experiment() instead."
    )
return asyncio.run(self._experiment.run(jobs=jobs, raise_errors=raise_errors, sample_size=sample_size))
```
this lets you run sync experiments in parallel?
no, it is just to re-use the async code for the sync case, to reduce duplication
but i guess my question is: does parallel sync execution still work? i think it did before, but maybe just for evals?
Yes, the concurrency still works as it did before. There is concurrency both for running the task on records and for running evaluations (there was before and there still is with this change).
in _run_task, all the _process_record() coroutines run concurrently in the gather(). Within each coroutine, if it's a sync task/evaluator, it gets run on the default thread pool executor, which by default has a fairly large number of max threads (I think it's like 5 * the number of CPUs). But on the async side, the asyncio.Semaphore controls how many coroutines are active, so we rely on that mostly to control how much concurrency there is.
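The concurrency pattern described here (a `gather()` bounded by a semaphore, with sync work offloaded to the default thread pool) can be sketched roughly like this; `process_record`, `slow_sync_task`, and `run_all` are illustrative names, not the actual internals:

```python
import asyncio
import time


def slow_sync_task(record):
    # Stand-in for a blocking sync task or evaluator.
    time.sleep(0.01)
    return record * 2


async def process_record(record, semaphore):
    # The semaphore caps how many coroutines are active at once (jobs).
    async with semaphore:
        # Sync work is offloaded to the default thread pool executor
        # so it does not block the event loop.
        return await asyncio.to_thread(slow_sync_task, record)


async def run_all(records, jobs=10):
    semaphore = asyncio.Semaphore(jobs)
    return await asyncio.gather(*(process_record(r, semaphore) for r in records))


results = asyncio.run(run_all(range(20)))
```

With `jobs=1` the records are effectively processed serially; raising `jobs` lets up to that many records overlap, which is the speedup the test script below measures.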
gsvigruha
left a comment
Looks like a relatively straightforward refactoring
@gsvigruha @Kyle-Verhoog I just created a more thorough test script to test out these changes locally: https://github.com/DataDog/llm-obs/blob/28ab7632b4bc6c40a1b44ade3022b6ebbc54d90f/demos/experiment_async/experiment_speed_test.py It demonstrates sync and async experiments (with an async task + a fast sync evaluator) and verifies there is a speedup in both cases when jobs is increased from 1 to 10. It also exercises the multi-run feature that was added in DnE recently, so we can see that working. Experiment run in the UI. Next I'm going to check how this change interacts with Jupyter notebooks, because I think there may be a risk there (they already have a running event loop).
@gsvigruha @Kyle-Verhoog Pushed one more change to resolve an issue with running a sync experiment from Jupyter. I used this test notebook to verify both sync and async experiments work (the async test is at the bottom): https://github.com/DataDog/llm-obs/blob/b641a7facba3f69303ba4b2258b9154c3681a6ca/demos/experiment_async/async_notebook_test.ipynb
```
- |
  LLM Observability: Consolidates the internal sync ``Experiment`` and ``AsyncExperiment`` classes into a single
  async-native ``Experiment`` class with a thin ``SyncExperiment`` wrapper for synchronous callers. This is an
  internal refactoring with no changes to the public API.
```
> internal refactoring with no changes to the public API
do we need a release note then?
What does any of this mean for a customer?
I wasn't sure if a release note is always required. Won't CI fail if I leave it out?
add the changelog/no-changelog label, and it'll skip the release note requirement check


Description
Following up on #16442, this is an internal refactor with no user-facing change.
Rename AsyncExperiment to Experiment (replacing the previous sync Experiment). Add a thin SyncExperiment wrapper class that holds an Experiment as an attribute and provides a sync run(). Now, even if a user runs an experiment synchronously, with a sync task and evaluators, internally we run the experiment using async. This way the experiment always runs async internally, which lets us have just one Experiment class and no base class, reducing duplication and simplifying maintenance.
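A minimal sketch of the wrapper shape described above, assuming the class names from this description; the method bodies are illustrative only, not the actual dd-trace-py implementation:

```python
import asyncio


class Experiment:
    # Async-native implementation: the single place the run logic lives
    # (illustrative body, not the real one).
    async def run(self, jobs=1):
        await asyncio.sleep(0)
        return {"jobs": jobs}


class SyncExperiment:
    """Thin wrapper holding an Experiment and exposing a sync run()."""

    def __init__(self, experiment):
        self._experiment = experiment

    def run(self, jobs=1):
        # Drive the async implementation to completion on a private loop.
        return asyncio.run(self._experiment.run(jobs=jobs))


result = SyncExperiment(Experiment()).run(jobs=4)
```

The design choice is that sync callers pay only a thin delegation cost, while all run/concurrency logic is maintained once in the async class.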
Doc with more background on the changes
To make it easier to review and easier for reviewers to feel confident, the existing sync tests are pretty much left as-is with minimal changes.
Left for future work
Testing
Will update here once I run the test script.
Risks
Potentially there could be issues for users running a sync experiment from a Jupyter notebook. Need to test this.
Additional Notes