Skip to content

chore(llmobs): consolidate internally to one experiment class#16568

Open
cdfox wants to merge 7 commits intomainfrom
christopher.fox/consolidate-sync-async-experiment
Open

chore(llmobs): consolidate internally to one experiment class#16568
cdfox wants to merge 7 commits intomainfrom
christopher.fox/consolidate-sync-async-experiment

Conversation

@cdfox
Copy link
Contributor

@cdfox cdfox commented Feb 18, 2026

Description

Following up #16442, this is an internal refactor, no user-facing change.

Rename AsyncExperiment to Experiment (replacing the previous sync Experiment). Add thin wrapper SyncExperiment class that has an Experiment as attribute and provides a sync run(). Now even if user is running an experiment synchronously, with sync task and evaluators, internally we run the experiment using async. This way, experiment is always running async internally, which lets us have just one Experiment class and no base class, reducing duplication and simplifying the maintenance.

Doc with more background on the changes

To make it easier to review and easier for reviewers to feel confident, the existing sync tests are pretty much left as-is with minimal changes.

Left for future work

  • removing some redundancies in the tests
  • refactoring tests to only use public run() method and not call internal methods directly

Testing

Will update here once I run test script

Risks

Potentially there could be issues for users running a sync experiment from jupyter notebook. Need to test this.

Additional Notes

@cdfox cdfox requested a review from a team as a code owner February 18, 2026 23:43
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 643e608e76

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 1393 to 1395
raise RuntimeError(
"Cannot use SyncExperiment.run() from an async context. "
"Use 'await exp.run()' with LLMObs.async_experiment() instead."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow sync experiment run when an event loop is active

SyncExperiment.run() now unconditionally raises in any thread with a running event loop, which breaks previously working sync call paths in environments like Jupyter notebooks and async services that invoke shared synchronous helpers. Before this refactor, LLMObs.experiment(...).run() used the synchronous implementation and did not hard-fail in these contexts, so this introduces a user-visible regression rather than a pure internal refactor.

Useful? React with 👍 / 👎.

@cdfox cdfox requested a review from a team as a code owner February 18, 2026 23:51
@cit-pr-commenter-54b7da
Copy link

cit-pr-commenter-54b7da bot commented Feb 18, 2026

Codeowners resolved as

ddtrace/llmobs/_experiment.py                                           @DataDog/ml-observability

summary_evaluators: Optional[Sequence[SummaryEvaluatorType]] = None,
runs: Optional[int] = 1,
) -> Experiment:
) -> SyncExperiment:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a breaking change for existing users? i guess you're saying in the release notes its not but why :)

Copy link
Contributor Author

@cdfox cdfox Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has a run method thats same as before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess customers might import the experiment type? not sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was operating under the assumption the Experiment class itself is internal and people are meant to only get to it through the factory method (since it's that way in the docs). We have some validation in the factory that you don't get if you go direct.

"Cannot use SyncExperiment.run() from an async context. "
"Use 'await exp.run()' with LLMObs.async_experiment() instead."
)
return asyncio.run(self._experiment.run(jobs=jobs, raise_errors=raise_errors, sample_size=sample_size))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lets you run sync experiments parallel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it is just to re-use async code for the sync case, to reduce duplication

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but i guess my question is does parallel sync execution still work? i think it did before but maybe just for evals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the concurrency still works as it did before. There is concurrency both for running the task on records and for running evaluations (there was before and there is still is with this change).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in _run_task, all the _process_record() corountines run concurrently in the gather(). WIthin each coroutines, if it's a sync task/evaluator, it gets run on the default thread pool executor, which by default has a fairly large of max threads (I think it's like 5 * number of cpus). But on the async side, the asyncio.semaphore controls how many coroutines are active, so we rely on that mostly to control how much concurrency there is.

Copy link
Contributor

@gsvigruha gsvigruha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a relatively straightforward refactoring

@cdfox
Copy link
Contributor Author

cdfox commented Feb 19, 2026

@gsvigruha @Kyle-Verhoog I just created a more thorough test script to test out these changes locally: https://github.com/DataDog/llm-obs/blob/28ab7632b4bc6c40a1b44ade3022b6ebbc54d90f/demos/experiment_async/experiment_speed_test.py

It demonstrates sync and async experiments (with async task + a fast sync evaluator) and verifies there is speedup in both cases when jobs increased from 1 to 10. It also exercises the multi-run feature that was added in DnE recently, so we can see that working. Experiment run in the UI

Next I'm going to check on how this changes interacts with jupyter notebook, because I think there may be a risk there (it has running event loop already).

Screenshot 2026-02-19 at 12 03 12 PM Screenshot 2026-02-19 at 12 03 34 PM

@cdfox
Copy link
Contributor Author

cdfox commented Feb 19, 2026

@gsvigruha @Kyle-Verhoog Pushed one more change to resolve an issue with running sync experiment from jupyter. I used this test notebook to verify both sync and async experiments work (async test is at the bottom): https://github.com/DataDog/llm-obs/blob/b641a7facba3f69303ba4b2258b9154c3681a6ca/demos/experiment_async/async_notebook_test.ipynb

- |
LLM Observability: Consolidates the internal sync ``Experiment`` and ``AsyncExperiment`` classes into a single
async-native ``Experiment`` class with a thin ``SyncExperiment`` wrapper for synchronous callers. This is an
internal refactoring with no changes to the public API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal refactoring with no changes to the public API

do we need a release note then?

What does any of this mean for a customer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if release note is always required. Won't CI fail if I leave it out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add changelog/no-changelog label, and it'll skip the release note requirement check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments