feat(profiling): add support for pytorch profiling #9154

Merged
danielsn merged 114 commits into main from peterg17/pytorch_profiling_integration2 on Dec 13, 2024

@sanchda sanchda commented May 3, 2024

PR does

  • Patches the torch.profiler.profile class to add our own on_trace_ready handler
  • Adds GPU time/FLOPs/memory samples through the libdatadog interface in the on_trace_ready event handler
  • Ensures that the libdatadog (libdd) exporter is enabled whenever PyTorch profiling is enabled
  • Hides the functionality behind a feature flag that defaults to False
  • Changelog entry
  • Is there a minimum Python version?
  • Documentation on required user configuration, conflicting features, and gotchas

We should probably exclude experimental/beta collectors from the ALL template. (Is this blocking, given that we haven't done so in the past?)
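
For readers unfamiliar with the patching approach listed above, here is a minimal sketch of the general idea. It is not the code from this PR: `patch_profile_class`, the `enabled` argument (standing in for the feature flag), and `FakeProfile` (standing in for torch.profiler.profile so the example runs without torch) are all hypothetical names. It also assumes the handler is passed by keyword, and chaining the user's own handler after ours is a choice made for this sketch only.

```python
import functools


def patch_profile_class(profile_cls, handler, enabled=False):
    """Wrap profile_cls.__init__ so each instance reports to `handler`.

    `enabled` is a hypothetical stand-in for the feature flag and is off
    by default. Returns True only if the class was patched by this call.
    """
    if not enabled or getattr(profile_cls, "_sketch_patched", False):
        return False

    original_init = profile_cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        # Assumption: on_trace_ready is supplied as a keyword argument.
        user_handler = kwargs.get("on_trace_ready")

        def chained(prof):
            # Our handler runs first, then any handler the user supplied.
            handler(prof)
            if user_handler is not None:
                user_handler(prof)

        kwargs["on_trace_ready"] = chained
        original_init(self, *args, **kwargs)

    profile_cls.__init__ = patched_init
    profile_cls._sketch_patched = True  # guard against double patching
    return True


class FakeProfile:
    """Hypothetical stand-in for torch.profiler.profile."""

    def __init__(self, *, on_trace_ready=None):
        self.on_trace_ready = on_trace_ready

    def step(self):
        # Pretend a trace became ready on this step.
        if self.on_trace_ready is not None:
            self.on_trace_ready(self)


collected = []
patch_profile_class(FakeProfile, collected.append, enabled=True)
FakeProfile().step()  # the injected handler receives the profiler object
```

In a real collector, the handler would read events from the profiler object and forward samples to the exporter; that part is omitted here because it depends on the libdatadog interface.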

Testing Done

  • Tested by running on ec2 GPU instance
  • Tested by running prof-pytorch service in staging
  • I'm not entirely sure whether we need unit tests for this feature, or where they would live. Would we want the unit test suite to depend on torch? This may already be solved for the tracing integrations, though
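
One common way to avoid a hard test dependency on torch is to skip the relevant tests when it is not installed. The sketch below uses only the standard library; `TorchProfilerSmokeTest` and `HAS_TORCH` are hypothetical names, and this is not necessarily how the repository's test suite handles optional dependencies.

```python
import importlib.util
import unittest

# True only when torch can be imported in this environment.
HAS_TORCH = importlib.util.find_spec("torch") is not None


@unittest.skipUnless(HAS_TORCH, "torch is not installed")
class TorchProfilerSmokeTest(unittest.TestCase):
    def test_profile_class_is_available(self):
        # Imported inside the test so collection works without torch.
        from torch.profiler import profile

        self.assertTrue(callable(profile))
```

With this pattern the suite still passes on machines without torch, and the torch-specific tests run only in environments that provide it, such as a dedicated CI job.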

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

pr-commenter bot commented May 21, 2024

Benchmarks

Benchmark execution time: 2024-12-13 22:37:43

Comparing candidate commit 9393216 in PR branch peterg17/pytorch_profiling_integration2 with baseline commit 1dd528c in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

@peterg17 peterg17 left a comment

Thanks!

ddtrace/profiling/collector/pytorch.py (resolved)
@taegyunkim taegyunkim self-requested a review July 9, 2024 17:01

datadog-dd-trace-py-rkomorn bot commented Jul 9, 2024

Datadog Report

Branch report: peterg17/pytorch_profiling_integration2
Commit report: 28ed224
Test service: dd-trace-py

✅ 0 Failed, 389 Passed, 1219 Skipped, 44m 19.55s Total duration (42m 52.03s time saved)

@danielsn danielsn enabled auto-merge (squash) December 13, 2024 21:23
@brettlangdon brettlangdon removed the request for review from taegyunkim December 13, 2024 21:33
@danielsn danielsn merged commit 00ec1f7 into main Dec 13, 2024
679 checks passed
@danielsn danielsn deleted the peterg17/pytorch_profiling_integration2 branch December 13, 2024 22:40
6 participants