Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling #787

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open

Profiling #787

wants to merge 28 commits into from

Conversation

coreyjadams
Copy link
Collaborator

@coreyjadams coreyjadams commented Feb 7, 2025

Modulus Pull Request

Description

This pull request adds a utility to modulus, modulus.utils.profiling, which serves as a single point of entry for a few common profiling tools. This PR will also add tutorials on using these hooks, running and analyzing profilers, and showing how to extend the hooks to add custom user-specific tools. A high level summary of the additions:

  • modulus.utils.profiling.core.ProfileRegistry is singleton factory to let users conveniently access profiler instances from their code, wherever the call from.
  • modulus.utils.profiling.core.ModulusProfilerWrapper is a base class for profiler tools. Several common tools inherit this (like the torch profiler) and users can inherit and extend it.
  • modulus.utils.profiling.interface.Profiler is a singleton, one-stop-shop for access the configured profiling utilities. Let's users apply profiling hooks and annotations, but also lets users modify default profiler configurations when necessary.

The profiler interface works like this: Users can instrument their code anywhere, anytime:

from modulus.utils.profiling import profile, annotate

@profile
def important_function():
   ...

@annotate(color="blue")
def other_function():
   ...

annotate is always a pointer to nvtx.annotate if it's installed, otherwise it's a null op. profile is a rename of Profiler().__call__, which is a (delayed) function decorator. The delay is because model code may often be imported and evaluated by the interpetter before the Profiler is fully initialized. To prevent the profiler from failing in these cases, the function signatures it wraps are captured and the decoration is replayed right after the profiler is initialized.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

coreyjadams and others added 19 commits December 17, 2024 14:05
…ger. Still several TBD Objects but this implementation will capture a torch profile.
…learly separated as well as enable easier extensions.
…ler, for now, and the most (only?) useful annotation tool
Some minor updtes to the tools themselves to accomodate instance clearing and refreshing.
@ktangsali
Copy link
Collaborator

/multi-gpu-ci

@coreyjadams
Copy link
Collaborator Author

FYI - @ktangsali this PR has no additional multi gpu tests, actually! Just single device here.

@coreyjadams coreyjadams added the ! - Release PRs or Issues releating to a release label Feb 11, 2025
@coreyjadams
Copy link
Collaborator Author

/blossom-ci

@coreyjadams
Copy link
Collaborator Author

@ktangsali can you trigger the CI on this branch? I checked that it passes locally and I thought my authorization went through, but it did not yet.

@ktangsali
Copy link
Collaborator

/blossom-ci

Remove nvtx wrapper
@ktangsali
Copy link
Collaborator

/blossom-ci

@coreyjadams
Copy link
Collaborator Author

/blossom-ci

@coreyjadams
Copy link
Collaborator Author

/blossom-ci

1 similar comment
@coreyjadams
Copy link
Collaborator Author

/blossom-ci

@coreyjadams
Copy link
Collaborator Author

/blossom-ci

Copy link
Collaborator

@aleckohlhoff aleckohlhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good. Many of my comments address stylistic changes more than functional changes.

Comment on lines +64 to +157
@property
def enabled(self) -> bool:
"""Get whether the profiler is enabled.

Returns:
bool: True if profiler is enabled, False otherwise
"""
return self._enabled

@enabled.setter
def enabled(self, value: bool) -> None:
"""Set whether the profiler is enabled.

Args:
value (bool): True to enable profiler, False to disable
"""
if not isinstance(value, bool):
raise TypeError("enabled must be a boolean value")
self._enabled = value

@property
def finalized(self) -> bool:
"""Get whether the profiler has been finalized.

Returns:
bool: True if profiler is finalized, False otherwise
"""
return self._finalized

@finalized.setter
def finalized(self, value: bool) -> None:
"""Set whether the profiler has been finalized.

Args:
value (bool): True to mark as finalized, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("finalized must be a boolean value")
self._finalized = value

@property
def initialized(self) -> bool:
"""Get whether the profiler has been initialized.

Returns:
bool: True if profiler is initialized, False otherwise
"""
return self._initialized

@initialized.setter
def initialized(self, value: bool) -> None:
"""Set whether the profiler has been initialized.

Args:
value (bool): True to mark as initialized, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("initialized must be a boolean value")
self._initialized = value

@property
def is_decorator(self):
"""
Flag to declare if this profiling instance supports function decoration
"""
return self._is_decorator

@is_decorator.setter
def is_decorator(self, value: bool) -> None:
"""Set whether the profiler supports function decoration.

Args:
value (bool): True to support function decoration, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("is_decorator must be a boolean value")

@property
def is_context(self):
"""
Flag to declare if this profiling instance supports context-based profiling
"""
return self._is_context

@is_context.setter
def is_context(self, value: bool) -> None:
"""Set whether the profiler supports context-based profiling.

Args:
value (bool): True to support context-based profiling, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("is_context must be a boolean value")
self._is_context = value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are properties (each with their own setters and getters) needed? I would find it simpler to leave them as attributes and document each one in the class docstring.

Comment on lines +159 to +164
def enable(self) -> None:
"""Enable the profiler.

Sets the internal enabled flag to True to activate profiling.
"""
self._enabled = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an internal flag, should we prefix it with an underscore (i.e., _enable)? Also, similar to the above comment, do we need a helper method for this—especially since it is an internal flag?

Comment on lines +264 to +269
if cls not in cls._instances:
with cls._lock:
# Double-checked locking pattern
if cls not in cls._instances:
cls._instances[cls] = super().__call__(*args, **kwargs)
return cls._instances[cls]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the GIL will effectively make this thread-safe, I wouldn't rely on this—especially with the possibility of a free-threaded mode in the future. I would wrap the entire section with the lock.

Comment on lines +273 to +274
if cls in cls._instances:
del cls._instances[cls]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is mainly for testing purposes, we should hold the lock for this method.

return instance

else:
raise Exception(f"ProfilerRegistry has no profiler under the key {key}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use a KeyError rather than a generic exception.

if key in cls._instances:
# Find instance of matching type in list
for instance in cls._instances:
if instance == key:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing classes should use is (see E721).

# writing outputs, etc. Only want to trigger this once
_finalized: bool

exit_stack = ExitStack()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended that exit_stack be shared with all instances? This would mean that on any context exit would exit all profiler contexts.

_is_decorator: Whether this profiler supports decorator usage
"""

__metaclass__ = _Profiler_Singleton
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the keyword argument for the class definition like LineProfileWrapper?

Comment on lines +128 to +137
if not self.enabled:
return

# Avoid finalizing if we never initialized:
if not self.initialized:
return

# Prevent double finalization:
if self.finalized:
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably merge this into one if condition for brevity.

Comment on lines +64 to +122
@property
def enabled(self) -> bool:
"""Get whether the profiler is enabled.

Returns:
bool: True if profiler is enabled, False otherwise
"""
return self._enabled

@enabled.setter
def enabled(self, value: bool) -> None:
"""Set whether the profiler is enabled.

Args:
value (bool): True to enable profiler, False to disable
"""
if not isinstance(value, bool):
raise TypeError("enabled must be a boolean value")
self._enabled = value

@property
def finalized(self) -> bool:
"""Get whether the profiler has been finalized.

Returns:
bool: True if profiler is finalized, False otherwise
"""
return self._finalized

@finalized.setter
def finalized(self, value: bool) -> None:
"""Set whether the profiler has been finalized.

Args:
value (bool): True to mark as finalized, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("finalized must be a boolean value")
self._finalized = value

@property
def initialized(self) -> bool:
"""Get whether the profiler has been initialized.

Returns:
bool: True if profiler is initialized, False otherwise
"""
return self._initialized

@initialized.setter
def initialized(self, value: bool) -> None:
"""Set whether the profiler has been initialized.

Args:
value (bool): True to mark as initialized, False otherwise
"""
if not isinstance(value, bool):
raise TypeError("initialized must be a boolean value")
self._initialized = value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lifecycle of a profiler always enabledinitializedfinalized? Perhaps it would be simpler to have a single state variable of type Literal["enabled", "initialized", "finalized"] such that we cannot accidentally have an erroneous state.

@ktangsali
Copy link
Collaborator

Reminder to change the target to 0.10.0-rc branch before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
! - Release PRs or Issues releating to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants