Profiling #787

coreyjadams · 2025-02-07T02:49:24Z

Modulus Pull Request

Description

This pull request adds a utility to modulus, modulus.utils.profiling, which serves as a single point of entry for a few common profiling tools. The PR also adds tutorials on using these hooks, running and analyzing profilers, and extending the hooks with custom, user-specific tools. A high-level summary of the additions:

modulus.utils.profiling.core.ProfileRegistry is a singleton factory that lets users conveniently access profiler instances from anywhere in their code.
modulus.utils.profiling.core.ModulusProfilerWrapper is a base class for profiler tools. Several common tools (like the torch profiler) inherit from it, and users can inherit from and extend it as well.
modulus.utils.profiling.interface.Profiler is a singleton, one-stop shop for accessing the configured profiling utilities. It lets users apply profiling hooks and annotations, and also modify the default profiler configurations when necessary.

The profiler interface works like this: Users can instrument their code anywhere, anytime:

from modulus.utils.profiling import profile, annotate

@profile
def important_function():
   ...

@annotate(color="blue")
def other_function():
   ...

annotate is always a pointer to nvtx.annotate if it's installed; otherwise it's a null op. profile is an alias for Profiler().__call__, which is a (delayed) function decorator. The delay exists because model code is often imported and evaluated by the interpreter before the Profiler is fully initialized. To prevent the profiler from failing in these cases, the function signatures it wraps are captured and the decoration is replayed right after the profiler is initialized.
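
The delayed-decoration idea can be illustrated with a minimal sketch. This is not the Modulus implementation: DeferredProfiler, its initialize() method, and the calls list are hypothetical names invented for the illustration, and the "profiling" here only records function names.

```python
import functools
from typing import Callable, Dict, List


class DeferredProfiler:
    """Toy stand-in (hypothetical) for a profiler usable before it is set up."""

    def __init__(self) -> None:
        self.initialized = False
        self.calls: List[str] = []            # names of instrumented calls
        self._pending: List[Callable] = []    # decorated before initialize()
        self._active: Dict[Callable, Callable] = {}  # original -> instrumented

    def __call__(self, fn: Callable) -> Callable:
        if self.initialized:
            self._active[fn] = self._instrument(fn)
        else:
            # Too early to instrument: remember the function for later.
            self._pending.append(fn)

        @functools.wraps(fn)
        def dispatch(*args, **kwargs):
            # Resolve at call time, so functions decorated at import time
            # pick up the instrumented version once it exists.
            return self._active.get(fn, fn)(*args, **kwargs)

        return dispatch

    def _instrument(self, fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            self.calls.append(fn.__name__)
            return fn(*args, **kwargs)

        return wrapped

    def initialize(self) -> None:
        # Replay the decoration for everything captured before setup.
        self.initialized = True
        for fn in self._pending:
            self._active[fn] = self._instrument(fn)
        self._pending.clear()


profile = DeferredProfiler()


@profile
def important_function(x):
    return x + 1


before = important_function(1)  # runs uninstrumented, nothing recorded
profile.initialize()
after = important_function(2)   # now recorded in profile.calls
```

The design choice shown here is that the decorator returns a thin dispatcher immediately and defers the real wrapping, so importing model code never depends on profiler state.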

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.

Dependencies


ktangsali · 2025-02-07T17:49:07Z

/multi-gpu-ci

coreyjadams · 2025-02-07T17:54:16Z

FYI - @ktangsali this PR has no additional multi gpu tests, actually! Just single device here.

coreyjadams · 2025-02-13T16:25:35Z

/blossom-ci

coreyjadams · 2025-02-13T16:36:02Z

@ktangsali can you trigger the CI on this branch? I checked that it passes locally and I thought my authorization went through, but it did not yet.

ktangsali · 2025-02-13T17:00:17Z

/blossom-ci


ktangsali · 2025-02-13T18:15:56Z

/blossom-ci

coreyjadams · 2025-02-13T18:54:34Z

/blossom-ci


coreyjadams · 2025-02-13T19:57:00Z

/blossom-ci

coreyjadams · 2025-02-13T20:34:33Z

/blossom-ci


coreyjadams · 2025-02-13T21:13:43Z

/blossom-ci

aleckohlhoff

Overall, it looks good. Many of my comments address stylistic changes more than functional changes.

aleckohlhoff · 2025-02-13T22:07:57Z

modulus/utils/profiling/core.py

+    @property
+    def enabled(self) -> bool:
+        """Get whether the profiler is enabled.
+
+        Returns:
+            bool: True if profiler is enabled, False otherwise
+        """
+        return self._enabled
+
+    @enabled.setter
+    def enabled(self, value: bool) -> None:
+        """Set whether the profiler is enabled.
+
+        Args:
+            value (bool): True to enable profiler, False to disable
+        """
+        if not isinstance(value, bool):
+            raise TypeError("enabled must be a boolean value")
+        self._enabled = value
+
+    @property
+    def finalized(self) -> bool:
+        """Get whether the profiler has been finalized.
+
+        Returns:
+            bool: True if profiler is finalized, False otherwise
+        """
+        return self._finalized
+
+    @finalized.setter
+    def finalized(self, value: bool) -> None:
+        """Set whether the profiler has been finalized.
+
+        Args:
+            value (bool): True to mark as finalized, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("finalized must be a boolean value")
+        self._finalized = value
+
+    @property
+    def initialized(self) -> bool:
+        """Get whether the profiler has been initialized.
+
+        Returns:
+            bool: True if profiler is initialized, False otherwise
+        """
+        return self._initialized
+
+    @initialized.setter
+    def initialized(self, value: bool) -> None:
+        """Set whether the profiler has been initialized.
+
+        Args:
+            value (bool): True to mark as initialized, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("initialized must be a boolean value")
+        self._initialized = value
+
+    @property
+    def is_decorator(self):
+        """
+        Flag to declare if this profiling instance supports function decoration
+        """
+        return self._is_decorator
+
+    @is_decorator.setter
+    def is_decorator(self, value: bool) -> None:
+        """Set whether the profiler supports function decoration.
+
+        Args:
+            value (bool): True to support function decoration, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("is_decorator must be a boolean value")
+        self._is_decorator = value
+
+    @property
+    def is_context(self):
+        """
+        Flag to declare if this profiling instance supports context-based profiling
+        """
+        return self._is_context
+
+    @is_context.setter
+    def is_context(self, value: bool) -> None:
+        """Set whether the profiler supports context-based profiling.
+
+        Args:
+            value (bool): True to support context-based profiling, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("is_context must be a boolean value")
+        self._is_context = value


Are properties (each with their own setters and getters) needed? I would find it simpler to leave them as attributes and document each one in the class docstring.
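
For comparison, a sketch of the attribute-based alternative suggested here might look like the following. The class name AttributeBasedWrapper is hypothetical and the defaults are assumptions; only the five flag names come from the diff above.

```python
class AttributeBasedWrapper:
    """Hypothetical profiler base class using plain attributes.

    Attributes:
        enabled: True if the profiler is enabled.
        initialized: True once the profiler has been initialized.
        finalized: True once the profiler has been finalized.
        is_decorator: True if the profiler supports function decoration.
        is_context: True if the profiler supports context-based profiling.
    """

    def __init__(self) -> None:
        # Defaults are assumptions made for this sketch.
        self.enabled: bool = False
        self.initialized: bool = False
        self.finalized: bool = False
        self.is_decorator: bool = False
        self.is_context: bool = False


wrapper = AttributeBasedWrapper()
wrapper.enabled = True  # no setter needed; type checkers still see `bool`
```

The trade-off is that the runtime isinstance checks from the setters are lost; the annotations only help static type checkers.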

aleckohlhoff · 2025-02-13T22:16:25Z

modulus/utils/profiling/core.py

+    def enable(self) -> None:
+        """Enable the profiler.
+
+        Sets the internal enabled flag to True to activate profiling.
+        """
+        self._enabled = True


If this is an internal flag, should we prefix it with an underscore (i.e., _enable)? Also, similar to the above comment, do we need a helper method for this—especially since it is an internal flag?

aleckohlhoff · 2025-02-13T22:21:07Z

modulus/utils/profiling/core.py

+        if cls not in cls._instances:
+            with cls._lock:
+                # Double-checked locking pattern
+                if cls not in cls._instances:
+                    cls._instances[cls] = super().__call__(*args, **kwargs)
+        return cls._instances[cls]


While the GIL will effectively make this thread-safe, I wouldn't rely on this—especially with the possibility of a free-threaded mode in the future. I would wrap the entire section with the lock.

aleckohlhoff · 2025-02-13T22:21:49Z

modulus/utils/profiling/core.py

+        if cls in cls._instances:
+            del cls._instances[cls]


Even though this is mainly for testing purposes, we should hold the lock for this method.
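
A minimal sketch of the two locking suggestions (instance creation and instance clearing) is below. LockedSingleton, ExampleProfiler, and _clear_instance are hypothetical names, not the PR's code. An RLock is used on the assumption that one singleton's constructor might create another singleton, which would deadlock with a plain Lock.

```python
import threading
from typing import Any, Dict


class LockedSingleton(type):
    """Singleton metaclass that holds the lock for every access."""

    _instances: Dict[type, Any] = {}
    _lock = threading.RLock()  # re-entrant: safe if __init__ builds another singleton

    def __call__(cls, *args, **kwargs):
        # The whole check-and-create sequence runs under the lock,
        # so it does not rely on the GIL for correctness.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
            return cls._instances[cls]

    def _clear_instance(cls) -> None:
        # Mutating the shared dict also happens under the lock.
        with cls._lock:
            cls._instances.pop(cls, None)


class ExampleProfiler(metaclass=LockedSingleton):
    pass


first = ExampleProfiler()
second = ExampleProfiler()
ExampleProfiler._clear_instance()
third = ExampleProfiler()
```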

aleckohlhoff · 2025-02-13T22:30:45Z

modulus/utils/profiling/core.py

+                    return instance
+
+        else:
+            raise Exception(f"ProfilerRegistry has no profiler under the key {key}")


Should use a KeyError rather than a generic exception.

aleckohlhoff · 2025-02-13T22:46:37Z

modulus/utils/profiling/core.py

+        if key in cls._instances:
+            # Find instance of matching type in list
+            for instance in cls._instances:
+                if instance == key:


Comparing classes should use is (see E721).
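
Taken together with the KeyError comment above, a lookup along these lines would address both points. TinyRegistry and its methods are hypothetical and only mirror the shape of the hunk; a plain dict lookup would work equally well here, and the loop is kept only to show the identity comparison.

```python
from typing import Dict


class TinyRegistry:
    """Hypothetical registry keyed by profiler class."""

    _instances: Dict[type, object] = {}

    @classmethod
    def register(cls, instance: object) -> None:
        cls._instances[type(instance)] = instance

    @classmethod
    def get(cls, key: type) -> object:
        for registered_type, instance in cls._instances.items():
            # Classes are compared by identity, not equality (E721).
            if registered_type is key:
                return instance
        # A missing key is reported with the specific built-in exception.
        raise KeyError(f"TinyRegistry has no profiler under the key {key!r}")


class FakeTorchProfiler:
    pass


class FakeLineProfiler:
    pass


registered = FakeTorchProfiler()
TinyRegistry.register(registered)
```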

aleckohlhoff · 2025-02-13T22:51:17Z

modulus/utils/profiling/interface.py

+    # writing outputs, etc.  Only want to trigger this once
+    _finalized: bool
+
+    exit_stack = ExitStack()


Is it intended that exit_stack be shared by all instances? That would mean that exiting any one context exits all profiler contexts.
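
The difference between a class-level and an instance-level ExitStack can be seen in a small standalone sketch. SharedStack, OwnStack, and the tracked helper are hypothetical names for illustration only.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def tracked(name):
    # Records entry and exit so the closing order is visible.
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


class SharedStack:
    exit_stack = ExitStack()  # class attribute: one stack for all instances


class OwnStack:
    def __init__(self) -> None:
        self.exit_stack = ExitStack()  # instance attribute: one stack each


a, b = SharedStack(), SharedStack()
a.exit_stack.enter_context(tracked("a"))
b.exit_stack.enter_context(tracked("b"))
a.exit_stack.close()  # also exits b's context, in LIFO order
shared_events = list(events)

events.clear()
c, d = OwnStack(), OwnStack()
c.exit_stack.enter_context(tracked("c"))
d.exit_stack.enter_context(tracked("d"))
c.exit_stack.close()  # only c's context exits
own_events = list(events)
d.exit_stack.close()
```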

aleckohlhoff · 2025-02-13T22:54:29Z

modulus/utils/profiling/torch.py

+        _is_decorator: Whether this profiler supports decorator usage
+    """
+
+    __metaclass__ = _Profiler_Singleton


Why not use the keyword argument for the class definition like LineProfileWrapper?
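
For context, in Python 3 a `__metaclass__` class attribute is just an ordinary attribute and does not change how the class is created; only the `metaclass=` keyword does. The sketch below shows this with a hypothetical CountingMeta. Whether the class in this hunk still ends up a singleton depends on its base classes, which are not visible here.

```python
class CountingMeta(type):
    """Hypothetical metaclass that records which classes it created."""

    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls


class OldStyle:
    # Python 2 spelling: ignored by Python 3, stored as a plain attribute.
    __metaclass__ = CountingMeta


class NewStyle(metaclass=CountingMeta):
    pass


old_type = type(OldStyle)
new_type = type(NewStyle)
```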

aleckohlhoff · 2025-02-13T22:55:55Z

modulus/utils/profiling/torch.py

+        if not self.enabled:
+            return
+
+        # Avoid finalizing if we never initialized:
+        if not self.initialized:
+            return
+
+        # Prevent double finalization:
+        if self.finalized:
+            return


We can probably merge this into one if condition for brevity.
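
A sketch of the merged guard, assuming the three flags behave as plain booleans. FinalizeExample and finalize_count are hypothetical and exist only to make the behavior observable.

```python
class FinalizeExample:
    """Hypothetical profiler showing the three early returns merged into one."""

    def __init__(self, enabled: bool = True, initialized: bool = True) -> None:
        self.enabled = enabled
        self.initialized = initialized
        self.finalized = False
        self.finalize_count = 0  # counts how often real finalization ran

    def finalize(self) -> None:
        # Skip if disabled, never initialized, or already finalized.
        if not self.enabled or not self.initialized or self.finalized:
            return
        self.finalize_count += 1
        self.finalized = True


active = FinalizeExample()
active.finalize()
active.finalize()  # second call is a no-op

never_initialized = FinalizeExample(initialized=False)
never_initialized.finalize()
```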

aleckohlhoff · 2025-02-13T23:06:19Z

modulus/utils/profiling/core.py

+    @property
+    def enabled(self) -> bool:
+        """Get whether the profiler is enabled.
+
+        Returns:
+            bool: True if profiler is enabled, False otherwise
+        """
+        return self._enabled
+
+    @enabled.setter
+    def enabled(self, value: bool) -> None:
+        """Set whether the profiler is enabled.
+
+        Args:
+            value (bool): True to enable profiler, False to disable
+        """
+        if not isinstance(value, bool):
+            raise TypeError("enabled must be a boolean value")
+        self._enabled = value
+
+    @property
+    def finalized(self) -> bool:
+        """Get whether the profiler has been finalized.
+
+        Returns:
+            bool: True if profiler is finalized, False otherwise
+        """
+        return self._finalized
+
+    @finalized.setter
+    def finalized(self, value: bool) -> None:
+        """Set whether the profiler has been finalized.
+
+        Args:
+            value (bool): True to mark as finalized, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("finalized must be a boolean value")
+        self._finalized = value
+
+    @property
+    def initialized(self) -> bool:
+        """Get whether the profiler has been initialized.
+
+        Returns:
+            bool: True if profiler is initialized, False otherwise
+        """
+        return self._initialized
+
+    @initialized.setter
+    def initialized(self, value: bool) -> None:
+        """Set whether the profiler has been initialized.
+
+        Args:
+            value (bool): True to mark as initialized, False otherwise
+        """
+        if not isinstance(value, bool):
+            raise TypeError("initialized must be a boolean value")
+        self._initialized = value


Is the lifecycle of a profiler always enabled → initialized → finalized? Perhaps it would be simpler to have a single state variable of type Literal["enabled", "initialized", "finalized"] such that we cannot accidentally have an erroneous state.
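
One way to sketch the single-state idea is shown below, assuming the lifecycle really is strictly linear. StatefulProfiler and _advance are hypothetical, and the extra "disabled" starting state is an assumption added for the sketch rather than part of the suggestion.

```python
from typing import Literal, get_args

# "disabled" is an assumed initial state, not part of the original suggestion.
State = Literal["disabled", "enabled", "initialized", "finalized"]
_ORDER = get_args(State)  # tuple of the literal values, in lifecycle order


class StatefulProfiler:
    """Hypothetical profiler with one state variable instead of three flags."""

    def __init__(self) -> None:
        self.state: State = "disabled"

    def _advance(self, target: State) -> None:
        # Only a move to the immediately following state is allowed.
        if _ORDER.index(target) != _ORDER.index(self.state) + 1:
            raise RuntimeError(f"cannot go from {self.state!r} to {target!r}")
        self.state = target

    def enable(self) -> None:
        self._advance("enabled")

    def initialize(self) -> None:
        self._advance("initialized")

    def finalize(self) -> None:
        self._advance("finalized")


ok = StatefulProfiler()
ok.enable()
ok.initialize()
ok.finalize()
```

With this shape, combinations such as "finalized but never initialized" cannot be represented at all, which is the point of the suggestion.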

ktangsali · 2025-02-14T17:20:04Z

Reminder to change the target to 0.10.0-rc branch before merging.

coreyjadams and others added 19 commits December 17, 2024 14:05

Stashing profiling work

41fde8d

Torch profile works but is very slow. line profiler not functional at…

51302b3

… this time

Enablement of profiling tool with pytorch profiler, as a context mana…

990f053

…ger. Still several TBD Objects but this implementation will capture a torch profile.

Moving profiling tools into a directory to make separate tools more c…

2f59946

…learly separated as well as enable easier extensions.

Profiling tools work with torch profiler and line_profiler. nsys has …

507d1a8

…a crash that I haven't resolved yet.

Fix line profiling construction

e9547df

Begin instrumenting figconvnet and adding tutorials on modulus profil…

a71d3ab

…ing tools

Merge branch 'NVIDIA:main' into profiling

eb8bed0

Remove annotations and force all annotations to conform to nvtx. Simp…

9cbd4ff

…ler, for now, and the most (only?) useful annotation tool

Updating profiling tutorial

6832147

Minor updates to profiling interfaces

4febcc9

only adding some profiling hooks to figconvnet

7c900fc

Merge branch 'NVIDIA:main' into profiling

7dbb56b

Merge remote-tracking branch 'refs/remotes/origin/profiling' into pro…

e7b2dfe

…filing

Add profiling hooks to mesh graph net.

8963a5e

Set TELayerNorm to default layer norm in MeshGraphNet

5d0b9dd

Merge branch 'NVIDIA:main' into profiling

5099dbb

Nearly finished profiling tutorial and tooling example. Just need to …

db5bb09

…add images.

Final (first) draft of the profiling tutorial and clean up profiler c…

eea3e70

…ode slightly. Ready for draft PR

coreyjadams requested a review from aleckohlhoff February 7, 2025 02:49

coreyjadams added 2 commits February 7, 2025 10:01

Add tests to the profiler tools to check functionality. Thanks Cursor!

29d51e3

Some minor updates to the tools themselves to accommodate instance clearing and refreshing.

Update changelog for profiling tools

387c307

coreyjadams added the ! - Release PRs or Issues releating to a release label Feb 11, 2025

coreyjadams added 3 commits February 13, 2025 09:27

Merge remote-tracking branch 'upstream/main' into profiling

7e7b9dd

Update profiler files to (hopefully) pass CI checks

1d48484

Remove profiling parts from capture.py for later integration

a70846e

Update __init__.py

e24009c

Remove nvtx wrapper

Add extra line to make linting happy...

1d5ba30

When cuda is not available (mostly CI), emit a warning and switch to …

e550270

…native layer norm.

Make the default as LayerNorm so tests will pass. Needs more care in …

79347ce

…the test, I think, about TELayerNorm

coreyjadams mentioned this pull request Feb 13, 2025

🐛[BUG]: MeshGraphNet tests fail if default norm is TELayerNorm #794

Open

aleckohlhoff reviewed Feb 13, 2025
