Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inverted flame graph #439

Merged
merged 2 commits into from
Sep 5, 2023
Merged

Conversation

ivonastojanovic
Copy link

@ivonastojanovic ivonastojanovic commented Aug 14, 2023

Added inverted flame graph which is generated with --inverted command line argument. The tree's nodes with the same parent and location are merged when we check 'Hide Import System Frames' checkbox.

Fixes: #293

Describe your changes
A clear and concise description of the changes you have made.

Testing performed
Describe the testing you have performed to ensure that the bug has been addressed, or that the new feature works as planned.

Additional context
Add any other context about your contribution here.

@@ -75,6 +75,21 @@ def __init__(
self.data = data
self.memory_records = memory_records

@classmethod
def update_import_system_frames_for_inverted_flamegraph(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have two problems:

  • For deep trees, recursion can be quite slow.
  • In Python there is a recursion limit to protect from stack overflows. This means that if the tree is deep enough this may blow up with a RecursionError.

We could transform this function into something iterative that keeps the stack manually but on the other hand we are already limiting the stack depth here:

if index - num_skipped_frames > MAX_STACKS:
current_frame["name"] = "<STACK TOO DEEP>"
current_frame["location"] = ["...", "...", 0]
break

It may be good to add a test to see that we don't crash for a stack that's too deep. The test would just see what's the generated data for an input that's very nested.

@@ -178,6 +195,11 @@ def _from_any_snapshot(
for interval in intervals
)

if inverted:
frames[0][
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the assignment necessary? update_import_system_frames_for_inverted_flamegraph does this:

nodes[node_index]["import_system"] |= result

and on the first call node_index is 0 so it should update frames[0] already, no?

@@ -28,7 +28,9 @@ def include_file(name: str) -> Markup:
return env


def get_report_title(*, kind: str, show_memory_leaks: bool) -> str:
def get_report_title(*, kind: str, show_memory_leaks: bool, inverted: bool) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed some updates of calls to this function for example here: tests/unit/test_templates.py

node.value += same_node.value;
}

root.children = root.children.filter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hummmm. isn't this invalidating the array root.children and therefore making the indexing incorrect?

@@ -192,6 +192,32 @@ function filterFramesByFunc(root, func) {
return _.defaults({ children: children }, root);
}

function mergeNodes(root) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will likely need some tests in src/memray/reporters/assets/common.test.js

@sarahmonod sarahmonod mentioned this pull request Aug 15, 2023
1 task
@ivonastojanovic ivonastojanovic force-pushed the inverted_flame_graph branch 2 times, most recently from 553d821 to 2abf102 Compare August 21, 2023 13:14
@codecov-commenter
Copy link

codecov-commenter commented Aug 21, 2023

Codecov Report

Patch coverage: 91.37% and project coverage change: +0.16% 🎉

Comparison is base (0154228) 91.72% compared to head (9037cba) 91.89%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   91.72%   91.89%   +0.16%     
==========================================
  Files          90       90              
  Lines       10532    10766     +234     
  Branches     1450     1472      +22     
==========================================
+ Hits         9661     9893     +232     
- Misses        868      871       +3     
+ Partials        3        2       -1     
Flag Coverage Δ
cpp 85.23% <ø> (+0.02%) ⬆️
python_and_cython 95.30% <91.37%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
tests/unit/test_flamegraph_reporter.py 100.00% <ø> (ø)
src/memray/reporters/table.py 94.28% <50.00%> (-2.69%) ⬇️
src/memray/reporters/transform.py 96.29% <50.00%> (-1.79%) ⬇️
src/memray/commands/common.py 86.53% <80.00%> (-0.34%) ⬇️
src/memray/reporters/flamegraph.py 91.66% <93.20%> (+3.66%) ⬆️
src/memray/_ipython/flamegraph.py 95.45% <100.00%> (+0.06%) ⬆️
src/memray/commands/flamegraph.py 100.00% <100.00%> (ø)
tests/unit/test_templates.py 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

from typing import TextIO
from typing import Tuple
from typing import TypedDict
Copy link
Contributor

@godlygeek godlygeek Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing because of this. We still support 3.7, and typing.TypedDict wasn't added until Python 3.8. We either need to use from typing_extensions import TypedDict on Python 3.7, or drop support for Python 3.7 - or maybe make this conditional on Python version... or maybe only use TypedDict when typing.TYPE_CHECKING is set.

Or maybe just use a dict for now, not worry about typing it since it's an internal API, and add an issue to remind us to add typing for it after we drop 3.7 support 😄

@@ -0,0 +1,4 @@
Add a new ``inverted`` flag to ``memray flamegraph``. In an inverted flame graph, the roots are the functions that allocated memory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add a new ``inverted`` flag to ``memray flamegraph``. In an inverted flame graph, the roots are the functions that allocated memory,
Add a new ``--inverted`` flag to ``memray flamegraph``. In an inverted flame graph, the roots are the functions that allocated memory,

@@ -0,0 +1,4 @@
Add a new ``inverted`` flag to ``memray flamegraph``. In an inverted flame graph, the roots are the functions that allocated memory,
and the children of any given node represent the percentage of any of that node's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and the children of any given node represent the percentage of any of that node's
and the children of any given node represent the percentage of that node's

@@ -0,0 +1,4 @@
Add a new ``inverted`` flag to ``memray flamegraph``. In an inverted flame graph, the roots are the functions that allocated memory,
and the children of any given node represent the percentage of any of that node's
allocations that can be attributed to a particular caller.The inverted flame graph is very helpful in analyzing where memory is being
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allocations that can be attributed to a particular caller.The inverted flame graph is very helpful in analyzing where memory is being
allocations that can be attributed to a particular caller. The inverted flame graph is very helpful in analyzing where memory is being

@@ -90,6 +90,13 @@ def argument_parser() -> argparse.ArgumentParser:
default=False,
)

parser.add_argument(
"--inverted",
help="Invert flame graph",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should explain this in more detail, I think. Something like:

Suggested change
help="Invert flame graph",
help=(
"Invert the flame graph: "
"use allocators as roots instead of thread entry points",
)

@@ -132,14 +178,15 @@ def _from_any_snapshot(
)
num_skipped_frames = 0
is_import_system = False
for index, stack_frame in enumerate(reversed(stack)):
stack_it = enumerate((stack)) if inverted else enumerate(reversed(stack))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stack_it = enumerate((stack)) if inverted else enumerate(reversed(stack))
stack_it = enumerate(stack) if inverted else enumerate(reversed(stack))

result = cls.update_import_system_frames_for_inverted_flamegraph(
nodes, child_index
)
nodes[node_index]["import_system"] |= result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The |= here is essentially saying: if any of this node's callers are part of the import system, then this is an import system node. That doesn't seem correct to me. Consider a test case like this:

    def test_inverted_allocator_sometimes_called_under_importlib(self):
        # GIVEN
        peak_allocations = [
            MockAllocationRecord(
                tid=1,
                address=0x1000000,
                size=1024,
                allocator=AllocatorType.MALLOC,
                stack_id=1,
                n_allocations=1,
                _stack=[
                    ("malloc", "malloc.c", 12),
                    ("parent_one", "fun.py", 8),
                    ("main", "main.py", 4),
                ],
            ),
            MockAllocationRecord(
                tid=1,
                address=0x1000000,
                size=1024,
                allocator=AllocatorType.MALLOC,
                stack_id=1,
                n_allocations=1,
                _stack=[
                    ("malloc", "malloc.c", 12),
                    ("importer", "<frozen importlib>", 4),
                    ("parent_two", "fun.py", 10),
                    ("main", "main.py", 4),
                ],
            ),
        ]

In that example, malloc can be called from two different paths - either by main calling parent_one calling malloc, or by main calling parent_two calling importer calling malloc. importer is part of the import system, so there's one path to malloc that goes through the import system and one that doesn't. Each of those paths are responsible for half the bytes allocated by malloc.

Our non-inverted flame graph displays these stacks for that:

|                root size=2048               |
|                main size=2048               |
| parent_one size=1024 | parent_two size=1024 |
|   malloc size=1024   |  importer size=1024  |
|                      |   malloc size=1024   |

When you click the "hide import system frames" checkbox, importer and the frames below it are hidden, and parent_two becomes a leaf:

|                root size=2048               |
|                main size=2048               |
| parent_one size=1024 | parent_two size=1024 |
|   malloc size=1024   |                      |

For our inverted flame graphs, I expect the exact same stacks to be shown, just with leaves and roots swapped. So, when import system frames are not hidden, I expect us to show this:

|                root size=2048               |
|               malloc size=2048              |
| parent_one size=1024 |  importer size=1024  |
|    main size=1024    | parent_two size=1024 |
|                      |    main size=1024    |

But when import system frames are hidden, I expect it to instead show:

|                root size=2048               |
|   malloc size=1024   | parent_two size=1024 |
| parent_one size=1024 |    main size=1024    |
|    main size=1024    |                      |

The main -> parent_one -> malloc path remains unchanged by "hide import system frames", since that path doesn't go through the import system, but the main -> parent_two -> importer -> malloc stack gets truncated to main -> parent_two when importer is reached (just like it is for non-inverted flame graphs), and the stack we show now ends with parent_two as a child of the root (because parent_two is a leaf in the non-inverted hide-import-system case, and inverted mode swaps leaves and roots). Which means that the malloc node shrinks from 2048 bytes to 1024 bytes, since half of its allocations are attributable to a path through the import system, and half aren't.

There's no way we can achieve that desired behavior if we update the malloc node to say that it's an import system frame because one of its callers is an import system frame, even though the other isn't. As this function is currently implemented, I think it would result in us hiding malloc entirely, effectively truncating the main -> parent_one -> malloc call stack even though it didn't pass through the import system.

I can think of 3 reasonable ways to handle this:

  1. Build two separate trees in --inverted mode. For one, use full stacks as the input. For the other, truncate the stacks at import system calls before processing them into a tree. Include both trees in the inverted-mode HTML, and have the checkbox switch which one is rendered.
  2. In the FrameNodeDict, instead of recording just a bool for whether or not the node was reached by the import system, record an int: the number of bytes allocated at that node through a path that went through the import system. That would change this |= into a +=. When the checkbox is unset the inverted-mode JS code would ignore that field. When the checkbox is checked the JS code would shrink every node by treating that node's size as value - import_system_bytes, instead of as value bytes. Any node where value - import_system_bytes is 0 would be removed, and its children would be hoisted up to the root.
  3. Pre-process each stack to identify which frames are below a call into the import system. Include the is-import-system flag as part of the node key, so that we have two different malloc nodes in the dumped tree, each associated with 1024 bytes. When the checkbox is checked, our JS code drops nodes with is-import-system set, and hoists their children up a level. When the checkbox is unchecked, our JS code would need to traverse the tree and merge nodes together if they have the same parent and same location, but one is flagged is-import-system and the other isn't.

Of these options, I think 1 is probably the simplest, and 3 is probably the most complex. 2 might require the fewest changes to what's been implemented so far...

@pablogsal Please check what I've said here, and see if a) you agree with me about what the checkbox should do, and b) if my proposed solutions seem reasonable.

Comment on lines 34 to 35
if inverted and kind == "flamegraph":
kind = f"inverted {kind}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to return "inverted flamegraph report (memory leaks)" if show_memory_leaks is true? I think what we want is:

    parts = []
    if inverted:
        parts.append("inverted")
    parts.append(kind)
    parts.append("report")
    if show_memory_leaks:
        parts.append("(memory leaks)")
    return " ".join(parts)

Also: Don't worry about doing the and kind == "flamegraph" here. I think it's fine to return "inverted table report" if someone calls this with kind="table". We know they won't, but if they for some reason do, that would be sane behavior, so there's no reason to avoid it.

@godlygeek
Copy link
Contributor

As a heads up, I haven't looked at the JS or the tests yet, so the above comments are what I have to say about the Python side of the implementation.

Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I've taken a look at the Javascript this time around (though I still haven't looked closely at the tests), and most of the concerns I have now are pretty minor.

@@ -20,16 +29,43 @@
from memray.reporters.frame_tools import is_cpython_internal
from memray.reporters.frame_tools import is_frame_from_import_system
from memray.reporters.frame_tools import is_frame_interesting
from memray.reporters.stats import PythonStackElement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have one reporter importing things from another if we can avoid it. I'd rather copy-paste this type declaration, instead:

Suggested change
from memray.reporters.stats import PythonStackElement
PythonStackElement = Tuple[str, str, int]

That helps maintain the separation of responsibilities between different reporters. We might eventually want to move that definition into a common place that different reporters can share - maybe create src/memray/_types.py or something like that - but I wouldn't bother until we need it in yet another place.

Comment on lines 115 to 116
@classmethod
def update_import_system_frames_for_inverted_flamegraph(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is no longer used and should be removed

Comment on lines 217 to 210
if temporal and inverted:
raise NotImplementedError(
"FlamegraphReporter does not support temporal inverted flame graphs."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle this in src/memray/commands/common.py in run instead, and do a parser.error() if the user provides both --temporal and --inverted. That'll give a nicer error message for users, I believe.

frames: List[FrameNodeDict],
node_index_by_key: Dict[Tuple[int, StackFrame, str], int],
record: Union[AllocationRecord, TemporalAllocationRecord],
current_frame_id: int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callers always provide 0 for this argument. Since it doesn't vary from one call to the next, current_frame_id should be a local variable instead of a parameter.

current_frame["location"] = ("...", "...", 0)
break

return frames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to return frames here. The caller passed frames to us, so they already have a reference to it. Returning it is misleading, since it isn't clear from the signature that we always return the same list, never a different one.

Comment on lines 301 to 308
import_system_stack = list(
reversed(
list(
itertools.takewhile(
lambda e: not is_frame_from_import_system(e),
reversed(stack),
)
)
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Very elegant.

"value": 0,
"children": [],
"n_allocations": 0,
"thread_id": "0x0",
"interesting": True,
"import_system": False,
}
inverted_import_system_root = deepcopy(root) if inverted else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every variable name in this function with _import_system_ in the name should be renamed to _no_import_system_ (or just _no_imports_) instead.

inverted_import_system_root makes it sound like this is the root of an inverted tree that does contain import system nodes, when actually its the root of an inverted tree that contains only non-import-system nodes.

Comment on lines 217 to 218
if temporal and inverted:
raise NotImplementedError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you given any thought to what it would take to extend this to work for temporal mode? It should definitely be a separate PR, this one is long enough - but I don't think it would be too tough to implement.

@@ -139,8 +139,11 @@ <h5 class="modal-title" id="helpModalLabel">How to interpret {{ kind }} reports<
<script type="text/javascript">
const packed_data = {{ data|tojson }};
var data = null;
var flamegraphData = null;
var invertedImportSystemData = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this makes it sound like this one does include import system nodes, rather than does not. I'd call it something like importSystemHiddenFlamegraphData or noImportSystemData or flamegraphDataNoImportSystem or something.

nodes: inverted_import_system_nodes,
unique_threads: unique_threads,
})
: {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do null here instead of an empty object. That's a better indicator for the particular sort of invalid that this is.

Suggested change
: {};
: null;

Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few small changes, but this LGTM!

Comment on lines 42 to 44
size: Union[int, None]
n_allocations: Union[int, None]
intervals: Union[List[Interval], None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union[X, None] is the same thing as Optional[X], and we're already using Optional elsewhere in this file, so I'm going to update these 3 to match.

cls,
frames: List[FrameNodeDict],
all_strings: StringRegistry,
interval_list: List[Tuple[int, Union[int, None], int, int, int]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function doesn't actually touch interval_list beyond seeing whether it exists, and since it exists if and only if we're in temporal mode, I'm going to switch this to just accepting temporal: bool instead.

inverted_no_imports_frames: List[FrameNodeDict] = []

if inverted:
inverted_no_imports_root = deepcopy(root)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having root and inverted_no_imports_root as local variables in this function, since they're only used here at the very top of the fairly big function. Limiting the amount of local state will make this function easier to understand. I'm going to factor creation of the root node out into a function, and call it once for each tree we're making.

Comment on lines 242 to 245
node_index_by_key: Dict[Tuple[int, StackFrame, str], int] = {}
inverted_no_imports_node_index_by_key: Dict[
Tuple[int, StackFrame, str], int
] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna switch this to use a type alias, to make things slightly easier to read:

        NodeKey = Tuple[int, StackFrame, str]
        node_index_by_key: Dict[NodeKey, int] = {}
        inverted_no_imports_node_index_by_key: Dict[NodeKey, int] = {}

In particular, it's tough to see that the two variables have the same type given the way that Black wrapped the lines, and this change makes that more recognizable.

Comment on lines 264 to 269
record_data: RecordData = {
"thread_name": thread_id,
"size": size,
"n_allocations": n_allocations,
"intervals": intervals,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna move this to be defined in each branch of the if above, so that we don't need intervals and size and n_allocations and thread_id as local variables in addition to values in this dict, for the same reason as above - limiting the number of local variables here will help future readers.

Comment on lines 279 to 281
stack_it = enumerate(reversed(stack))
cls.generate_frames(
stack_it=stack_it,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna drop this local variable as well, and just construct it in the call to the function below, since it's only used in one spot, and to keep limiting local state.

Comment on lines 299 to 308
no_imports_stack = list(
reversed(
list(
itertools.takewhile(
lambda e: not is_frame_from_import_system(e),
reversed(stack),
)
)
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna split this into a function, so that we can assign a nice name to it for the benefit of future maintainers.

Added inverted flame graph which is generated with --inverted command
line argument.

Signed-off-by: Ivona Stojanovic <[email protected]>
Added tests for inverted flame graph.

Signed-off-by: Ivona Stojanovic <[email protected]>
@godlygeek godlygeek enabled auto-merge (rebase) September 5, 2023 23:12
@godlygeek godlygeek merged commit 68a453a into bloomberg:main Sep 5, 2023
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inverted flame graphs
4 participants