Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Navigation memory leak: Appearing event not being unsubscribed properly #24507

Closed
wants to merge 6 commits into from

Conversation

mattsetaro
Copy link

@mattsetaro mattsetaro commented Aug 29, 2024

Description of Change

Capture the event being subscribed to Appearing in OnAppearing(Action) into a file-scoped field, to be unsubscribed now in SendDisappearing().

The OnAppearing(Action) call in Page.cs was subscribing an event to the Appearing EventHandler, and not unsubscribing when navigating away, leading to memory leaks. Note the _invocationList count in the screenshot:

Screenshot 2024-08-28 at 10 26 23 AM

I also have gcdump's of a File>New MAUI application with two pages that navigate to each other automatically, reproducing the issue. Source code and gcdumps here: https://github.com/mattsetaro/MauiMemoryLeak

I was also able to reproduce the memory leak with a unit test, here is the unit test and a screenshot of the memory snapshot from dotMemory (Object[] from screenshot is _invocationList on the Appearing event):
Screenshot 2024-08-28 at 10 17 20 AM

[Fact]
public async void MemoryLeak()
{
    Routing.RegisterRoute("page1/page2", typeof(DummyPage));
    Routing.RegisterRoute("page1/page2/page3", typeof(DumberPage));
    
    var shell = new TestShell(
    CreateShellItem(shellSectionRoute: "page1")
    );
    
    int ctr = 0;
    while (true)
    {
        await shell.GoToAsync("page1/page2/page3");
        Assert.True(shell.CurrentPage is DumberPage);
        
        await shell.GoToAsync("..");
        Assert.True(shell.CurrentPage is DummyPage);
        
        if (++ctr % 1000 == 0)
        {
	    _testOutputHelper.WriteLine("Total memory: " + GC.GetTotalMemory(false).ToString("N0"));
        }
    }
}

class DummyPage : Page
{
}

class DumberPage : Page
{
}

Issues Fixed

@mattsetaro mattsetaro requested a review from a team as a code owner August 29, 2024 13:36
@dotnet-policy-service dotnet-policy-service bot added the community ✨ Community Contribution label Aug 29, 2024
@mattsetaro
Copy link
Author

mattsetaro commented Aug 29, 2024 via email

@PureWeen
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@rmarinho rmarinho requested review from jonathanpeppers and PureWeen and removed request for Eilon August 29, 2024 15:57
src/Controls/src/Core/Page/Page.cs Outdated Show resolved Hide resolved
src/Controls/src/Core/Page/Page.cs Show resolved Hide resolved
src/Controls/src/Core/Page/Page.cs Show resolved Hide resolved
@PureWeen
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Comment on lines +583 to +584
[Fact]
public void SendDisappearingUnsubscribesAppearingEvent()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new test fails:

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was failing because the counter was not being reset, but I discovered it was not really reproducing the issue correctly so I will look into that today

@samhouts samhouts added the area-navigation NavigationPage label Aug 29, 2024
Copy link
Member

@mattleibow mattleibow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PureWeen is this fix a good thing, or are we hiding a more sinister issue that should never happen?

Otherwise, this looks like a good fix to a possible real-world scenario?

Comment on lines +590 to +602
// This subscribes the event if the page has not appeared already
int invokeCounter = 0;
navPage.OnAppearing(() => { invokeCounter++; });
((IPageController)navPage).SendAppearing();
Assert.Equal(1, invokeCounter);

// Navigate away
((IPageController)navPage).SendDisappearing();

// This should not increment the counter when navigating back
navPage.OnAppearing(() => { invokeCounter++; });
((IPageController)navPage).SendAppearing();
Assert.Equal(1, invokeCounter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may work better:

Suggested change
// This subscribes the event if the page has not appeared already
int invokeCounter = 0;
navPage.OnAppearing(() => { invokeCounter++; });
((IPageController)navPage).SendAppearing();
Assert.Equal(1, invokeCounter);
// Navigate away
((IPageController)navPage).SendDisappearing();
// This should not increment the counter when navigating back
navPage.OnAppearing(() => { invokeCounter++; });
((IPageController)navPage).SendAppearing();
Assert.Equal(1, invokeCounter);
// This subscribes the event if the page has not appeared already
page.OnAppearing(() => Assert.Fail("Previous OnAppearing event code ran when it should have been removed."));
// Navigate away BEFORE a navigation TO completes
((IPageController)navPage).SendDisappearing();
// This should not increment the counter when navigating back
((IPageController)navPage).SendAppearing();

But it still does not work and I am going to bed. :)

But basically, you are calling SendAppearing which will always trigger and then unsub the event, so maybe you should skip the first SendAppearing to emulate? Or maybe this issue is a bit different? Might need to finish my nap before I fully understand this issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was seeing something similar on my end too, will look into it today, thanks I like that Assert.Fail()

Comment on lines +609 to 610
EventHandler _onAppearingEventHandler;
internal void OnAppearing(Action action)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PureWeen not sure if it is ever the case where multiple callers will invoke this method before it appears and thus we need some sort of list of actions to run once it appears?

Maybe some list of actions is better anyway and a single event handler that just runs all the actions. And the disappearing just clears the list? Like a list of pending onappearing actions? For example, we have the _pendingActions list for the alerts and popups, so maybe we need to have a list of things that run once the page appears?

@mattsetaro
Copy link
Author

@PureWeen @mattleibow @jonathanpeppers I have an interesting development on this PR. In trying to write a unit test that reproduces the issue not using an infinite loop, I discovered the root cause of the memory leak is actually from a race condition somewhere up higher in the call stack. With a debugger attached, my interpretation of the normal order of operations when calling Shell.GoToAsync() is Page.SendAppearing(), then Page.OnAppearing(Action), but once every so many iterations the execution order will be reversed, thusly causing the event to be subscribed and never unsubscribed.

I was able to find this out using the unit test I wrote which is not in the PR but in my initial post. You can see here that putting a breakpoint just below ln 623 in Page.cs only breaks for the first time when ctr has a value of 62 in this case:
Screenshot 2024-09-03 at 10 20 45 AM
... up the call stack...
Screenshot 2024-09-03 at 10 21 54 AM

With this in mind, my proposed changes will resolve the memory leak, but not the race condition causing the memory leak in the first place. I will continue to investigate the root cause of the race condition in the mean time.

@mattsetaro
Copy link
Author

@PureWeen @jonathanpeppers @mattleibow Update for you three, sorry I haven't had much time to look at this recently. That said I am confident I was able to figure out the root cause for this issue. At some point the RealParent property on the active Window is getting garbage collected. Here's how I am able to confirm that and you can test for yourself:

Using the following unit test, place a breakpoint on Page.cs line 623 (the end of OnAppearing(Action) where the += is).

[Fact]
public async void MemoryLeak()
{
    Routing.RegisterRoute("page1/page2", typeof(DummyPage));
    Routing.RegisterRoute("page1/page2/page3", typeof(DumberPage));
    
    var shell = new TestShell(
    CreateShellItem(shellSectionRoute: "page1")
    );
    
    int ctr = 0;
    while (true)
    {
        await shell.GoToAsync("page1/page2/page3");
        Assert.True(shell.CurrentPage is DumberPage);
        
        await shell.GoToAsync("..");
        Assert.True(shell.CurrentPage is DummyPage);
        
        if (++ctr % 1000 == 0)
        {
	    _testOutputHelper.WriteLine("Total memory: " + GC.GetTotalMemory(false).ToString("N0"));
        }
    }
}

class DummyPage : Page
{
}

class DumberPage : Page
{
}
Screenshot 2024-09-10 at 10 29 08 AM

Run the unit test, when the breakpoint hits, jump down the call stack to the above unit test and make note of the ctr value, as it should be greater than 0, in my case somewhere between 55-65, meaning we haven't gotten to the event += until 50 iterations of the loop, or about 100 navigations. Hit play to resume the unit test, and jump down the call stack again to note the value of ctr should be += 1. This should be the behavior for every iteration moving forwards.

Now add in another breakpoint on Element.cs line 353 (the else condition of the getter of RealParent) and restart the test. When running the test, note that the breakpoint in Element.cs breaks BEFORE the original breakpoint in Page.cs.
Screenshot 2024-09-10 at 10 30 21 AM

With the breakpoint broken inElement.cs jump down the call stack again to the unit test again and make note of the ctr value. Remove the breakpoint in Element.cs and resume execution. Note the Page.cs breakpoint should break immediately, and if you jump down the call stack and look at the value of ctr it should be the same as when the breakpoint in Element.cs broke. From there on out, the Page.cs breakpoint will break every navigation attempt.

This test confirms that the Window.Parent.RealParent is being garbage collected at some point. Now I am not that familiar with the framework's code as I only just started looking at it with this effort, so I can't help but think I am mocking something incorrectly, or the test/mock Window or Shell code is not set up properly. If one of you are able to confirm or deny that I would appreciate it.

In the mean time I will look into why the RealParent is being garbage collected. Thanks

@mattsetaro
Copy link
Author

@PureWeen @jonathanpeppers @mattleibow I added in some changes that do fix the issue of the RealParent property being garbage collected. That said I have not worked much with WeakReference and would appreciate feedback on the new changes. Thanks

@mattsetaro
Copy link
Author

@PureWeen @jonathanpeppers @mattleibow I understand you three are probably very busy but we are trying to test if these changes will fix a memory leak for a client's project. If you can look at the PR and or assist in creating a pre-release build to test with I would really appreciate it.

@PureWeen
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@PureWeen
Copy link
Member

/rebase

@@ -280,6 +280,7 @@ internal bool RemoveLogicalChild(Element element, int index)

internal bool Owned { get; set; }

Element _strongParentOverrideReference;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to break other scenarios with memory leaks. The main point of the weakreference on the Parent is to avoid circular references on iOS which cause things to not get collected. So, we'll need to understand a bit more about why the Parent here is getting collected at all.

The parent should ideally be rooted to something so that doesn't happen, but it looks like there's probably a scenario we're not accounting for

@PureWeen
Copy link
Member

@PureWeen @jonathanpeppers @mattleibow Update for you three, sorry I haven't had much time to look at this recently. That said I am confident I was able to figure out the root cause for this issue. At some point the RealParent property on the active Window is getting garbage collected. Here's how I am able to confirm that and you can test for yourself:

Using the following unit test, place a breakpoint on Page.cs line 623 (the end of OnAppearing(Action) where the += is).

[Fact]
public async void MemoryLeak()
{
    Routing.RegisterRoute("page1/page2", typeof(DummyPage));
    Routing.RegisterRoute("page1/page2/page3", typeof(DumberPage));
    
    var shell = new TestShell(
    CreateShellItem(shellSectionRoute: "page1")
    );
    
    int ctr = 0;
    while (true)
    {
        await shell.GoToAsync("page1/page2/page3");
        Assert.True(shell.CurrentPage is DumberPage);
        
        await shell.GoToAsync("..");
        Assert.True(shell.CurrentPage is DummyPage);
        
        if (++ctr % 1000 == 0)
        {
	    _testOutputHelper.WriteLine("Total memory: " + GC.GetTotalMemory(false).ToString("N0"));
        }
    }
}

class DummyPage : Page
{
}

class DumberPage : Page
{
}
Screenshot 2024-09-10 at 10 29 08 AM Run the unit test, when the breakpoint hits, jump down the call stack to the above unit test and make note of the `ctr` value, as it should be greater than 0, in my case somewhere between 55-65, meaning we haven't gotten to the event += until 50 iterations of the loop, or about 100 navigations. Hit play to resume the unit test, and jump down the call stack again to note the value of `ctr` should be += 1. This should be the behavior for every iteration moving forwards.

Now add in another breakpoint on Element.cs line 353 (the else condition of the getter of RealParent) and restart the test. When running the test, note that the breakpoint in Element.cs breaks BEFORE the original breakpoint in Page.cs. Screenshot 2024-09-10 at 10 30 21 AM

With the breakpoint broken inElement.cs jump down the call stack again to the unit test again and make note of the ctr value. Remove the breakpoint in Element.cs and resume execution. Note the Page.cs breakpoint should break immediately, and if you jump down the call stack and look at the value of ctr it should be the same as when the breakpoint in Element.cs broke. From there on out, the Page.cs breakpoint will break every navigation attempt.

This test confirms that the Window.Parent.RealParent is being garbage collected at some point. Now I am not that familiar with the framework's code as I only just started looking at it with this effort, so I can't help but think I am mocking something incorrectly, or the test/mock Window or Shell code is not set up properly. If one of you are able to confirm or deny that I would appreciate it.

In the mean time I will look into why the RealParent is being garbage collected. Thanks

I tested this locally and for me the RealParent that gets collected is the testWindow and testApp because those were going out of scope and then getting collected.

I've pushed up a small change to TestWindow and TestApp and now with the weak references it never takes that path with the realparent

Also, your SendDisappearingUnsubscribesAppearingEvent isn't passing for me based on your latest changes

image

@PureWeen
Copy link
Member

@mattsetaro I'm running your sample against the latest MAUI branch and I'm not seeing it subscribing to a bunch of appearing events anymore

@mattsetaro
Copy link
Author

mattsetaro commented Sep 25, 2024

@PureWeen commit hash 68524c8 is still having leaking Page.Appearing subscriptions on my end:
Screenshot 2024-09-25 at 3 23 22 PM

using:

ShellUriHandlerTests.cs
[Fact]
public async void MemoryLeak()
{
    Routing.RegisterRoute("page1/page2", typeof(DummyPage));
    Routing.RegisterRoute("page1/page2/page3", typeof(DumberPage));
    
    var shell = new TestShell(
    CreateShellItem(shellSectionRoute: "page1")
    );
    
    int ctr = 0;
    while (true)
    {
        await shell.GoToAsync("page1/page2/page3");
        Assert.True(shell.CurrentPage is DumberPage);
        
        await shell.GoToAsync("..");
        Assert.True(shell.CurrentPage is DummyPage);
        
        if (++ctr % 1000 == 0)
        {
	    _testOutputHelper.WriteLine("Total memory: " + GC.GetTotalMemory(false).ToString("N0"));
        }
    }
}

class DummyPage : Page
{
}

class DumberPage : Page
{
}

Can you please verify your versions and how you tested it?

EDIT: If I add in the changes you committed to this PR to latest maui (for the TestShell and TestWindow) I don't see it leaking but does that mean this can never happen in a real-world scenario? I would like your opinion. Trying to figure out if we should move forward with fixing this. Thanks

@sitecompass
Copy link

Hello, just seeing when this one is looking to be resolved. Appreciate all the hard work that has gone into this thus far.

@PureWeen
Copy link
Member

@PureWeen commit hash 68524c8 is still having leaking Page.Appearing subscriptions on my end: Screenshot 2024-09-25 at 3 23 22 PM

using:

ShellUriHandlerTests.cs
[Fact]
public async void MemoryLeak()
{
    Routing.RegisterRoute("page1/page2", typeof(DummyPage));
    Routing.RegisterRoute("page1/page2/page3", typeof(DumberPage));
    
    var shell = new TestShell(
    CreateShellItem(shellSectionRoute: "page1")
    );
    
    int ctr = 0;
    while (true)
    {
        await shell.GoToAsync("page1/page2/page3");
        Assert.True(shell.CurrentPage is DumberPage);
        
        await shell.GoToAsync("..");
        Assert.True(shell.CurrentPage is DummyPage);
        
        if (++ctr % 1000 == 0)
        {
	    _testOutputHelper.WriteLine("Total memory: " + GC.GetTotalMemory(false).ToString("N0"));
        }
    }
}

class DummyPage : Page
{
}

class DumberPage : Page
{
}

Can you please verify your versions and how you tested it?

EDIT: If I add in the changes you committed to this PR to latest maui (for the TestShell and TestWindow) I don't see it leaking but does that mean this can never happen in a real-world scenario? I would like your opinion. Trying to figure out if we should move forward with fixing this. Thanks

The App and window get pretty rooted when running your application.

If the App or Window were to get GC'd out from under us you'd see hard crashes in the app not just memory leaks.

This might happen for embedding scenarios where the developer doesn't store a reference to the app or window. But even then the app and window are usually referenced by handlers and those handlers will be rooted to the visual tree which aren't going to get collected

@mattsetaro mattsetaro closed this Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-navigation NavigationPage community ✨ Community Contribution
Projects
None yet
6 participants