Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mir:wayland] Fix handling of display unplug/replug #3433

Merged
merged 10 commits into from
Jul 2, 2024

Conversation

AlanGriffiths
Copy link
Collaborator

@AlanGriffiths AlanGriffiths commented Jun 20, 2024

Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly, but cleaned up some threading issues along the way

Copy link

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 88.13559% with 7 lines in your changes missing coverage. Please review.

Project coverage is 77.48%. Comparing base (31ef078) to head (bb4bca6).
Report is 460 commits behind head on main.

Files with missing lines Patch % Lines
src/platforms/wayland/wl_egl_display_provider.cpp 85.41% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3433      +/-   ##
==========================================
+ Coverage   77.27%   77.48%   +0.20%     
==========================================
  Files        1070     1073       +3     
  Lines       68332    68802     +470     
==========================================
+ Hits        52806    53311     +505     
+ Misses      15526    15491      -35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlanGriffiths AlanGriffiths force-pushed the MIRENG-569/fix-mir-wayland branch from a9a90d8 to 9de853e Compare June 26, 2024 15:15
@AlanGriffiths AlanGriffiths marked this pull request as ready for review June 26, 2024 15:56
@AlanGriffiths AlanGriffiths requested a review from a team as a code owner June 26, 2024 15:56
@AlanGriffiths
Copy link
Collaborator Author

For anyone wishing to try this out, a simple way to reproduce is to use Miriway as the host system and amend the .display to disable outputs (I've done this over ssh while investigating).

Running any program that is submitting buffers when the unplug occurs creates problems.

So here are the incantations I used (both over ssh, for ease of use):

cmake-build-debug/bin/miral-app -demo-server -terminal glmark2-wayland --wayland-host=wayland-0

And, in a separate shell:

while cp ~/.config/miriway-shell.display{~disabled,}; sleep 2; cp ~/.config/miriway-shell.display{~enabled,}; sleep 5; do :; done

Clearly, the ~{en,dis}abled configs need to exist and match your system

Copy link
Contributor

@RAOF RAOF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, yes.

Apart from the nit with EGL state, yes. This is indeed much more correct.

}

~EGLState()
{
CacheEglState stash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. If ~EGLState() is called with ctx current, then this will capture that, we'll tear down ctx, and then ~CacheEglState() will try to make ctx current again. That will fail (normally silently because we don't check the return value, but noisily if we use --debug).

Maybe...

Suggested change
CacheEglState stash;
std::unique_ptr<CacheEglState> stash;
if (eglGetCurrentContext() != ctx)
{
stash = std::make_unique<CacheEglState>();
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't realised that could matter. But in testing this suggestion I found that there's still an intermittent failure mode (with or without this):

terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
  what():  Failed to create EGL surface: EGL_BAD_DISPLAY (0x3008)

Reverted to "Draft" as I need to track that down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! This may be worse than I thought...

Fatal glibc error: pthread_mutex_lock.c:94 (___pthread_mutex_lock): assertion failed: mutex->__data.__owner == 0

Copy link
Collaborator Author

@AlanGriffiths AlanGriffiths Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least I now know how to trigger this failure - it's when the host reaches "idle-timeout"

[edit]

No, that conclusion was wrong

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RAOF I have updated ~EGLState() to something I think is simpler and should work. If you could confirm that?

Meantime, there's another failure mode I have still to resolve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't realised that could matter.

Mostly it doesn't matter - the call will fail, but without any side-effect. Except when someone tries with EGL debug enabled, at which point the failure will be logged. My only concern here was avoiding a potential debugging wild goose chase in the future.

'Twas only a happy accident that this prompted further testing that identified your new problem!

I have updated ~EGLState() to something I think is simpler and should work. If you could confirm that?

Yup! That works nicely.

@AlanGriffiths AlanGriffiths marked this pull request as draft July 1, 2024 08:47
@AlanGriffiths AlanGriffiths marked this pull request as ready for review July 2, 2024 13:09
@AlanGriffiths
Copy link
Collaborator Author

OK, I'm landing this as it is a BIG improvement. There's no need to gate that on tracking down an intermittent failure

@AlanGriffiths AlanGriffiths enabled auto-merge July 2, 2024 13:11
@AlanGriffiths AlanGriffiths added this pull request to the merge queue Jul 2, 2024
Merged via the queue into main with commit 5f8b842 Jul 2, 2024
63 of 71 checks passed
@AlanGriffiths AlanGriffiths deleted the MIRENG-569/fix-mir-wayland branch July 2, 2024 13:43
Saviq pushed a commit that referenced this pull request Aug 8, 2024
Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
github-merge-queue bot pushed a commit that referenced this pull request Aug 8, 2024
)

> Backport of #3433 to 2.16 for release into core22

Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
Saviq pushed a commit that referenced this pull request Aug 8, 2024
Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
Saviq pushed a commit that referenced this pull request Aug 8, 2024
Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
Saviq pushed a commit that referenced this pull request Aug 8, 2024
Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
github-merge-queue bot pushed a commit that referenced this pull request Aug 9, 2024
)

> Backport of #3433 to 2.17 for release into core22

Fixes: #3427

This is mostly about managing the lifetime of EGL entities correctly,
but cleaned up some threading issues along the way
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mir_kiosk_x11 crashes when the host platform has no outputs
2 participants