Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makes SDL_perl regress when it (incorrectly) frees a former video surface #305

Closed
smcv opened this issue Jul 18, 2023 · 19 comments · Fixed by #306
Closed

makes SDL_perl regress when it (incorrectly) frees a former video surface #305

smcv opened this issue Jul 18, 2023 · 19 comments · Fixed by #306
Assignees

Comments

@smcv
Copy link
Contributor

smcv commented Jul 18, 2023

Since trying to switch from classic SDL 1.2 to sdl12-compat in Debian, we've been seeing crashes in the test perl t/core_video.t in the test suite of the SDL Perl bindings (packaged as libsdl-perl). On 32-bit platforms it seems to segfault repeatably. On 64-bit platforms it usually succeeds, but using valgrind or AddressSanitizer reveals that there is still a use-after-free, it's just not fatal for whatever reason.

I was able to cut down the failing test to this C code, which I think is equivalent to what the relevant part of t/core_video.t is doing:

/* gcc -ot t.c `pkg-config --cflags --libs sdl`
 * SDL_VIDEODRIVER=dummy valgrind --leak-check=no ./t
 */
#include <SDL/SDL.h>

int main(void)
{
    SDL_Surface *one;
    SDL_Surface *two;

    one = SDL_SetVideoMode(640, 480, 32, SDL_SWSURFACE);
    two = SDL_SetVideoMode(100, 100, 16, SDL_SWSURFACE);

    /* I'm not sure what order the Perl code is really freeing them in, it's using automatic memory management */
    SDL_FreeSurface(two);
    SDL_FreeSurface(one);
    return 0;
}

With classic SDL 1.2 (1.2.15, from Debian 12) this succeeds, and valgrind --leak-check=no reports no memory errors.

With sdl12-compat (1.2.64 plus the fixes I contributed since then, from Debian unstable) this fails:

==17851== Invalid read of size 4
==17851==    at 0x4865EEC: SDL_FreeSurface (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x1091FA: main (in /home/user/t)
==17851==  Address 0x4c9b680 is 56 bytes inside a block of size 60 free'd
==17851==    at 0x4841EA7: free (vg_replace_malloc.c:872)
==17851==    by 0x4F4BE17: ??? (in /usr/lib/i386-linux-gnu/libSDL2-2.0.so.0.2800.1)
==17851==    by 0x4F4C170: ??? (in /usr/lib/i386-linux-gnu/libSDL2-2.0.so.0.2800.1)
==17851==    by 0x4865F29: SDL_FreeSurface (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x4866083: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x486B554: SDL_SetVideoMode (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x1091DB: main (in /home/user/t)
==17851==  Block was alloc'd at
==17851==    at 0x483F660: malloc (vg_replace_malloc.c:381)
==17851==    by 0x4F4BE97: ??? (in /usr/lib/i386-linux-gnu/libSDL2-2.0.so.0.2800.1)
==17851==    by 0x4F4C054: ??? (in /usr/lib/i386-linux-gnu/libSDL2-2.0.so.0.2800.1)
==17851==    by 0x485E177: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x4865C3F: SDL_CreateRGBSurface (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x486BD88: SDL_SetVideoMode (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.1.2.64)
==17851==    by 0x1091C8: main (in /home/user/t)

I tried to understand what the ownership model is for these surfaces, but found it quite confusing.

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

@sezero
Copy link
Contributor

sezero commented Jul 18, 2023

valgrind report on i686-linux:

==9153== Invalid read of size 4
==9153==    at 0x401FD79: SDL_FreeSurface (SDL12_compat.c:5182)
==9153==    by 0x8048554: main (1.c:16)
==9153==  Address 0x406ad58 is 56 bytes inside a block of size 60 free'd
==9153==    at 0x4006CAF: free (vg_replace_malloc.c:446)
==9153==    by 0x44FBEBC: real_free (SDL_malloc.c:5199)
==9153==    by 0x44FBD21: SDL_free_REAL (SDL_malloc.c:5320)
==9153==    by 0x448453D: SDL_free (SDL_dynapi_procs.h:408)
==9153==    by 0x401FDDB: SDL_FreeSurface (SDL12_compat.c:5190)
==9153==    by 0x4020A2E: EndVidModeCreate (SDL12_compat.c:5508)
==9153==    by 0x4022DE8: SetVideoModeImpl (SDL12_compat.c:5982)
==9153==    by 0x4023DC8: SDL_SetVideoMode (SDL12_compat.c:6329)
==9153==    by 0x8048538: main (1.c:12)
==9153== 
==9153== Invalid write of size 4
==9153==    at 0x401FD82: SDL_FreeSurface (SDL12_compat.c:5182)
==9153==    by 0x8048554: main (1.c:16)
==9153==  Address 0x406ad58 is 56 bytes inside a block of size 60 free'd
==9153==    at 0x4006CAF: free (vg_replace_malloc.c:446)
==9153==    by 0x44FBEBC: real_free (SDL_malloc.c:5199)
==9153==    by 0x44FBD21: SDL_free_REAL (SDL_malloc.c:5320)
==9153==    by 0x448453D: SDL_free (SDL_dynapi_procs.h:408)
==9153==    by 0x401FDDB: SDL_FreeSurface (SDL12_compat.c:5190)
==9153==    by 0x4020A2E: EndVidModeCreate (SDL12_compat.c:5508)
==9153==    by 0x4022DE8: SetVideoModeImpl (SDL12_compat.c:5982)
==9153==    by 0x4023DC8: SDL_SetVideoMode (SDL12_compat.c:6329)
==9153==    by 0x8048538: main (1.c:12)
==9153== 
==9153== Invalid read of size 4
==9153==    at 0x401FD88: SDL_FreeSurface (SDL12_compat.c:5183)
==9153==    by 0x8048554: main (1.c:16)
==9153==  Address 0x406ad58 is 56 bytes inside a block of size 60 free'd
==9153==    at 0x4006CAF: free (vg_replace_malloc.c:446)
==9153==    by 0x44FBEBC: real_free (SDL_malloc.c:5199)
==9153==    by 0x44FBD21: SDL_free_REAL (SDL_malloc.c:5320)
==9153==    by 0x448453D: SDL_free (SDL_dynapi_procs.h:408)
==9153==    by 0x401FDDB: SDL_FreeSurface (SDL12_compat.c:5190)
==9153==    by 0x4020A2E: EndVidModeCreate (SDL12_compat.c:5508)
==9153==    by 0x4022DE8: SetVideoModeImpl (SDL12_compat.c:5982)
==9153==    by 0x4023DC8: SDL_SetVideoMode (SDL12_compat.c:6329)
==9153==    by 0x8048538: main (1.c:12)
==9153== 
==9153== 
==9153== HEAP SUMMARY:
==9153==     in use at exit: 125,634 bytes in 236 blocks
==9153==   total heap usage: 384 allocs, 148 frees, 3,816,824 bytes allocated
==9153== 
==9153== LEAK SUMMARY:
==9153==    definitely lost: 0 bytes in 0 blocks
==9153==    indirectly lost: 0 bytes in 0 blocks
==9153==      possibly lost: 80,040 bytes in 2 blocks
==9153==    still reachable: 45,594 bytes in 234 blocks
==9153==         suppressed: 0 bytes in 0 blocks
==9153== Rerun with --leak-check=full to see details of leaked memory
==9153== 
==9153== For counts of detected and suppressed errors, rerun with: -v
==9153== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 30 from 13)

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

The larger test-case that I condensed to get this should work on any recent-ish Debian/Ubuntu system, if you happen to have one available:

apt install libsdl-perl
apt source libsdl-perl
cd libsdl-perl-*
perl ./t/core_video.t

plus an appropriate LD_LIBRARY_PATH to get sdl12-compat loaded, if your distro is old enough to default to classic SDL 1.2, which at the time of writing applies to everything except Debian unstable.

I believe fixing this libsdl-perl test suite regression is the last blocker for getting sdl12-compat used as the replacement for classic SDL 1.2 in Debian 13, and hopefully Ubuntu 23.10 as well.

If my reproducer (and therefore libsdl-perl) is doing something that was never meant to work, then fixing it from the API-user side is likely also an option.

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

So sdl12-compat generally tries to be bug-for-bug compatible with classic 1.2, but to be clear: calling SDL_FreeSurface on the return value of SDL_SetVideoMode is illegal.

Doubly-so for freeing a surface that was possibly already freed by a second call to SDL_SetVideoMode. That this test works in classic 1.2 is pure luck.

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

Yeah, it's the second FreeSurface that's upset; both SDL 1.2 and sdl12-compat see the first one is the current screen surface and treat this as a no-op. The second call isn't a valid pointer anymore, hence the problem.

This works in SDL 1.2 because SDL_SetVideoMode happens to return the same surface pointer with the dummy video backend.

But to be fair, the X11 backend preserves the pointer, too, in this case. Ugh. Okay, let me think on this.

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

Having it be invalid to free the result of SDL_SetVideoMode fits my limited understanding of the memory model in use here. In the terms I'm used to using in the GLib world, I think SDL_SetVideoMode returns a pointer "borrowed" from SDL-internal infrastructure (the caller must not free it, and it may become invalid at a subsequent video mode change), while things like SDL_CreateRGBSurface return a pointer that becomes "owned" by the caller (which may free it whenever convenient). Is that right?

SDL_FreeSurface() special-cases the current video surface to be ignored (you can try as hard as you like to free it in client code, but it will not happen), but doesn't have that special case for a former video surface. Historically, classic SDL 1.2 had a similar setup but with a couple of different variations of what "the current video surface" meant.

In the Perl bindings, the SDL_FreeSurface call is not explicit in the Perl source code: the binding registers a destructor function surface_DESTROY for the SDL::Surface Perl wrappers around C SDL_Surface objects, which calls SDL_FreeSurface internally.

If calling SDL_FreeSurface on the return value of SDL_SetVideoMode is considered to be a programming error (invalid/illegal/undefined behaviour), then I think the Perl binding would have to distinguish between functions where the caller owns the result, like SDL_CreateRGBSurface (which would return a Perl SDL::Surface flagged as "OK to free the surface when the wrapper is destroyed"), and functions where the caller "borrows" the result from SDL, like SDL_SetVideoMode (which would return a Perl SDL::Surface flagged as "don't free the surface when the wrapper is destroyed"). Does that sound about right?

If sdl12-compat aims to be bug-for-bug compatible with classic SDL 1.2 but libsdl-perl is doing something wrong, then it's probably best to fix both sdl12-compat and libsdl-perl.

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

So here's why this is happening in sdl12-compat and not SDL-1.2:

5981	    } else if (VideoSurface12 && (VideoSurface12->surface20->format->format != appfmt)) {
5982	        EndVidModeCreate();  /* rebuild the window if changing pixel format */

The internal 2.0 surface format is different than what the app is requesting, so we nuke the surface and start from scratch, even in the dummy driver. appfmt=0x15151002 and VideoSurface12->surface20->format->format=0x16161804. I need to dig further still, but if those match when we hit this point (or can fix the conditional), this test will work.

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

Is it also considered to be undefined behaviour (invalid, illegal, programming error, whatever) to free the result of SDL_GetVideoSurface()? I think probably yes, for the same reasons?

@slouken
Copy link
Collaborator

slouken commented Jul 18, 2023

So here's why this is happening in sdl12-compat and not SDL-1.2:

5981	    } else if (VideoSurface12 && (VideoSurface12->surface20->format->format != appfmt)) {
5982	        EndVidModeCreate();  /* rebuild the window if changing pixel format */

The internal 2.0 surface format is different than what the app is requesting, so we nuke the surface and start from scratch, even in the dummy driver. appfmt=0x15151002 and VideoSurface12->surface20->format->format=0x16161804. I need to dig further still, but if those match when we hit this point (or can fix the conditional), this test will work.

Or we can just hang onto all the old video surfaces (maybe just free the pixels) until SDL_Quit(). It would take a small amount of memory more, but it wouldn't crash. :)

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

Or we can just hang onto all the old video surfaces (maybe just free the pixels) until SDL_Quit().

If the Perl bindings are using automatic memory management, but have an explicit SDL_Quit() call, might they end up trying to free a video surface after SDL_Quit()?

(Assuming that real-world programs only call SDL_Quit() during exit, there isn't really much difference between holding onto a resource until SDL_Quit() and intentionally leaking it, so perhaps intentionally leaking it would be pragmatic?)

@slouken
Copy link
Collaborator

slouken commented Jul 18, 2023

Hmm, true...

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

calling SDL_FreeSurface on the return value of SDL_SetVideoMode is illegal

I reported PerlGameDev/SDL#305.

Is it also considered to be undefined behaviour (invalid, illegal, programming error, whatever) to free the result of SDL_GetVideoSurface()? I think probably yes, for the same reasons?

Was that guess correct?

smcv added a commit to smcv/SDL1_perl that referenced this issue Jul 18, 2023
In many SDL APIs that return a SDL_Surface *, the surface is considered
to be owned by the caller, and must be freed by the caller.

However, SDL_SetVideoMode and presumably SDL_GetVideoSurface return
a pointer to SDL's internal video surface, which will be freed by SDL
if necessary, and must not be freed by library users.
Incorrectly freeing this surface can lead to a use-after-free crash,
manifesting as a test failure in t/core_video.t.

Resolves: libsdl-org/sdl12-compat#305
Signed-off-by: Simon McVittie <[email protected]>
smcv added a commit to smcv/SDL1_perl that referenced this issue Jul 18, 2023
In many SDL APIs that return a SDL_Surface *, the surface is considered
to be owned by the caller, and must be freed by the caller.

However, SDL_SetVideoMode and presumably SDL_GetVideoSurface return
a pointer to SDL's internal video surface, which will be freed by SDL
if necessary, and must not be freed by library users.
Incorrectly freeing this surface can lead to a use-after-free crash,
manifesting as a test failure in t/core_video.t.

See also libsdl-org/sdl12-compat#305

Resolves: PerlGameDev#305
Signed-off-by: Simon McVittie <[email protected]>
@smcv smcv changed the title use-after-free when a former video surface is freed makes SDL_perl regress when it (incorrectly) frees a former video surface Jul 18, 2023
@slouken
Copy link
Collaborator

slouken commented Jul 18, 2023

Is it also considered to be undefined behaviour (invalid, illegal, programming error, whatever) to free the result of SDL_GetVideoSurface()? I think probably yes, for the same reasons?

Was that guess correct?

Yep!

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

Was that guess correct?

Yep, it's the same pointer.

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

Or we can just hang onto all the old video surfaces (maybe just free the pixels) until SDL_Quit(). It would take a small amount of memory more, but it wouldn't crash. :)

The codepath is set up for this, there's just a block of conditions where we flush everything and start over (going from a software video mode to SDL_OPENGL, etc)...this is probably just overaggressive in this case.

@icculus
Copy link
Collaborator

icculus commented Jul 18, 2023

I need to dig further still, but if those match when we hit this point (or can fix the conditional), this test will work.

Also, I'm not a careful reader: obviously this is going from 32 bit to 16 bit between calls. :)

@smcv
Copy link
Contributor Author

smcv commented Jul 18, 2023

Since libsdl-perl is doing something undefined, I'm now aiming to get this fixed in Debian via libsdl-perl as a higher priority than in sdl12-compat, with sdl12-compat bug-for-bug compatibility as more of a nice-to-have (and if that means we temporarily lose libsdl-perl from Debian, I'll be sad to lose frozen-bubble but it's still a net benefit).

smcv added a commit to smcv/sdl12-compat that referenced this issue Jul 20, 2023
The SDL Perl bindings incorrectly call SDL_FreeSurface() on the result
of functions that return a "borrowed" pointer to the video surface,
namely SDL_SetVideoMode() and SDL_GetVideoSurface().
(See PerlGameDev/SDL#305)

When we would previously have allocated or freed the video surface
wrapper object, instead allocate or free its contents in-place.

When checking whether the video surface exists, because we never destroy
it, we must now also check whether its underlying SDL2 video surface
exists.

Resolves: libsdl-org#305
Signed-off-by: Simon McVittie <[email protected]>
@smcv
Copy link
Contributor Author

smcv commented Jul 20, 2023

So here's a stupid idea: what if we keep the same global struct representing the SDL 1.2 video surface at all times, and just reallocate the SDL 2 object and pixel buffer that it points to? (see #306)

We can even avoid it being a one-per-process leak (harmless, but an annoying false-positive for valgrind/asan) by having it never be malloc'd in the first place: it can just be global data.

@slouken
Copy link
Collaborator

slouken commented Jul 20, 2023

That is a great idea! 😊

smcv added a commit to smcv/sdl12-compat that referenced this issue Aug 26, 2023
The SDL Perl bindings incorrectly call SDL_FreeSurface() on the result
of functions that return a "borrowed" pointer to the video surface,
namely SDL_SetVideoMode() and SDL_GetVideoSurface().
(See PerlGameDev/SDL#305)

When we would previously have allocated or freed the video surface
wrapper object, instead allocate or free its contents in-place.

When checking whether the video surface exists, because we never destroy
it, we must now also check whether its underlying SDL2 video surface
exists.

Resolves: libsdl-org#305
Signed-off-by: Simon McVittie <[email protected]>
slouken pushed a commit that referenced this issue Aug 26, 2023
The SDL Perl bindings incorrectly call SDL_FreeSurface() on the result
of functions that return a "borrowed" pointer to the video surface,
namely SDL_SetVideoMode() and SDL_GetVideoSurface().
(See PerlGameDev/SDL#305)

When we would previously have allocated or freed the video surface
wrapper object, instead allocate or free its contents in-place.

When checking whether the video surface exists, because we never destroy
it, we must now also check whether its underlying SDL2 video surface
exists.

Resolves: #305
Signed-off-by: Simon McVittie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants