
Misc. bug: Inconsistent Vulkan segfault #10528

Open
RobbyCBennett opened this issue Nov 26, 2024 · 47 comments

@RobbyCBennett commented Nov 26, 2024

Name and Version

library 531cb1c (gguf-v0.4.0-2819-g531cb1c2)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

No response

Problem description & steps to reproduce

  1. Compile the program below
  2. Run it a thousand times; it will probably hit a segmentation fault at least once. I used the gdb debugger.

Simple program:

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
  llama_log_set(handleLog, 0);

  char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
  struct llama_model_params params = llama_model_default_params();
  struct llama_model *model = llama_load_model_from_file(path, params);
  llama_free_model(model);

  return 0;
}

Shell script to run the program several times:

#! /bin/sh

PROGRAM=llama-bug
LOG=debug.log
COUNT=1000

rm -f "$LOG"

for i in `seq 1 $COUNT`; do
	gdb -batch -ex run -ex bt "$PROGRAM" >> "$LOG" 2>> "$LOG"
done

First Bad Commit

No response

Relevant log output

GDB output from crash caused by /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ggml_vulkan: Compiling shaders..............................Done!

Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe35a8640 (LWP 1789333)]
0x00007fffeff1cb00 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#0  0x00007fffeff1cb00 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#1  0x00007ffff0246f1d in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#2  0x00007fffeff1fcfa in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#3  0x00007ffff7a1dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7aaf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

GDB output from crash with unknown cause

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ggml_vulkan: Compiling shaders..............................Done!

Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe35a8640 (LWP 1750868)]
0x00007fffeff1cb00 in ?? ()
#0  0x00007fffeff1cb00 in ?? ()
#1  0x000000006746139a in ?? ()
#2  0x0000000002a1b0d8 in ?? ()
#3  0x0000000067461399 in ?? ()
#4  0x00000000000e6817 in ?? ()
#5  0x00005555561076c0 in ?? ()
#6  0x00007fffeff1ef10 in ?? ()
#7  0x0000000000000000 in ?? ()
@jeffbolznv (Collaborator)

This might be a driver bug. Can you try the latest drivers?

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.

@slaren (Member) commented Nov 27, 2024

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated.

We may need to add a function to destroy a backend and release all the resources, otherwise calling ggml_backend_unload to unload a dynamically loaded backend may result in a leak.
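A rough sketch of the call order such a function would enable in application code (ggml_backend_free and ggml_backend_unload are existing APIs; the registry-wide "release everything" hook is hypothetical and does not exist yet):

#include "ggml-backend.h"

// Sketch only: free every backend created from the dynamically loaded registry
// before the library itself is unloaded, so device/instance teardown runs while
// the driver is still mapped into the process.
static void shutdown_backend(ggml_backend_reg_t reg, ggml_backend_t backend) {
    ggml_backend_free(backend); // existing API: frees this backend instance
    // a hypothetical registry-level cleanup (the function described above)
    // would go here, releasing any remaining devices/buffers
    ggml_backend_unload(reg);   // existing API: dlclose the backend library last
}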

@jeffbolznv (Collaborator)

IMO issue #10420 is also a question about the object model for ggml backends, i.e. should it be possible for each thread to have its own VkInstance/VkDevice and what ggml/llama object should their lifetime be tied to.

@0cc4m (Collaborator) commented Nov 27, 2024

This might be a driver bug. Can you try the latest drivers?

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.

I think this is a bug I often observe on Linux, but only on Nvidia. It happens when exiting the application, so it's some issue with clean-up. I haven't looked into it yet.

IMO issue #10420 is also a question about the object model for ggml backends, i.e. should it be possible for each thread to have its own VkInstance/VkDevice and what ggml/llama object should their lifetime be tied to.

I've tried restructuring the global state the way I think CUDA handles it in the background, but it's not done yet. It should be possible to keep the device and instance global if all temporary objects stay attached to the backend instance. Command buffers are probably not the only thing where that hasn't been implemented yet.
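
As a rough illustration of that ownership split (made-up type and variable names, not the actual ggml-vulkan structures): the instance and device stay global, while every transient object lives in a per-backend context that is destroyed while the device is still alive.

#include <vulkan/vulkan.hpp>
#include <vector>

extern vk::Device g_device; // stands in for the global device kept for the whole process

// Hypothetical per-backend context: transient objects (command pool, command
// buffers, staging buffers, ...) are owned here instead of by globals.
struct vk_backend_ctx {
    vk::CommandPool pool;
    std::vector<vk::CommandBuffer> cmd_bufs;

    ~vk_backend_ctx() {
        // runs when the backend is freed, while g_device is still valid;
        // destroying the pool also frees its command buffers
        g_device.destroyCommandPool(pool);
    }
};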

@0cc4m (Collaborator) commented Nov 29, 2024

This is how the crash looks for me. It happens in some Nvidia driver thread after all the ggml code has already exited:

Thread 7 "[vkps] Update" received signal SIGSEGV, Segmentation fault.

This is the thread:

* 7    Thread 0x7fffd56006c0 (LWP 683442) "[vkps] Update"   0x00007fffe5401960 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01

This is the stack trace:

#0  0x00007fffe5401960 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#1  0x00007fffe57392b4 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#2  0x00007fffe5404dfa in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#3  0x00007ffff729ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#4  0x00007ffff7329c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@jeffbolznv Do you happen to know what the [vkps] Update thread is? I don't know what is/isn't getting cleaned up in a way to cause the Nvidia driver to segfault. No other driver shows this issue.

Interestingly @RobbyCBennett saw it in one of the other Nvidia threads ([vkrt] Analysis).

Resolving this has gotten a little more important since @LostRuins reported that a crash on exit on Windows in certain cases causes a system crash (BSOD). Might be the same cause.

@0cc4m (Collaborator) commented Nov 29, 2024

I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly (at least I think so; the validation layers didn't print anything), but the Nvidia driver still segfaults.

@jeffbolznv (Collaborator)

They're both threads created by the driver. I'm pretty sure they should get shut down when the VkDevice is destroyed.

I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly

Is this relying on the static destructor for the vk_instance pointer? I think that may happen too late. Is there a hook where we can destroy the objects before the process is terminated?

@slaren (Member) commented Nov 29, 2024

Is there a hook where we can destroy the objects before the process is terminated?

Not at the moment. I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on static destructors being called normally when exiting the application. I understand that static destructors can be risky due to the order of destruction, but I am not sure why that should be a problem for the Vulkan driver. I would very much prefer to avoid adding a function to shut down ggml unless it is absolutely necessary.

@RobbyCBennett (Author)

I had some system problems when updating the driver, but I finally got some results. I still see segmentation faults with the new driver. I haven't tried the commit 335f48a yet.

The different types of seg faults I got:

  • Thread 5 "[vkps] Update" received signal SIGSEGV, Segmentation fault.
  • Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
  • Thread 5 received signal SIGSEGV, Segmentation fault.

Here's some more system information if that helps at all:

  • Ubuntu 22 before and Ubuntu 24 now
  • NVIDIA RTX 4090

@0cc4m (Collaborator) commented Nov 29, 2024

I would very much prefer to avoid adding a function to shutdown ggml unless it is absolutely necessary.

It's probably not absolutely necessary, since this issue only appears on Nvidia. Their driver should handle this gracefully.

@jeffbolznv (Collaborator)

@0cc4m or @RobbyCBennett, can you try adding a call into ggml-vulkan to destroy the VkDevice and VkInstance right before dlclose is called in unload_backend?

I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on a static destructor to be called normally when exiting the application.

I suspect (not sure) it's OK to invoke the cleanup from a static destructor in the main executable (or ggml.so?), as long as it runs before ggml-vulkan or the Vulkan driver libraries have been unloaded.

Linux doesn't give a good way to have this entirely self-contained in the vulkan driver or in ggml-vulkan. I think some kind of call from ggml is needed.
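
A sketch of the ordering being suggested, with a made-up cleanup export name (the actual entry point would have to be added to ggml-vulkan):

#include <dlfcn.h>

// Sketch only: before dropping the backend library, look up a (hypothetical)
// cleanup entry point and call it, so the VkDevice/VkInstance are destroyed
// while the Vulkan loader and driver are still loaded.
static void unload_backend_sketch(void * handle) {
    typedef void (*cleanup_fn)(void);
    cleanup_fn cleanup = (cleanup_fn) dlsym(handle, "ggml_backend_vk_cleanup"); // name is made up
    if (cleanup) {
        cleanup();   // destroy VkDevice/VkInstance first
    }
    dlclose(handle); // only then unload the backend library
}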

@RobbyCBennett (Author)

I don't know this code and I haven't made any commits to this project. I don't see any VkDevice or VkInstance types. Maybe @0cc4m can make this change.

@RobbyCBennett (Author)

Here's an idea for a potential workaround: provide a way to specify a preferred backend. For my example, if CUDA is available, use CUDA; otherwise use Vulkan. I don't currently see a way to do this. Maybe it could look like the following.

enum llama_specific_backend_type {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
    // others...
};

const llama_specific_backend_type PREFERRED_BACKENDS[] = {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
};

int main()
{
  llama_set_backend(PREFERRED_BACKENDS, sizeof(PREFERRED_BACKENDS) / sizeof(PREFERRED_BACKENDS[0]));
}

@slaren (Member) commented Dec 2, 2024

You can set the devices that you want to use in llama_model_params::devices, but I don't see how that's related to Vulkan crashing.

@RobbyCBennett (Author)

I don't have any crashes on CUDA, so selecting CUDA instead of Vulkan at runtime would prevent crashing in Vulkan with Nvidia. It wouldn't actually fix Vulkan crashing. It would just be a workaround.

@slaren (Member) commented Dec 2, 2024

If you build with GGML_BACKEND_DL enabled, then you can also use ggml_backend_load to load only the backend that you want to use.

@RobbyCBennett (Author) commented Dec 2, 2024

I looked into both options and llama_model_params::devices seems to be a good solution for me. Thanks for the help!

Here's a snippet of my workaround:

  // ... create the params
  #ifdef __linux__
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
      static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
      params.devices = sDevices;
    }
  #endif
  // ... use the params

@slaren (Member) commented Dec 2, 2024

That should work, but if you don't intend to use the Vulkan backend at all, you can avoid loading it entirely by using GGML_BACKEND_DL and loading the backends dynamically. That should give you better compatibility and use fewer resources. Keep in mind that without dynamic loading, the CUDA backend will fail to load if the driver is not installed, which will stop your application from starting entirely.

Eventually this will become the standard in all the llama.cpp binary distributions.
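
A minimal sketch of that approach, assuming a build with GGML_BACKEND_DL and that the backend library names below match your build layout (they may differ per platform):

#include "ggml-backend.h"
#include "llama.h"

int main(void)
{
  // try CUDA first; fall back to Vulkan if the CUDA backend (or driver) is unavailable
  ggml_backend_reg_t reg = ggml_backend_load("libggml-cuda.so");
  if (reg == NULL) {
    reg = ggml_backend_load("libggml-vulkan.so");
  }

  // ... create llama_model_params, load the model, use it, free it ...

  if (reg != NULL) {
    ggml_backend_unload(reg); // unload explicitly, before static destructors run
  }
  return 0;
}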

@RobbyCBennett (Author)

I still intend to use the Vulkan backend to support non-CUDA hardware like AMD. I'll keep that in mind. Thank you.

@0cc4m (Collaborator) commented Dec 3, 2024

I'll look into it soon; I've been busy with #10597.

@jeffbolznv (Collaborator)

I've borrowed a Linux system and reproduced this locally; I'll try to put together a fix.

@jeffbolznv (Collaborator)

Unfortunately, I've been unable to reproduce this again after running for the rest of the day. I only ever saw it that one time, so I'm not sure this system will be very helpful for testing.

In the meantime, I looked at the destruction order on Windows. Looks like the Vulkan driver gets unloaded before any static destructors run in ggml, so by then it's too late to do any cleanup. So I don't think we can handle this automatically from, say, ~ggml_backend_registry.

@0cc4m (Collaborator) commented Dec 29, 2024

@RobbyCBennett Can you try #10989? For me that fixed the segfault.

@RobbyCBennett (Author)

With aa014d7 I get a consistent crash in the test program on that same Linux system whenever the Vulkan backend is available. This happens even if I only use the CUDA device.

Stack trace with Vulkan (caused by the destructor ~vk_instance_t):

Thread 1 "ai_test" received signal SIGSEGV, Segmentation fault.
0x00007fffb6a33de0 in ?? ()
#0  0x00007fffb6a33de0 in ?? ()
#1  0x00007ffff7e0a123 in ?? () from /lib/x86_64-linux-gnu/libvulkan.so.1
#2  0x00007fffee4e95bb in ?? () from /lib/x86_64-linux-gnu/libVkLayer_MESA_device_select.so
#3  0x00007ffff7e1dde5 in vkDestroyInstance () from /lib/x86_64-linux-gnu/libvulkan.so.1
#4  0x000055555594d850 in vk::DispatchLoaderStatic::vkDestroyInstance (pAllocator=0x0, instance=<optimized out>, this=<optimized out>) at /usr/include/vulkan/vulkan.hpp:995
#5  vk::Instance::destroy<vk::DispatchLoaderStatic> (d=..., allocator=..., this=<optimized out>) at /usr/include/vulkan/vulkan_funcs.hpp:94
#6  vk_instance_t::~vk_instance_t (this=<optimized out>, __in_chrg=<optimized out>) at /home/robby/sti/src/lib/llama/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:764
#7  std::default_delete<vk_instance_t>::operator() (this=<optimized out>, __ptr=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:99
#8  std::default_delete<vk_instance_t>::operator() (__ptr=<optimized out>, this=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:93
#9  std::unique_ptr<vk_instance_t, std::default_delete<vk_instance_t> >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:404
#10 0x00007fffee847a66 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:108
#11 0x00007fffee847bae in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:138
#12 0x00007fffee82a1d1 in __libc_start_call_main (main=main@entry=0x5555555f9f20 <main(int, char**)>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe448) at ../sysdeps/nptl/libc_start_call_main.h:74
#13 0x00007fffee82a28b in __libc_start_main_impl (main=0x5555555f9f20 <main(int, char**)>, argc=1, argv=0x7fffffffe448, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe438) at ../csu/libc-start.c:360
#14 0x000055555561fa15 in _start ()

@0cc4m (Collaborator) commented Dec 30, 2024

With aa014d7 I have a consistent crash if the Vulkan backend is available in the test program on that same Linux system. This even happens if I only use the CUDA device.

That's concerning. Do you have example code that triggers this crash?

@RobbyCBennett (Author)

Yes. Here's my original example with the addition of changing params.devices to use only CUDA.

#include <stdio.h>

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
  llama_log_set(handleLog, 0);

  struct llama_model_params params = llama_model_default_params();

  // Only use CUDA if it's available
  static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
  if (sDevice != nullptr) {
    puts("Switching to CUDA");
    static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
    params.devices = sDevices;
  }
  else {
    puts("Not using CUDA");
  }

  char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
  struct llama_model *model = llama_load_model_from_file(path, params);
  llama_free_model(model);

  return 0;
}

@0cc4m (Collaborator) commented Jan 2, 2025

I can reproduce a crash with the example code, even on non-Nvidia GPUs, but I don't understand what causes it. It segfaults on the instance destruction, yet the validation layers don't report any leftover resources, so it should be okay to destroy.

With Nvidia it crashes on destroying the device fence, which also doesn't make much sense, as that should always be safe to destroy as long as it was created previously.

Any ideas what I could try/look into, @jeffbolznv ?

@jeffbolznv (Collaborator)

Has the driver and/or vulkan loader already been closed?

@0cc4m (Collaborator) commented Jan 2, 2025

Has the driver and/or vulkan loader already been closed?

Is there an easy way to check that? I guess they probably have been, but I don't know why.

@jeffbolznv (Collaborator)

Maybe using LD_DEBUG? I haven't done that in a while.

@0cc4m (Collaborator) commented Jan 3, 2025

I didn't see anything suspicious in the LD_DEBUG output (but I also don't have any experience reading it), and I didn't have any luck with a debugger. In Mesa it seems to crash in the instance destroy function, but behind a function pointer. When I build Mesa manually, it doesn't crash. On an Intel GPU with Mesa it triggers a pipeline cache assertion. I'm not sure how to continue.

@jeffbolznv (Collaborator)

I reproduced a crash on Windows. I don't know why I didn't see this when I previously tested #10989; maybe I was running some test that explicitly unloads the backend. But when I run this minimal example, nothing calls into ggml-vulkan to free the backend, so it only gets freed when the static destructor for the smart pointer runs. And that is too late: the driver has already been unloaded, and calling into it crashes.

@0cc4m (Collaborator) commented Jan 4, 2025

Apparently we make it worse by trying to destroy the instance, but is there a way to recognize this case? Or to make sure the instance gets destroyed before the unload happens?

@jeffbolznv (Collaborator)

I don't think there's a way to get the static destructors and library unloads to be called in a particular order. The cleanup needs to be invoked by the code somehow. Some code, like test-backend-ops, calls ggml_backend_free explicitly. It looks like llama.cpp expects it to get called via the llama_context destructor, but in this test there's no llama_context.

@slaren (Member) commented Jan 4, 2025

What is triggering the vulkan unload? I would expect libraries to be unloaded by the OS during the process destruction, which should happen after the static destructors.

@slaren (Member) commented Jan 4, 2025

If the crash happens consistently in the vk::Instance destructor, you could avoid destroying the vk::Instance entirely, e.g. make it a (dumb) pointer and never free it. It only happens at the end of the process anyway, and the OS will free the resources regardless.
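
A sketch of what that would look like in ggml-vulkan (simplified, with made-up member names): keep the instance behind a raw pointer that is intentionally never deleted, so no static destructor ever reaches vkDestroyInstance at process exit.

#include <vulkan/vulkan.hpp>

struct vk_instance_t {
    vk::Instance instance;
    // ... devices, etc.
};

// intentionally leaked: the OS reclaims everything when the process exits
static vk_instance_t * vk_instance_ptr = nullptr;

static void ggml_vk_instance_init_sketch() {
    vk_instance_ptr = new vk_instance_t();
    vk_instance_ptr->instance = vk::createInstance(vk::InstanceCreateInfo{});
    // no matching delete and no instance.destroy() anywhere
}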

@0cc4m (Collaborator) commented Jan 4, 2025

If the crash happens consistently in the vk::Instance destructor, you could avoid destroying the vk::Instance entirely, e.g. make it a (dumb) pointer and never free it. It only happens at the end of the process anyway, and the OS will free the resources regardless.

To summarize: we started out with an inconsistent segfault in an Nvidia-specific driver thread. We guessed that this happened because we did not free all Vulkan resources properly, so I added a check for whether ggml still holds Vulkan resources (devices or buffers) and, if not, the device and Vulkan instance get cleaned up.

But this now leads to a reliable segfault in cases where the backend is not freed explicitly, since the instance is then destroyed during static destructor calls. Apparently, by that point the Vulkan driver is no longer loaded, so it segfaults.

I guess this might be because the Vulkan library itself is just a layer between applications and drivers, so it drops the driver connection in an internal static destructor, which might run before or after the application's destructors?

@slaren (Member) commented Jan 4, 2025

I do not see how the Vulkan library could be unloaded before the static destructors. Static destructors are usually called using the same mechanism as atexit: they are called by the stdlib when calling exit or leaving main, before any calls to the OS to destroy the process. It may be worth using ltrace to trace all the calls to the Vulkan library and see if there is something wrong going on, such as a double free or resources that are not freed before the call to vkDestroyInstance.

@jeffbolznv (Collaborator)

These are the call stacks I see on Windows, in this order (note that this is with the example code from the OP pasted into llama-embedding):

Driver being unloaded:

 	nvoglv64.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 281	C++
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	kernel32.dll!ExitProcessImplementation()	Unknown
 	ucrtbased.dll!exit_or_terminate_process(const unsigned int return_code) Line 144	C++
 	ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280	C++
 	ucrtbased.dll!exit(int return_code) Line 294	C++
 	llama-embedding.exe!__scrt_common_main_seh() Line 297	C++
 	llama-embedding.exe!__scrt_common_main() Line 331	C++
 	llama-embedding.exe!mainCRTStartup(void * __formal) Line 17	C++

static destructor being called:

 	ggml-vulkan.dll!`dynamic atexit destructor for 'vk_instance''()	C++
 	ucrtbased.dll!_execute_onexit_table::__l2::<lambda>() Line 206	C++
 	ucrtbased.dll!__crt_seh_guarded_call<int>::operator()<void <lambda>(void),int <lambda>(void) &,void <lambda>(void)>(__acrt_lock_and_call::__l2::void <lambda>(void) && setup, _execute_onexit_table::__l2::int <lambda>(void) & action, __acrt_lock_and_call::__l2::void <lambda>(void) && cleanup) Line 202	C++
 	ucrtbased.dll!__acrt_lock_and_call<int <lambda>(void)>(const __acrt_lock_id lock_id, _execute_onexit_table::__l2::int <lambda>(void) && action) Line 974	C++
 	ucrtbased.dll!_execute_onexit_table(_onexit_table_t * table) Line 231	C++
 	ggml-vulkan.dll!__scrt_dllmain_uninitialize_c() Line 399	C++
 	ggml-vulkan.dll!dllmain_crt_process_detach(const bool is_terminating) Line 182	C++
 	ggml-vulkan.dll!dllmain_crt_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 220	C++
 	ggml-vulkan.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293	C++
 	ggml-vulkan.dll!_DllMainCRTStartup(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 335	C++
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	kernel32.dll!ExitProcessImplementation()	Unknown
 	ucrtbased.dll!exit_or_terminate_process(const unsigned int return_code) Line 144	C++
 	ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280	C++
 	ucrtbased.dll!exit(int return_code) Line 294	C++
 	llama-embedding.exe!__scrt_common_main_seh() Line 297	C++
 	llama-embedding.exe!__scrt_common_main() Line 331	C++
 	llama-embedding.exe!mainCRTStartup(void * __formal) Line 17	C++

I don't think there are any guarantees about order of static destructor vs dll unload.

@slaren (Member) commented Jan 4, 2025

I think what's happening there is that since the backend is a dynamic library, its static destructors are called when the dynamic libraries are being unloaded. And for some reason, the NVIDIA driver gets unloaded before ggml-vulkan.dll. I don't think that's expected; the order of destruction of libraries should respect the dependencies between them. Does this happen when linking statically with BUILD_SHARED_LIBS=OFF?

@0cc4m (Collaborator) commented Jan 4, 2025

I was mistaken; this is still an Nvidia-only problem. But it also occurs if no Nvidia GPU is used; it's enough for the driver to be active. If I force the Vulkan ICD to load only the AMD or Intel driver, it runs through just fine:

[...]
llama_free_model(0, 8336, 0, 801)                = 0
nanosleep(0x7ffc0057ab20, 0x7ffc0057ab20, 0x7ffc0057ab20, 801) = 0
     23605:
     23605:     calling fini:  [0]
     23605:
     23605:
     23605:     calling fini: /home/user/llama.cpp/build_vk/src/libllama.so [0]
     23605:
     23605:
     23605:     calling fini: /home/user/llama.cpp/build_vk/ggml/src/libggml.so [0]
     23605:
     23605:
     23605:     calling fini: /home/user/llama.cpp/build_vk/ggml/src/libggml-cpu.so [0]
     23605:
     23605:
     23605:     calling fini: /home/user/llama.cpp/build_vk/ggml/src/ggml-vulkan/libggml-vulkan.so [0]
     23605:
~vk_instance_t()
     23605:
     23605:     calling fini: /home/user/llama.cpp/build_vk/ggml/src/libggml-base.so [0]
     23605:
     23605:
     23605:     calling fini: /lib/x86_64-linux-gnu/libgomp.so.1 [0]
     23605:
     23605:
     23605:     calling fini: /usr/local/lib/libvulkan.so.1 [0]
     23605:
     23605:
     23605:     calling fini: /usr/lib/x86_64-linux-gnu/libvulkan_radeon.so [0]
[...]

If I use the Nvidia driver:

llama_free_model(0, 0x5320, 0, 0x3631)           = 0
nanosleep(0x7ffe37ed36d0, 0x7ffe37ed36d0, 0x7ffe37ed36d0, 0x3631) = 0
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-egl-xlib.so.1 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-egl-xcb.so.1 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.565.57.01 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-allocator.so.1 [0]
     24163:	
     24163:	
     24163:	calling fini: /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01 [0]
     24163:	
     24163:	
     24163:	calling fini:  [0]
     24163:	
     24163:	
     24163:	calling fini: /home/user/llama.cpp/build_vk/src/libllama.so [0]
     24163:	
     24163:	
     24163:	calling fini: /home/user/llama.cpp/build_vk/ggml/src/libggml.so [0]
     24163:	
     24163:	
     24163:	calling fini: /home/user/llama.cpp/build_vk/ggml/src/libggml-cpu.so [0]
     24163:	
     24163:	
     24163:	calling fini: /home/user/llama.cpp/build_vk/ggml/src/ggml-vulkan/libggml-vulkan.so [0]
     24163:	
~vk_instance_t()
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

I'm not 100% sure how to read this, but to me it looks like a bunch of Nvidia libraries get unloaded before libllama.so, while for AMD nothing gets unloaded before the main libraries are done. I don't know which of the Nvidia libraries is responsible.

@0cc4m (Collaborator) commented Jan 4, 2025

I think what's happening there is that since the backend is a dynamic library, its static destructors are called when the dynamic libraries are being unloaded. And for some reason, the NVIDIA driver gets unloaded before ggml-vulkan.dll. I don't think that's expected, the order of destruction of libraries should respect the dependencies between them. Does this happen when linking statically with BUILD_SHARED_LIBS=OFF?

Yes, it still happens with BUILD_SHARED_LIBS=OFF. The unloading order is identical.

@slaren (Member) commented Jan 4, 2025

It would be possible to control when the vulkan backend is unloaded by building with GGML_BACKEND_DL and using ggml_backend_load/ggml_backend_unload. E.g.:

#include <stdio.h>

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
  llama_log_set(handleLog, 0);

  auto vk_reg = ggml_backend_load("ggml-vulkan.dll");

  struct llama_model_params params = llama_model_default_params();

  // Only use CUDA if it's available
  static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
  if (sDevice != nullptr) {
    puts("Switching to CUDA");
    static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
    params.devices = sDevices;
  }
  else {
    puts("Not using CUDA");
  }

  char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
  struct llama_model *model = llama_load_model_from_file(path, params);
  llama_free_model(model);

  ggml_backend_unload(vk_reg);

  return 0;
}

@0cc4m (Collaborator) commented Jan 4, 2025

It would be possible to control when the vulkan backend is unloaded by building with GGML_BACKEND_DL and using ggml_backend_load/ggml_backend_unload.

That crashes in the same way. I'm not sure what else to try; maybe this can only be fixed from Nvidia's side?

The only way I can get it to work is if I use ggml_backend_free similarly to test-backend-ops, because that triggers the instance destruction early (it triggers the code in #10989), before the library unloads:

#include <stdio.h>

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv) {
    llama_log_set(handleLog, 0);

    ggml_backend_dev_t dev = ggml_backend_dev_get(0);
    ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);

    struct llama_model_params params = llama_model_default_params();

    char path[] = "/path/to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
    struct llama_model *model = llama_load_model_from_file(path, params);
    llama_free_model(model);

    ggml_backend_free(backend);

    return 0;
}

@jeffbolznv (Collaborator)

BUILD_SHARED_LIBS=OFF does fix my repro. The static destructor stack becomes:

 	llama-embedding.exe!std::unique_ptr<vk_instance_t,std::default_delete<vk_instance_t>>::~unique_ptr<vk_instance_t,std::default_delete<vk_instance_t>>() Line 3179	C++
 	llama-embedding.exe!`dynamic atexit destructor for 'vk_instance''()	C++
 	ucrtbased.dll!_execute_onexit_table::__l2::<lambda>() Line 206	C++
 	ucrtbased.dll!__crt_seh_guarded_call<int>::operator()<void <lambda>(void),int <lambda>(void) &,void <lambda>(void)>(__acrt_lock_and_call::__l2::void <lambda>(void) && setup, _execute_onexit_table::__l2::int <lambda>(void) & action, __acrt_lock_and_call::__l2::void <lambda>(void) && cleanup) Line 202	C++
 	ucrtbased.dll!__acrt_lock_and_call<int <lambda>(void)>(const __acrt_lock_id lock_id, _execute_onexit_table::__l2::int <lambda>(void) && action) Line 974	C++
 	ucrtbased.dll!_execute_onexit_table(_onexit_table_t * table) Line 231	C++
 	ucrtbased.dll!common_exit::__l2::<lambda>() Line 227	C++
 	ucrtbased.dll!__crt_seh_guarded_call<void>::operator()<void <lambda>(void),void <lambda>(void) &,void <lambda>(void)>(__acrt_lock_and_call::__l2::void <lambda>(void) && setup, common_exit::__l2::void <lambda>(void) & action, __acrt_lock_and_call::__l2::void <lambda>(void) && cleanup) Line 222	C++
 	ucrtbased.dll!__acrt_lock_and_call<void <lambda>(void)>(const __acrt_lock_id lock_id, common_exit::__l2::void <lambda>(void) && action) Line 974	C++
 	ucrtbased.dll!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 259	C++
 	ucrtbased.dll!exit(int return_code) Line 294	C++
 	llama-embedding.exe!__scrt_common_main_seh() Line 297	C++
 	llama-embedding.exe!__scrt_common_main() Line 331	C++
 	llama-embedding.exe!mainCRTStartup(void * __formal) Line 17	C++

and is called before the driver is unloaded (same stack as last time). I don't know if this is just luck or is guaranteed.

And for some reason, the NVIDIA driver gets unloaded before ggml-vulkan.dll. I don't think that's expected, the order of destruction of libraries should respect the dependencies between them.

This is complicated by the fact that the Vulkan Loader sits in between. IIRC there are some settings that may affect this, I'll look into it.

@jeffbolznv (Collaborator)

IIRC there are some settings that may affect this, I'll look into it.

I was thinking of VK_LOADER_DISABLE_DYNAMIC_LIBRARY_UNLOADING, but it didn't help. Even explicitly loading the driver DLL from ggml-vulkan doesn't prevent it from being detached first during exit.

@vkhodygo

@0cc4m

this is still an Nvidia-only problem. But it also occurs if no Nvidia GPU is used, it's enough if the driver is active.

I'm not sure the latter is true. I tried to run some Vulkan benchmarks on my Nvidia machine, and llama-bench throws a segfault at the very end, i.e., after outputting the build version. Running it with -ngl 0 doesn't do so, although I didn't test it for consistency.

I also tried other values:

  • 1, 5, 7, 8, 9: 1 run each, OK
  • 10: 3 runs, one failure
  • 99: 3 runs, no failures

github-actions bot added the stale label on Feb 10, 2025