Misc. bug: Inconsistent Vulkan segfault #10528
This might be a driver bug. Can you try the latest drivers? I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.
We may need to add a function to destroy a backend and release all the resources, otherwise calling ...
IMO issue #10420 is also a question about the object model for ggml backends, i.e. should it be possible for each thread to have its own VkInstance/VkDevice, and what ggml/llama object should their lifetime be tied to.
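A minimal sketch of that lifetime question, for illustration only (the struct and field names below are made up, not the actual ggml-vulkan types): whichever ggml/llama object owns a context like this decides when the Vulkan objects are released.

```cpp
// Illustrative only: ties VkInstance/VkDevice lifetime to an owning object,
// so they are destroyed deterministically instead of relying on process exit.
#include <vulkan/vulkan.h>

struct vk_device_ctx {
    VkInstance instance = VK_NULL_HANDLE;
    VkDevice   device   = VK_NULL_HANDLE;

    ~vk_device_ctx() {
        // Destroy in reverse order of creation, while the driver is still loaded.
        if (device   != VK_NULL_HANDLE) vkDestroyDevice(device, nullptr);
        if (instance != VK_NULL_HANDLE) vkDestroyInstance(instance, nullptr);
    }
};
```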
I think this is a bug I often observe on Linux, but only on Nvidia. It happens when exiting the application, so it seems to be some issue with clean-up. I haven't looked into it yet.
I've tried building the global stuff in the way I think CUDA is handling it in the background, but it's not done yet. It should be possible to keep the device and instance global if all temporary variables stay attached to the backend instance. Command buffers are probably not the only thing where that hasn't been implemented yet.
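As a rough illustration of that design (names are invented for the sketch, not the real ggml-vulkan types): the instance and device stay global, while per-use temporaries such as command pools live in the backend object, so freeing the backend releases everything allocated on top of the global device.

```cpp
// Sketch only: global instance/device, backend-owned temporaries.
#include <vulkan/vulkan.h>

struct vk_global_state {
    VkInstance instance = VK_NULL_HANDLE;
    VkDevice   device   = VK_NULL_HANDLE;
};

static vk_global_state g_vk; // lives for the whole process

struct vk_backend_ctx {
    VkCommandPool cmd_pool = VK_NULL_HANDLE; // temporary state, owned per backend

    ~vk_backend_ctx() {
        // Destroying the backend cleans up everything created on the global device.
        if (cmd_pool != VK_NULL_HANDLE) {
            vkDestroyCommandPool(g_vk.device, cmd_pool, nullptr);
        }
    }
};
```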
This is how the crash looks for me. It happens in some Nvidia driver thread after all the ggml code has already exited:
This is the thread:
This is the stack trace:
@jeffbolznv Do you happen to know what that thread does? Interestingly, @RobbyCBennett saw it in one of the other Nvidia threads. Resolving this has gotten a little more important since @LostRuins reported that a crash on exit on Windows in certain cases causes a system crash (BSOD). Might be the same cause.
I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly (at least I think so, validation layers didn't print anything), but the Nvidia driver still segfaults.
They're both threads created by the driver. I'm pretty sure they should get shut down when the VkDevice is destroyed.
Is this relying on the static destructor for the vk_instance pointer? I think that may happen too late. Is there a hook where we can destroy the objects before the process is terminated?
Not at the moment. I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on a static destructor being called normally when exiting the application. I understand that static destructors can be risky due to the order of destruction, but I am not sure why that should be a problem for the Vulkan driver. I would very much prefer to avoid adding a function to shut down ggml unless it is absolutely necessary.
I had some system problems when updating the driver, but I finally got some results. I still see segmentation faults with the new driver. I haven't tried the commit 335f48a yet. The different types of seg faults I got:
Here's some more system information if that helps at all:
It's probably not absolutely necessary, since this issue only appears on Nvidia. Their driver should handle this gracefully.
@0cc4m or @RobbyCBennett, can you try adding a call into ggml-vulkan to destroy the VkDevice and VkInstance right before dlclose is called in unload_backend?
I suspect (not sure) it's OK to invoke the cleanup from a static destructor in the main executable (or ggml.so?), as long as it's before ggml-vulkan or the vulkan driver libraries have been unloaded. Linux doesn't give a good way to have this entirely self-contained in the vulkan driver or in ggml-vulkan. I think some kind of call from ggml is needed.
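A rough sketch of that idea, assuming a hypothetical exported cleanup entry point. `ggml_backend_vk_free_devices` is a made-up symbol name for illustration, not an existing ggml function.

```cpp
// Sketch: call a cleanup hook in the backend library right before dlclose(),
// so the VkDevice/VkInstance are destroyed while the driver is still loaded.
#include <dlfcn.h>

static void unload_backend_sketch(void * handle) {
    typedef void (*cleanup_fn_t)(void);
    // Hypothetical symbol name, for illustration only.
    cleanup_fn_t cleanup = (cleanup_fn_t) dlsym(handle, "ggml_backend_vk_free_devices");
    if (cleanup != nullptr) {
        cleanup();
    }
    dlclose(handle);
}
```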
I don't know this code and I haven't made any commits to this project. I don't see any VkDevice or VkInstance types. Maybe @0cc4m can make this change. |
Here's an idea for a potential workaround: provide a way to have a preferred backend. For my use case: if CUDA is available, use CUDA; otherwise use Vulkan. I don't currently see a way to specify the preferred backend. Maybe it could look like the following.

```cpp
enum llama_specific_backend_type {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
    // others...
};

const llama_specific_backend_type PREFERRED_BACKENDS[] = {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
};

int main()
{
    llama_set_backend(PREFERRED_BACKENDS, sizeof(PREFERRED_BACKENDS) / sizeof(PREFERRED_BACKENDS[0]));
}
```
You can set the devices that you want to use in `llama_model_params`.
I don't have any crashes on CUDA, so selecting CUDA instead of Vulkan at runtime would prevent crashing in Vulkan with Nvidia. It wouldn't actually fix Vulkan crashing. It would just be a workaround. |
If you build with ...
I looked into both options. Here's a snippet of my workaround:

```cpp
// ... create the params

#ifdef __linux__
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
        static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
        params.devices = sDevices;
    }
#endif

// ... use the params
```
That should work, but if you don't intend to use the Vulkan backend at all, you can avoid loading it entirely by building the backends as dynamic libraries and loading only the ones you need. Eventually this will become the standard in all the llama.cpp binary distributions.
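For illustration, a minimal sketch of that approach, assuming the backends were built as dynamic libraries; the library file name is platform- and build-dependent, so treat it as a placeholder.

```cpp
// Sketch: load only the backend you want, so ggml-vulkan is never pulled into
// the process and has nothing to tear down at exit.
#include "llama.h"

int main() {
    // Placeholder name; the actual file name depends on platform and build.
    auto cuda_reg = ggml_backend_load("libggml-cuda.so");
    if (cuda_reg == nullptr) {
        // Fall back to loading another backend here if desired.
    }

    // ... create llama_model_params, load the model, run inference ...
    return 0;
}
```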
I still intend to use the Vulkan backend to support non-CUDA hardware like AMD. I'll keep that in mind. Thank you. |
I'll look into it soon; I've been busy with #10597.
I've borrowed a linux system and have reproduced this locally, I'll try to put together a fix. |
Unfortunately, I've been unable to reproduce this again after running for the rest of the day; I only ever saw it the one time. So I'm not sure this system will be very helpful for testing. In the meantime, I looked at the destruction order on Windows. It looks like the Vulkan driver gets unloaded before any static destructors run in ggml, so by then it's too late to do any cleanup. So I don't think we can handle this automatically from, say, ~ggml_backend_registry.
@RobbyCBennett Can you try #10989? For me that fixed the segfault.
With aa014d7 I have a consistent crash if the Vulkan backend is available in the test program on that same Linux system. This even happens if I only use the CUDA device. Stack trace with Vulkan (caused by the destructor):
That's concerning. Do you have example code that triggers this crash? |
Yes. Here's my original example with the addition of changing the devices to only use CUDA when it's available:

```cpp
#include <stdio.h>
#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
    llama_log_set(handleLog, 0);

    struct llama_model_params params = llama_model_default_params();

    // Only use CUDA if it's available
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
        puts("Switching to CUDA");
        static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
        params.devices = sDevices;
    }
    else {
        puts("Not using CUDA");
    }

    char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
    struct llama_model *model = llama_load_model_from_file(path, params);
    llama_free_model(model);
    return 0;
}
```
I can reproduce a crash with the example code, even on non-Nvidia GPUs. But I don't understand what causes it. It segfaults on the instance destruction, but the validation layers don't call out any resources left, so it should be okay to destroy. With Nvidia it crashes on destroying the device fence, which also doesn't make much sense, as that should always be okay to destroy, as long as it was created previously. Any ideas what I could try/look into, @jeffbolznv?
Has the driver and/or vulkan loader already been closed? |
Is there an easy way to check that? I guess they probably have been, but I don't know why. |
Maybe using LD_DEBUG? I haven't done that in a while. |
I didn't see anything suspicious in the LD_DEBUG output (but I also don't have any experience reading it), and I didn't have any luck with a debugger. In Mesa it seems to crash in the instance destroy function, but behind a function pointer. When I build Mesa manually, it doesn't crash. On an Intel GPU with Mesa it triggers a pipeline cache assertion. I'm not sure how to continue.
I reproduced a crash on Windows. I don't know why I didn't see this when I previously tested #10989; maybe I was running some test that explicitly unloads the backend. But when I run this minimal example, nothing calls into ggml-vulkan to free the backend, so it gets freed when the static destructor for the smart pointer runs. And this is too late: the driver has already been unloaded, and calling into the driver crashes.
Apparently we make it worse by trying to destroy the instance, but is there a way to recognize this case? Or to make sure the instance gets destroyed before the unload happens?
I don't think there's a way to get the static destructors and library unloads to be called in a particular order. It needs to be invoked by the code somehow. Some code like test-backend-ops calls ggml_backend_free explicitly. It looks like llama.cpp expects it to get called via the llama_context destructor, but in this test there's no llama_context.
What is triggering the vulkan unload? I would expect libraries to be unloaded by the OS during the process destruction, which should happen after the static destructors. |
If the crash happens consistently in the ...
To summarize, we started out with an inconsistent segfault in an Nvidia-specific driver thread. We guessed that this happened because we did not free all Vulkan resources properly, so I added a check for whether ggml still holds Vulkan resources (devices or buffers), and if not, the device and Vulkan instance get cleaned up. But this now leads to a reliable segfault in cases where the backend is not freed explicitly, since it tries to destroy the instance during static destructor calls. Apparently this happens after the Vulkan driver is no longer loaded, so it segfaults. I guess this might be because the Vulkan library itself is just a layer between applications and drivers, so it unloads the driver connection in an internal static destructor, which might run before or after the application's destructors?
I do not see how the vulkan library could be unloaded before the static destructors. Static destructors are usually called using the same mechanism as ...
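As a small single-file illustration of that mechanism (this only shows in-process ordering and says nothing about when the OS unloads other libraries): atexit handlers and static destructors are both run by the C runtime at exit, in reverse order of registration/construction.

```cpp
// Demonstrates exit-time ordering: handlers/destructors run in reverse order
// of registration. The static object below is constructed before main() runs,
// so its destructor fires after the atexit handler registered inside main().
#include <cstdio>
#include <cstdlib>

struct StaticObject {
    ~StaticObject() { std::puts("static destructor"); }
};

static StaticObject s_obj;

static void on_exit_handler() { std::puts("atexit handler"); }

int main() {
    std::atexit(on_exit_handler);
    std::puts("main returns");
    return 0;
    // Expected output: "main returns", "atexit handler", "static destructor".
}
```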
These are the call stacks I see on Windows, in this order (note that this is with the example code from the OP pasted into llama-embedding):
Driver being unloaded:
static destructor being called:
I don't think there are any guarantees about order of static destructor vs dll unload. |
I think what's happening there is that since the backend is a dynamic library, its static destructors are called when the dynamic libraries are being unloaded. And for some reason, the NVIDIA driver gets unloaded before ...
I was mistaken; this is still an Nvidia-only problem. But it also occurs if no Nvidia GPU is used; it's enough for the driver to be active. If I force the Vulkan ICD to only load the AMD or Intel driver, it runs through just fine:
If I use the Nvidia driver:
I'm not 100% sure how to read this, but to me it looks like it unloads a bunch of Nvidia libraries before unloading ...
Yes, it still happens with ...
It would be possible to control when the vulkan backend is unloaded by building the backends as dynamic libraries and loading/unloading the Vulkan backend explicitly, e.g.:

```cpp
#include <stdio.h>
#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
    llama_log_set(handleLog, 0);

    auto vk_reg = ggml_backend_load("ggml-vulkan.dll");

    struct llama_model_params params = llama_model_default_params();

    // Only use CUDA if it's available
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
        puts("Switching to CUDA");
        static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
        params.devices = sDevices;
    }
    else {
        puts("Not using CUDA");
    }

    char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
    struct llama_model *model = llama_load_model_from_file(path, params);
    llama_free_model(model);

    ggml_backend_unload(vk_reg);
    return 0;
}
```
That crashes in the same way. I'm not sure what else to try; maybe this can only be fixed from Nvidia's side? The only way I can get it to work is if I free the backend explicitly:

```cpp
#include <stdio.h>
#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv) {
    llama_log_set(handleLog, 0);

    ggml_backend_dev_t dev = ggml_backend_dev_get(0);
    ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);

    struct llama_model_params params = llama_model_default_params();
    char path[] = "/path/to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
    struct llama_model *model = llama_load_model_from_file(path, params);
    llama_free_model(model);

    ggml_backend_free(backend);
    return 0;
}
```
That way the cleanup is called before the driver is unloaded (same stack as last time). I don't know if this is just luck or if it's guaranteed.
This is complicated by the fact that the Vulkan Loader sits in between. IIRC there are some settings that may affect this; I'll look into it.
I was thinking of VK_LOADER_DISABLE_DYNAMIC_LIBRARY_UNLOADING, but it didn't help. Even explicitly loading the driver DLL from ggml-vulkan doesn't prevent it from being detached first during exit. |
I'm not sure the latter is true. I tried to run some Vulkan benchmarks on my Nvidia machine, and I also tried other values:
Name and Version
library 531cb1c (gguf-v0.4.0-2819-g531cb1c2)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Problem description & steps to reproduce
The crashes were investigated with the gdb debugger.
Simple program:
Shell script to run the program several times:
First Bad Commit
No response
Relevant log output
GDB output from crash caused by /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
GDB output from crash with unknown cause