Replies: 3 comments 28 replies
-
I'm running the large model with a beam size of 5 on a GTX 1070 and don't get any freezing after numerous requests and TTS generation as well. I think as far as RAM/processing the two cards are similar, if I'm not mistaken? Have you taken a look at the output from nvidia-smi to see if the card seems ok?
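If it helps, here is a quick way to sanity-check the card with nvidia-smi (a sketch; it assumes the NVIDIA driver is installed and uses standard `--query-gpu` fields):

```shell
# Quick health check: power state (P0/P8), temperature, power draw, utilization.
# Assumes the NVIDIA driver and nvidia-smi are installed; prints a note if not.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,pstate,temperature.gpu,power.draw,utilization.gpu \
             --format=csv
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi
```

A card that is healthy but stuck in P0 will show it in the `pstate` column even while idle.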
-
IIRC I also saw freezes when sharing the GPU with X, and had to disallow that.
-
Sorry, I was out for a long weekend. This is a great thread! We have several devs using P4s with WIS and we haven't observed this "stuck in P0" behavior. However, as @nikito says, running Xorg on a Tesla P4 (it doesn't even have video out) is very strange and I have no idea what happens in that scenario. Generally, on the Tesla P4 we've observed behavior identical to what @nikito has observed with the GTX 1070: idle low with models loaded, a brief spike (1 sec or less) for requests, idle staying elevated for a short period, then idle dropping back down to the starting WIS + models level. This is default behavior that we can modify via nvidia-smi, etc. for WIS, but the defaults work fairly well across GPUs in our testing. FYI, the higher idle for a period after the initial spike is there for higher performance and lower latency on subsequent follow-up requests.
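For anyone who wants to experiment with those defaults, the usual nvidia-smi power-management knobs look roughly like this (a sketch, not a recommendation: both settings need root, and the 60 W cap is just an example value under the P4's 75 W TDP):

```shell
# Sketch of common nvidia-smi power-management tweaks (run as root).
# Guarded so it degrades gracefully on machines without an NVIDIA GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -pm 1        # enable persistence mode (keeps the driver loaded)
  nvidia-smi -pl 60       # example only: cap power draw at 60 W (P4 TDP is 75 W)
  nvidia-smi -q -d POWER  # inspect current power state and limits
else
  echo "nvidia-smi not found - nothing to tune"
fi
```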
We've had some skepticism about suggesting the Tesla P4 with its passive cooling, but in our testing, even under very heavy Willow use, we haven't observed anything close to thermal throttling - model execution time is so fast the card never gets close to its thermal limits. On GTX 1070s all the way up to RTX 4090s we don't even see the fans spin up (which is pretty neat).
-
Hello,
As I'm experimenting with the newly arrived Tesla P4 GPU board, I find that my system is stable as long as I don't use the card with Willow. But if I use `run.sh` with its default parameters, I can do one or two WebRTC recordings, and then either the Docker image stops responding to anything, or the entire system freezes without anything in the system log. Here is what WIS is giving me before the crash/freeze:
Could it come from the use of the "large" model? If yes, would using a `custom_settings.py` file be good enough to select a different model? Or could it come from a faulty GPU card?
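For reference, my current idea of such an override is something like the fragment below. I want to stress that the setting names are guesses on my part for illustration, not WIS's actual API - the real names would have to be checked against WIS's settings.py:

```python
# Hypothetical custom_settings.py for WIS.
# The variable names below (whisper_model_default, beam_size) are assumptions
# for illustration; check WIS's settings.py for the actual setting names.
whisper_model_default = "medium"  # try "medium" instead of "large" to cut VRAM use
beam_size = 1                     # a smaller beam should also reduce load
```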
Thanks for any help.