6800 crashing after applying DPM3 settings #21
I also tried lowering the memory clock by 1 MHz, which resulted in the following:
Probably not a big enough change to trigger anything. |
Interesting to see some 6800 testing! I am not the maintainer of the Arch AUR package, which is a bit outdated, and support for the 6000 series is only available in this experimental branch. It also requires the latest version of UPP (not yet available on pip). If you have already installed UPP via pip, you could do a quick-and-hacky update by overwriting the files in the upp lib folder (for example …). Edit: sorry, I didn't read your initial post properly; is UPP also from the current git repo? |
Hello, I also did a bit more testing. Only the memory settings seem to be the problem. I was able to use the static voltage, Graphics card power (seems to be limited to 257 W; more testing to do) and Gfx clock frequency settings. |
Sounds promising! I don't know if you attempted to set the DPM 0 frequency; that will likely cause the GPU to freak out. I would expect at least the DPM 3 frequency to be adjustable, though. |
I was setting DPM 3. Notes so far:
|
Technically it's from my own branch (based on master); I only added one comma, though. |
I did more testing and things look promising. I was able to confirm that the OC works.
|
I guess the next step for my OC would be to flash a 6800 XT BIOS or something. Is there anything you would like me to test? |
Many thanks for the report! As for the power limit, it has been tricky to set on the Navi 10 cards as well, only working well with certain firmware/kernel version combinations. Are the changes that you make reflected in …? As for the Gfx clock, it's a pity if it's not possible to adjust; unfortunately it's not unlikely that the card is "hard limited" like the 5600 XT was. |
It is reflected there too.
As for the memory settings, I tried again. It still crashes after a few seconds without any setting actually changing (at least according to powerupp and /sys/kernel/debug/dri/$index/amdgpu_pm_info). I tried both lowering and increasing the DPM3 clock frequency; both seem to trigger this. |
Can you try to set …? Do you know whether, and to what extent, the memory clock can be adjusted in Windows/Radeon Software? |
Also, try setting only the DPM 3 clock (1020 MHz in the example below) using UPP, to confirm that the same thing happens and that it's not a bug in powerupp.
Hello, I already have the featuremask set.
After adjusting the voltage offset to -115 and setting the power limit to 233, the following seems to have no effect (I don't know the magic strings, so I have to do it from powerupp):
No errors, but it seems to have no effect. The core clock is still stuck at 2450 MHz. For DPM3, the following
also seems to have no effect. Other notes:
|
|
Setting it twice(?) in a row causes the following:
In journalctl, the following happens. The line about fan speed keeps repeating forever, 4 times per second.
And amdgpu_pm_info changes to the following:
All the clock, temperature, etc. information is missing. The system also becomes unresponsive occasionally. |
The 6800 test file that I have has a fan target temperature of 80 degrees; I'm not sure if it's the same model as yours, but it seems likely that they at least have the same target. You could try to increase it, but be careful not to overheat the card. What monitor refresh rate are you running? Dual monitors? Can you try changing the refresh rate, perhaps switching to a single monitor and changing the connection (HDMI/DP) if possible, and see if it makes any difference for the memory clock setting? The repeating error messages can be caused by powerupp's (or another application's) periodic hwmon readings. |
Main screen 240 Hz 1440p, secondary monitor 60 Hz 1080p, both on DP.
Edit: I am also now running kernel 5.10.1-1 with a few patches (hopefully irrelevant), and that didn't improve the situation.
|
Sadly, even with this the max I can get is 2475 MHz.
|
Setting the memory clock on my RX5700 is also very flaky; it would only accept certain values and crash with most others. Often a difference of just one MHz would cause the card to crash (mostly unrecoverable, needing a hardware reset), and it does not matter whether you increase or decrease the clock. It also does not matter whether you use upp (pp_table interface) or radeon-clocks (kernel sysfs API) to change these clocks; something in the firmware/SMU/RAM timings makes it crash. I had to determine a set of "safe" clocks by trial and error :| |
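The trial-and-error approach above can at least be made repeatable by having a tool refuse anything that is not on a hand-verified list. Below is a minimal, hypothetical Python sketch (`nearest_safe_clock` and `SAFE_MCLK_MHZ` are illustrative names, and the clock values are placeholders, not measured data) that snaps a requested memory clock to the closest known-good value instead of passing arbitrary values to the driver.

```python
from typing import Sequence

# Hypothetical allowlist of memory clocks (MHz) verified by hand on one card.
# These numbers are placeholders for illustration, not measured safe values.
SAFE_MCLK_MHZ = (875, 900, 925)


def nearest_safe_clock(requested_mhz: int, safe: Sequence[int] = SAFE_MCLK_MHZ) -> int:
    """Return the allowlisted clock closest to the requested one.

    On a tie, the lower clock wins, since it is the more conservative choice.
    """
    if not safe:
        raise ValueError("the allowlist of safe clocks is empty")
    return min(safe, key=lambda clk: (abs(clk - requested_mhz), clk))


if __name__ == "__main__":
    print(nearest_safe_clock(910))  # closest allowlisted value
```

The allowlist would have to be built per card (and possibly per firmware/kernel combination), since the thread reports that the safe values are not predictable.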
For now, the 5.10 kernel seems to break at least the ability to set the power limit on Navi 10 for some reason; I still need to dig further to understand why this happens and whether there are any workarounds. But since you also tried 5.9, that shouldn't be the main culprit. There might still be driver/firmware issues that AMD will sort out eventually. |
Are you able to lower the power limit by the way (set it to 150 W for example, using powerupp)? |
I'm dropping in since I'm testing on a 6800 XT. PowerUPP reads all values correctly, but nothing changes when I try to apply new settings.
PowerUPP does ask for permission when applying, so I don't think it's a permissions problem. Glad to help with debugging, but I'm not really familiar with the interfaces, so I can't do it by myself. |
Thanks, can you try to run PowerUPP from terminal ( |
I skimmed through some forums and it appears to be difficult to adjust the memory clock at least on high refresh rate monitors (note that the memory clocks are reported differently under Windows and Linux, I believe the values are halved under Linux). |
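If the halving mentioned above holds (it is stated as a belief, not verified here), comparing Linux and Windows memory clock figures comes down to a factor of two. A trivial sketch with hypothetical helper names, under that assumption:

```python
def linux_to_windows_mclk(linux_mhz: float) -> float:
    """Convert a Linux-reported memory clock to the Windows-reported figure.

    Assumption: Windows/Radeon Software shows twice the value Linux reports.
    """
    return linux_mhz * 2


def windows_to_linux_mclk(windows_mhz: float) -> float:
    """Inverse of linux_to_windows_mclk under the same factor-of-two assumption."""
    return windows_mhz / 2


if __name__ == "__main__":
    # Under this assumption, a 1075 MHz value on Linux would read as 2150 MHz on Windows.
    print(linux_to_windows_mclk(1075))
```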
That was helpful. I ran PowerUPP from a terminal and got the following: After running the command with sudo it worked: it lowered the voltage and changed the power limit. Any ideas how to get it to work? Is root not a member of some group? It would probably also be useful to show some sort of error message in the GUI when this happens. Edit: changing the gfx frequency doesn't seem to work; if I raise it, the GPU doesn't boost anymore. But this is probably a driver issue. |
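On the GUI error message suggested above: one minimal approach is to catch the `OSError` that a failed sysfs write raises and turn it into text a GUI can display. This is a hypothetical Python sketch (`write_sysfs` is an illustrative name, not taken from PowerUPP).

```python
from pathlib import Path
from typing import Optional


def write_sysfs(path: str, value: str) -> Optional[str]:
    """Write value to a sysfs-style file.

    Returns None on success, or a human-readable error message that a GUI
    could show instead of failing silently.
    """
    try:
        Path(path).write_text(value)
    except PermissionError:
        # Not running with sufficient privileges (e.g. not root).
        return f"Permission denied writing {path}; root privileges are required."
    except OSError as exc:
        # Other failures, e.g. EINVAL when the driver rejects the value.
        return f"Writing {value!r} to {path} failed: {exc.strerror or exc}"
    return None
```

A GUI would call it with the target path and show the returned string, if any, in a dialog. `PermissionError` is caught first because it is a subclass of `OSError`.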
It would appear that root is not allowed to issue sudo commands? Maybe try something like this?
Absolutely. I just pushed a new commit to the bignavi branch; before you fix the cause, please let me know if this works as intended.
It seems so unfortunately, hopefully something that will get sorted out. |
Setting it to 150 W seems to work just fine, and it's also reflected in the power draw. However, when trying to go back to 233 W, I got the following:
:D |
Interesting. For Navi 10 this error message only seems to appear under kernel 5.10 and when not using the … Probably not important, but notable: contrary to Navi 10, it seems to properly calculate the max allowed (150 + 15%), whereas with Navi 10 it would have said "max allowed 150" (the actual max value set in the powerplay table). The message (at least for Navi 10) is not triggered when setting the powerplay table values but when applying the value to sysfs. Can you set the power cap manually |
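The power-limit arithmetic above can be made explicit. A minimal sketch, assuming the 15% overdrive headroom observed in this thread and the standard hwmon convention that `power1_cap` is written in microwatts; the helper names are hypothetical.

```python
def max_allowed_power_w(table_limit_w: int, od_percent: int = 15) -> float:
    """Maximum power cap the driver accepts, per the behaviour seen here:
    the powerplay table limit plus an overdrive percentage (15% assumed)."""
    return table_limit_w * (100 + od_percent) / 100


def power_cap_to_hwmon(requested_w: float, table_limit_w: int, od_percent: int = 15) -> int:
    """Validate a requested cap and convert it to microwatts for power1_cap."""
    max_w = max_allowed_power_w(table_limit_w, od_percent)
    if requested_w > max_w:
        raise ValueError(f"{requested_w} W exceeds max allowed {max_w} W")
    return round(requested_w * 1_000_000)


if __name__ == "__main__":
    print(max_allowed_power_w(150))  # 150 W + 15% = 172.5 W, so 233 W is rejected
    print(power_cap_to_hwmon(150, table_limit_w=150))  # 150000000
```

This matches the observation that after the table limit was set to 150 W, going back to 233 W was refused.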
That was it; that line is commented out by default in the Arch sudo package. Thanks. |
The test below is without the flag:
It still appears to be broken:
and in the syslog:
|
Some more findings. After updating to 5.10.2 + linux-firmware-git 20201218.646f159-1 and rebuilding upp and powerupp, I can now change the power limit, and it really seems to change too. There is still weird behaviour where I have to bring it up slowly, step by step, but I am now seeing power usage > 250 W every now and then. The main limiting factor, the GFX clock frequency (2475), still remains. |
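The step-by-step workaround described above could be scripted. The sketch below only computes the intermediate caps; the step size and the helper name are arbitrary assumptions, and each value would still have to be written (in microwatts) to the hwmon `power1_cap` file one at a time.

```python
from typing import List


def power_cap_steps(start_w: int, target_w: int, step_w: int = 10) -> List[int]:
    """Intermediate power caps from start_w towards target_w.

    start_w itself is excluded (it is assumed to be the current cap);
    target_w is always the last entry. Works upwards or downwards.
    """
    if step_w <= 0:
        raise ValueError("step_w must be positive")
    direction = 1 if target_w >= start_w else -1
    steps = list(range(start_w + direction * step_w, target_w, direction * step_w))
    steps.append(target_w)
    return steps


if __name__ == "__main__":
    print(power_cap_steps(150, 233, step_w=25))  # [175, 200, 225, 233]
```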
Perfect! I've noticed some similar things recently with my 5700 XT as well, the other day setting the power limit had no effect and today it works all of a sudden (not quite sure if/what I updated...) but sometimes crashes after setting it (never had that happen before). If the Gfx clock is a driver/firmware issue it will hopefully be fixed soon as well. @asiantuntija have you tried to change the memory DPM 3 clock and does it cause a crash for you as well? |
Remarks regarding amdgpu.ppfeaturemask=0xffffffff
And the DPM3 settings still cause the following behaviour, even with all the updates:
|
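As a quick sanity check for the featuremask: the overdrive interface depends on the `PP_OVERDRIVE_MASK` bit (0x4000, defined in the kernel's `amd_shared.h`). A minimal sketch that reads the loaded module parameter and tests that bit; the helper names are illustrative, and since the parameter may be printed in decimal or as 0x-prefixed hex depending on kernel version, the parser accepts both.

```python
from pathlib import Path

# PP_OVERDRIVE_MASK from the kernel's amd_shared.h; this bit must be set in
# amdgpu.ppfeaturemask for the overdrive interface to be exposed.
PP_OVERDRIVE_MASK = 0x4000
PARAM_PATH = "/sys/module/amdgpu/parameters/ppfeaturemask"


def read_ppfeaturemask(path: str = PARAM_PATH) -> int:
    """Read the active mask; int(..., 0) accepts both "0x..." hex and decimal."""
    return int(Path(path).read_text().strip(), 0)


def overdrive_enabled(mask: int) -> bool:
    """True if the overdrive bit is set in the given feature mask."""
    return bool(mask & PP_OVERDRIVE_MASK)
```

For example, `overdrive_enabled(read_ppfeaturemask())` should be `True` when booting with `amdgpu.ppfeaturemask=0xffffffff`.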
Yes, I tried it a couple of times and I'm having the same symptoms as @quasd. I remember having the same sort of crashes with Navi10 when it was fresh: if I had radeon-profile-daemon running in the background, it would freeze the PC shortly after booting. IIRC it had something to do with a race condition when requesting information from the GPU. I will try a bit more with the memory tomorrow; I've already rebooted a million times today because one of my RAM sticks died on me. Fixed undervolting seems to be the way to go: I'm getting a stable 2575 MHz (max possible) with -150 mV, the Unigine Superposition score went from 9119 (stock) to 9619, and temperatures are lower as well. |
I got my hands on a 6800 XT and it seems to act the same way as the 6800; the same problems occur on the 6800 XT too. The max core frequency is now 2577 MHz compared to 2475 MHz. Does anyone know what is currently limiting us (or bugging out) and stopping us from setting higher frequencies? AFAIK these cards should be limited to 2.7 GHz, not 2.58 GHz. |
I did a bit of poking around and found this … Now I'm on 5.11 rc1 + a few patches from drm-next, and I have working memory and core frequency control through pp_od_clk_voltage (up to 2.8 GHz/1075 MHz) plus voltage offset.
Powerupp/upp still don't work with memory clock or increasing the core frequency. |
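For reference, the pp_od_clk_voltage workflow above can be sketched as follows. This is a minimal, hypothetical Python sketch based on the command format documented for amdgpu's `pp_od_clk_voltage` (`s 1 <MHz>` for the maximum core clock, `m 1 <MHz>` for the maximum memory clock, `vo <mV>` for the voltage offset, and `c` to commit). `od_commands` and `apply_od` are illustrative names, the `card0` default is an assumption, and writing requires root.

```python
from typing import List, Optional

OD_PATH = "/sys/class/drm/{card}/device/pp_od_clk_voltage"


def od_commands(max_sclk_mhz: Optional[int] = None,
                max_mclk_mhz: Optional[int] = None,
                voltage_offset_mv: Optional[int] = None) -> List[str]:
    """Build the command strings, ending with "c" to commit the changes."""
    cmds = []
    if max_sclk_mhz is not None:
        cmds.append(f"s 1 {max_sclk_mhz}")  # index 1 = maximum gfx clock
    if max_mclk_mhz is not None:
        cmds.append(f"m 1 {max_mclk_mhz}")  # index 1 = maximum memory clock
    if voltage_offset_mv is not None:
        cmds.append(f"vo {voltage_offset_mv}")  # offset in mV, may be negative
    cmds.append("c")
    return cmds


def apply_od(cmds: List[str], card: str = "card0") -> None:
    """Write each command in its own write, like the echo examples in this thread."""
    path = OD_PATH.format(card=card)
    for cmd in cmds:
        with open(path, "w") as f:
            f.write(cmd + "\n")
```

For example, `od_commands(2800, 1075, -10)` yields `["s 1 2800", "m 1 1075", "vo -10", "c"]`, matching the 2.8 GHz / 1075 MHz ceiling and the -10 mV offset discussed in this thread.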
Hello brother, can you please share the patches? Best regards. |
5.11 rc kernel and these 4 https://cgit.freedesktop.org/~agd5f/linux/commit/?id=78d907e2b8ba89c936b7f0c3344261c653668a62
|
Thanks for your reply. |
|
Thank you very much, it worked perfectly. |
Sorry to bother you here, but is echo "vo -10" > /sys/class/drm/card0/device/pp_od_clk_voltage a decrease of 10 mV? When I run sudo watch sensors, the voltage shows 1150 mV for the GPU. Could you please explain? And thanks so much for sharing your knowledge. |
yes
yes afaik |
This allowed me to at least get above the initial 1000 hurdle. However, once I have executed these commands, I am not able to execute them again: it says "invalid argument". So I am now running at 1075, which is better than stock at least. I am not sure what the challenges are on the programming side, but I wanted to confirm that, for me at least, this allowed the overclock. Powerupp still does not make any changes to these settings for me, though. |
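One thing worth ruling out when the driver answers "invalid argument" is a value outside the limits it advertises: reading pp_od_clk_voltage prints an OD_RANGE section with the allowed SCLK and MCLK minimum and maximum, and out-of-range values are rejected. A minimal, hypothetical sketch that parses that section (the exact spacing and capitalisation of the output are assumed, so the pattern is deliberately lenient):

```python
import re
from typing import Dict, Tuple

# Matches lines such as "SCLK:     500Mhz       2800Mhz" in the OD_RANGE block.
RANGE_LINE = re.compile(r"^(SCLK|MCLK):\s*(\d+)\s*mhz\s+(\d+)\s*mhz", re.IGNORECASE)


def parse_od_range(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {"SCLK": (min, max), "MCLK": (min, max)} from pp_od_clk_voltage output."""
    ranges: Dict[str, Tuple[int, int]] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.upper().startswith("OD_RANGE"):
            in_section = True
            continue
        if in_section:
            match = RANGE_LINE.match(line)
            if match:
                ranges[match.group(1).upper()] = (int(match.group(2)), int(match.group(3)))
    return ranges


def clock_in_range(kind: str, mhz: int, ranges: Dict[str, Tuple[int, int]]) -> bool:
    """True if mhz lies within the advertised range for kind ("SCLK" or "MCLK")."""
    low, high = ranges[kind]
    return low <= mhz <= high
```

In practice the text would come from reading /sys/class/drm/card0/device/pp_od_clk_voltage (card index assumed). If the requested value is inside the range and the write still fails, the cause lies elsewhere.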
After getting upp working I ended up finding this project and branch.
I am trying this with https://aur.archlinux.org/packages/powerupp-git/ by modifying the branch to bignavi.
The GUI opens up just fine and I can load the active settings. But if I click "Apply current", I get the following errors from the kernel, and the settings don't appear to change (reading from /sys/kernel/debug/dri/$index/amdgpu_pm_info):
If I launch a game, I get a freeze fairly quickly.
Kernel in use
linux-firmware in use
Is this a problem with running on 5.9?