6800 crashing after applying DPM3 settings #21

quasd · 2020-12-20T17:23:05Z

After getting upp working, I found this project and branch.

I am trying it with https://aur.archlinux.org/packages/powerupp-git/, modified to build the bignavi branch.

The GUI opens just fine and I can load the active settings. But if I click "Apply current", I get the following errors from the kernel, and the settings don't appear to change (reading from /sys/kernel/debug/dri/$index/amdgpu_pm_info):

Dec 20 19:06:26 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000034, smu fw if version = 0x0000003b, smu fw version = 0x003a3100 (58.49.0)
Dec 20 19:06:26 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Dec 20 19:06:26 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
Dec 20 19:06:28 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: failed send message: TransferTableDram2Smu (19)         param: 0x00000000 response 0xffffffc2
Dec 20 19:06:28 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to transfer pptable to SMC!
Dec 20 19:06:28 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
Dec 20 19:06:28 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: smu reset failed, ret = -62

If I launch a game, I get a freeze fairly quickly.

Kernel in use

Linux quasd 5.9.14-1-ck #1 SMP PREEMPT Fri, 18 Dec 2020 06:58:44 +0000 x86_64 GNU/Linux

linux-firmware in use

linux-firmware-git 20201130.7455a36-1

Is this a problem with running on 5.9?


quasd · 2020-12-20T17:29:33Z

I also tried lowering the memory clock by 1 MHz, which resulted in the following:

Dec 20 19:23:25 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000034, smu fw if version = 0x0000003b, smu fw version = 0x003a3100 (58.49.0)
Dec 20 19:23:25 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Dec 20 19:23:25 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
Dec 20 19:23:25 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is initialized successfully!

Probably too small a change to trigger anything.

azeam · 2020-12-20T17:37:29Z

Interesting to see some 6800 testing! I am not the maintainer of the Arch AUR package, which is a bit outdated, and support for the 6000 series is only available in this experimental branch. It also requires the latest version of UPP (not yet available on pip).

If you have already installed UPP via pip, you could do a quick-and-hacky update by overwriting the files in the upp lib folder (for example ~/.local/lib/python3.8/site-packages/upp) with the latest files from GitHub. Otherwise, install it from source and make sure the upp command is available and runs the latest version. For PowerUPP, download the bignavi branch and run make && sudo make install.

Edit: sorry, I didn't read your initial post properly. Is UPP also from the current git repo?

quasd · 2020-12-20T17:49:10Z

Hello

I also did a bit more testing. Only the memory settings seem to be the problem. I was able to use the static voltage, Graphics card power (seems to be limited to 257, more testing to do) and Gfx clock frequency settings.

azeam · 2020-12-20T18:00:16Z

Sounds promising! I don't know if you attempted to set the DPM 0 frequency; that will likely cause the GPU to freak out. I would expect at least the DPM 3 frequency to be adjustable, though.

quasd · 2020-12-20T18:11:06Z

I was changing DPM 3.

Here is how it looks to me.

Notes so far:

Graphics card power seems to be bugged: as long as I set it to the maximum allowed, I can keep increasing it. I think the default is 233, and in the screenshot it is 439.
Can't increase the GPU clock past 2475 MHz (it drops down to 2D clocks)
With the above settings I am pretty much locked to 2450 MHz

quasd · 2020-12-20T18:18:58Z

> is UPP also from the current git repo?

Technically it's from my own branch (based on master); I only added one comma, though.

quasd · 2020-12-20T18:23:16Z

More testing done, and things look promising. I was able to confirm that the OC works.

Just simple testing while staring at the wall: a 10 fps increase over stock.
The power limit still seems to be locked to 233, even though I can increase it endlessly.
The biggest limitation is the Gfx clock frequency. If it can't be changed, there won't be much point putting this card on water :(

quasd · 2020-12-20T18:25:36Z

I guess the next step for my OC would be to flash a 6800 XT BIOS or something.

Anything you would like me to test?

azeam · 2020-12-20T18:48:36Z

Many thanks for the report! The power limit has been tricky to set on the Navi 10 cards as well, and it only works well with certain firmware/kernel version combinations. Are the changes that you make reflected in cat /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap ?
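The same check can be wrapped as a reusable helper that also prints the value in watts. This is a sketch that assumes the card is card0 and exposes a single hwmonN directory. The function names are made up for this example. power1_cap is in microwatts, so integer division by 1,000,000 gives whole watts.

```shell
# Hypothetical helpers; card0 and a single hwmonN entry are assumptions.

# Print the hwmon directory that belongs to a DRM device.
hwmon_dir() {
    dev="${1:-/sys/class/drm/card0/device}"
    for d in "$dev"/hwmon/hwmon*; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# power1_cap is in microwatts; integer division gives whole watts.
microwatts_to_watts() {
    echo $(( $1 / 1000000 ))
}

# Print the current power cap of the device in watts.
power_cap_watts() {
    dir=$(hwmon_dir "$1") || return 1
    microwatts_to_watts "$(cat "$dir/power1_cap")"
}
```

Run `power_cap_watts` with no argument to read card0. For another card, pass its device directory, for example /sys/class/drm/card1/device.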

It's a pity if the Gfx clock can't be adjusted. Unfortunately, it is quite possible that the card is "hard limited" like the 5600 XT was.

quasd · 2020-12-20T18:55:27Z

> Many thanks for the report! As for the power limit it's been tricky to set with the navi 10 cards as well, only working well with certain firmware/kernel version combinations. Are the changes that you make reflected in cat /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap ?

It is reflected there too:

 cat /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap
360000000

Regarding the memory settings, I tried again. It still crashes after a few seconds, without the settings actually changing (at least according to PowerUPP and /sys/kernel/debug/dri/$index/amdgpu_pm_info). I tried both lowering and increasing the DPM3 clock frequency, and both seem to trigger it.

azeam · 2020-12-20T19:16:37Z

Can you try running upp set smc_pptable/DcModeMaxFreq/0=2500 smc_pptable/DcModeMaxFreq/2=1050 --write and see if that makes any difference when setting the Gfx and DPM 3 frequencies? It should raise the limits used by OD to 2500/1050 MHz. You could also check whether turning the amdgpu.ppfeaturemask=0xffffffff boot flag on or off makes any difference.

Do you know whether, and to what extent, the memory clock can be adjusted in Windows/Radeon Software?

azeam · 2020-12-20T19:35:46Z

Also try setting only the DPM 3 clock with UPP (1020 MHz in the example below). That would confirm the same thing happens and rule out a bug in PowerUPP: upp set smc_pptable/FreqTableUclk/3=1020 --write

quasd · 2020-12-20T19:49:57Z

Hello

I already have the featuremask set.

[root@quasd ~]# grep -o amdgpu.ppfeaturemask=0xffffffff /proc/cmdline 
amdgpu.ppfeaturemask=0xffffffff

After setting a voltage offset of -115 and a power limit of 233, the command below seems ineffective. (I don't know the magic strings for those, so I had to set them from PowerUPP.)

[root@quasd ~]# upp set smc_pptable/DcModeMaxFreq/0=2560 --write
Changing smc_pptable.DcModeMaxFreq.0 from 2460 to 2560 at 0x626
Commiting changes to '/sys/class/drm/card0/device/pp_table'.

No errors, but it seems to be ineffective. The core clock is still stuck at 2450.

For DPM3,

upp set smc_pptable/DcModeMaxFreq/2=1050 --write

also seems to be ineffective.

Other notes:

Temps are around 80 °C; is this some magic point where it stops boosting?
Power usage is jumping around 200 W
Testing with Overwatch and the practice range
Setting the fan speed to 100%, waiting for the card to cool down and retrying allowed me to reach 2470

edit: typos

quasd · 2020-12-20T19:52:51Z

> Also try to set only the DPM 3 clock (1020 MHz in the example below) using UPP to confirm that the same thing happens and that it's not something buggy in powerupp upp set smc_pptable/FreqTableUclk/3=1020 --write.

[root@quasd ~]# upp set smc_pptable/FreqTableUclk/3=1020 --write
Changing smc_pptable.FreqTableUclk.3 from 1000 to 1020 at 0x584
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
[root@quasd ~]#

quasd · 2020-12-20T20:00:35Z

Setting it a second time in a row(?) causes the following:

[root@quasd ~]# upp set smc_pptable/FreqTableUclk/3=1020 --write
Changing smc_pptable.FreqTableUclk.3 from 1020 to 1020 at 0x584
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
Traceback (most recent call last):
  File "/usr/bin/upp", line 33, in <module>
    sys.exit(load_entry_point('upp==0.0.7.post2', 'console_scripts', 'upp')())
  File "/usr/lib/python3.9/site-packages/upp/upp.py", line 336, in main
    cli(obj={})()
  File "/usr/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/upp/upp.py", line 318, in set
    decode._write_pp_tables_file(pp_file, pp_bytes)
  File "/usr/lib/python3.9/site-packages/upp/decode.py", line 47, in _write_pp_tables_file
    f.close()
OSError: [Errno 62] Timer expired
[root@quasd ~]#

On the journalctl side, the following happens. The line about fan speed keeps repeating indefinitely, 4 times per second.

Dec 20 21:53:08 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: failed send message: TransferTableDram2Smu (19)         param: 0x00000000 response 0xffffffc2
Dec 20 21:53:08 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to transfer pptable to SMC!
Dec 20 21:53:08 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
Dec 20 21:53:08 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: smu reset failed, ret = -62
Dec 20 21:53:08 quasd kernel: amdgpu: manual fan speed control should be enabled first

And amdgpu_pm_info changes to the following:

[root@quasd ~]# cat /sys/kernel/debug/dri/0/amdgpu_pm_info 
Clock Gating Flags Mask: 0x38118305
	Graphics Medium Grain Clock Gating: On
	Graphics Medium Grain memory Light Sleep: Off
	Graphics Coarse Grain Clock Gating: On
	Graphics Coarse Grain memory Light Sleep: Off
	Graphics Coarse Grain Tree Shader Clock Gating: Off
	Graphics Coarse Grain Tree Shader Light Sleep: Off
	Graphics Command Processor Light Sleep: Off
	Graphics Run List Controller Light Sleep: Off
	Graphics 3D Coarse Grain Clock Gating: On
	Graphics 3D Coarse Grain memory Light Sleep: Off
	Memory Controller Light Sleep: On
	Memory Controller Medium Grain Clock Gating: On
	System Direct Memory Access Light Sleep: Off
	System Direct Memory Access Medium Grain Clock Gating: Off
	Bus Interface Medium Grain Clock Gating: Off
	Bus Interface Light Sleep: Off
	Unified Video Decoder Medium Grain Clock Gating: Off
	Video Compression Engine Medium Grain Clock Gating: Off
	Host Data Path Light Sleep: On
	Host Data Path Medium Grain Clock Gating: On
	Digital Right Management Medium Grain Clock Gating: Off
	Digital Right Management Light Sleep: Off
	Rom Medium Grain Clock Gating: Off
	Data Fabric Medium Grain Clock Gating: Off
	Address Translation Hub Medium Grain Clock Gating: On
	Address Translation Hub Light Sleep: On

dpm not enabled
[root@quasd ~]#

All the clock, temperature, etc. information is missing. The system also becomes unresponsive occasionally.
edit: remove speculation of what might be the cause

azeam · 2020-12-20T20:30:00Z

> Temps are around 80c, is this some magic point where it stops boosting?

The 6800 test file that I have uses a fan target temperature of 80 degrees. I'm not sure it's the same model as yours, but it seems likely that they share the same target. You could try increasing it, but be careful not to overheat the card: upp set smc_pptable/FanTargetTemperature=85 --write

What monitor refresh rate are you running at? Dual monitors? Can you try changing the refresh rate and see if it makes any difference for the memory clock setting? If possible, also try switching to a single monitor and changing the connection (HDMI/DP). The repeating error messages can be caused by periodic hwmon readings from PowerUPP (or another application).

quasd · 2020-12-20T20:38:07Z

> What monitor frequency are you running at? Dual monitors? Can you try to change the frequency and maybe set to single monitor and change connection (HDMI/DP) if possible and see if it makes any difference for the memory clock setting. The repeating error messages can be caused by powerupps (or other application) periodical hwmon readings.

Main screen 240 Hz 1440p, secondary monitor 60 Hz 1080p, both on DP.
Tried with only one screen on HDMI. The memory setting still didn't work. And setting memory more than 1 results in instability/freezes.

upp set smc_pptable/FreqTableUclk/3=1020 --write

Edit: I am now also running kernel 5.10.1-1 with a few patches (hopefully irrelevant), and that didn't improve the situation.

  "enable_additional_cpu_optimizations-$_gcc_more_v.tar.gz::https://github.com/graysky2/kernel_gcc_patch/archive/$_gcc_more_v.tar.gz"
  0015-zfs.patch
  "0001-futex-patches.patch::https://raw.githubusercontent.com/Frogging-Family/linux-tkg/master/linux59-tkg/linux59-tkg-patches/0007-v5.9-fsync.patch"
  0001-ZEN-Add-sysctl-and-CONFIG-to-disallow-unprivileged-C.patch
  0002-Bluetooth-Fix-LL-PRivacy-BLE-device-fails-to-connect.patch
  0003-Bluetooth-Fix-attempting-to-set-RPA-timeout-when-uns.patch
  0004-HID-quirks-Add-Apple-Magic-Trackpad-2-to-hid_have_sp.patch

quasd · 2020-12-20T20:55:56Z

upp set smc_pptable/FanTargetTemperature=85 --write

Sadly, even with this, the max I can get is 2475 MHz.

root@quasd ~# upp set smc_pptable/DcModeMaxFreq/0=2550 --write
Changing smc_pptable.DcModeMaxFreq.0 from 2475 to 2550 at 0x626
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
root@quasd ~# upp set smc_pptable/FanTargetTemperature=90 --write
Changing smc_pptable.FanTargetTemperature from 80 to 90 at 0x720
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
root@quasd ~#

sibradzic · 2020-12-21T03:59:31Z

Setting the memory clock on my RX 5700 is also very flaky: it would only accept certain values and crash with most others. Often a difference of just one MHz would cause the card to crash (mostly unrecoverable, needs a HW reset), whether you increase or decrease the clock. It also does not matter whether you use upp (pp_table interface) or radeon-clocks (kernel sysfs API) to change these clocks. Something in the firmware/SMU/RAM timings makes it crash.

I had to determine a certain set of "safe" clocks by trial and error :|
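One way to stick to a trial-and-error list is a small wrapper around upp. It only passes known-good values and prints the command instead of running it unless asked. This is a sketch. SAFE_UCLK holds only the stock DPM 3 value seen in this thread (1000). The APPLY variable and the function names are made up, and each card needs its own list.

```shell
# Memory DPM 3 clocks (MHz) already confirmed stable on this card.
# Placeholder: only the stock value from this thread; extend after testing.
SAFE_UCLK="1000"

# Succeed if $1 is in the SAFE_UCLK list.
is_safe_uclk() {
    for v in $SAFE_UCLK; do
        if [ "$v" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print the upp command for memory DPM 3, or run it when APPLY=1.
set_uclk_dpm3() {
    if ! is_safe_uclk "$1"; then
        echo "refusing untested memory clock: $1 MHz" >&2
        return 1
    fi
    cmd="upp set smc_pptable/FreqTableUclk/3=$1 --write"
    if [ "${APPLY:-0}" = 1 ]; then
        $cmd    # unquoted on purpose so the shell splits it into arguments
    else
        echo "$cmd"
    fi
}
```

`set_uclk_dpm3 1000` only prints the command; `APPLY=1 set_uclk_dpm3 1000` runs it (as root). Dry-run is the default because, as described above, a single bad value can hang the card.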

azeam · 2020-12-21T09:43:17Z

> edit: also I am now running kernel 5.10.1-1 with few patches ( hopefully irrelevant ). And that didn't improve the situation.

For now, the 5.10 kernel seems to break power limit setting for Navi 10, for some reason. I still need to dig further to understand why this happens and whether there are any workarounds. But since you also tried 5.9, the kernel shouldn't be the main culprit. There might still be driver/firmware issues that AMD will sort out eventually.

azeam · 2020-12-21T14:14:41Z

Are you able to lower the power limit by the way (set it to 150 W for example, using powerupp)?

asiantuntija · 2020-12-21T14:41:10Z

I'm dropping in since I'm testing on a 6800 XT. PowerUPP reads all values correctly, but nothing changes when I try to apply something new.

cat /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap doesn't show any change after applying something with PowerUPP. However, echo 293000000 | sudo tee /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap does work and sets the power limit to the maximum. I also tried undervolting heavily, but nothing crashed the system, so that probably isn't working either.

PowerUPP does ask for permissions when applying, so I don't think it's a permissions problem. I'm glad to help with debugging, but I'm not really familiar with the interfaces, so I can't do it by myself.

azeam · 2020-12-21T15:02:31Z

Thanks! Can you try running PowerUPP from a terminal (powerupp) while setting the values and see if anything strange shows up there?
Are you getting any errors with dmesg | grep amdgpu?

azeam · 2020-12-21T15:13:32Z

> Main screen 240hz 1440p, secondary monitor 60hz 1080p both DP
> Tried with only 1 screen hdmi. Memory setting still didn't work. And setting memory more than 1 results in instability/freezes.

I skimmed through some forums, and it appears to be difficult to adjust the memory clock, at least with high refresh rate monitors. Note that memory clocks are reported differently under Windows and Linux; I believe the values are halved under Linux.
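Comparing Linux numbers with Windows guides takes a simple conversion. The sketch below assumes the halving described in the comment above holds exactly, a factor of 2. That factor is the commenter's belief, not something verified here.

```shell
# Assumes Linux reports half the memory clock that Radeon Software shows.
linux_to_windows_mclk() {
    echo $(( $1 * 2 ))
}

windows_to_linux_mclk() {
    echo $(( $1 / 2 ))
}
```

For example, `linux_to_windows_mclk 1000` prints 2000. Integer division means an odd Windows value rounds down when converted back.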

asiantuntija · 2020-12-21T15:42:39Z

That was helpful. I ran PowerUPP in a terminal and got the following:
Sorry, user root is not allowed to execute '/bin/zsh -c /usr/bin/upp --pp-file /sys/class/drm/card0/device/pp_table set --write smc_pptable/MaxVoltageGfx=4000 smc_pptable/SocketPowerLimitAc/0=293 smc_pptable/FreqTableGfx/1=2577 smc_pptable/MemMvddVoltage/3=5400 smc_pptable/MemVddciVoltage/3=3400 smc_pptable/FreqTableUclk/3=1000 smc_pptable/MaxVoltageSoc=4600 smc_pptable/FreqTableSocclk/1=1200 smc_pptable/qStaticVoltageOffset/0/c=0.000000 smc_pptable/MemMvddVoltage/0=5000 smc_pptable/MemVddciVoltage/0=2700 smc_pptable/FreqTableUclk/0=97 smc_pptable/MemMvddVoltage/1=5400 smc_pptable/MemVddciVoltage/1=3200 smc_pptable/FreqTableUclk/1=457 smc_pptable/MemMvddVoltage/2=5400 smc_pptable/MemVddciVoltage/2=3400 smc_pptable/FreqTableUclk/2=674 smc_pptable/MinVoltageGfx=3524 smc_pptable/MinVoltageSoc=3800' as user on arch-pc.
Sorry, user root is not allowed to execute '/usr/sbin/tee /sys/class/hwmon/hwmon3/power1_cap' as root on arch-pc.

When I ran the command with sudo, it worked: it lowered the voltage and changed the power limit. Any ideas how to get it to work? Is root not a member of some group? It would probably also be useful to show some sort of error message in the GUI when this happens.

Edit: changing the Gfx frequency doesn't seem to work; if I raise it, the GPU doesn't boost anymore. But this is probably a driver issue.

azeam · 2020-12-21T16:55:50Z

> After trying the command with sudo it worked and lowered the voltage and changed power limit. Any ideas how to get it work, root not a user of some group?

It would appear that root is not allowed to issue sudo commands? Maybe try something like this?

> And would probably be useful to insert some sort of error message in the GUI if this happens.

Absolutely. I just pushed a new commit to the bignavi branch. Before you fix the cause, please let me know if it works as intended.

> edit: changing gfx frequency doesn't seem to work, if I raise it the gpu doesn't boost anymore. But this is probably driver issue.

Unfortunately it seems so. Hopefully it will get sorted out.

quasd · 2020-12-21T17:29:05Z

> Are you able to lower the power limit by the way (set it to 150 W for example, using powerupp)?

Setting it to 150 W seems to work just fine, and it's also reflected in the power draw.

However, when trying to go back to 233 W, I got the following:

Dec 21 19:27:00 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: New power limit (233) is over the max allowed 172

:D

azeam · 2020-12-21T18:13:13Z

> However when trying to go back to 233 W I got the following.
> Dec 21 19:27:00 eki-ryzen kernel: amdgpu 0000:0c:00.0: amdgpu: New power limit (233) is over the max allowed 172

Interesting. For Navi 10, this error message only seems to appear under kernel 5.10 when the amdgpu.ppfeaturemask=0xffffffff flag is not used. Do you still have it set?

Probably not important, but notably, unlike Navi 10, it seems to calculate the max allowed value properly (150 + 15%). With Navi 10 it would have said "max allowed 150", which is the actual max value set in the powerplay table. The message (at least for Navi 10) is not triggered when setting the powerplay table values, but when applying the value via sysfs.
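The "172" in the log fits that reading if the maximum is the table limit plus the overdrive percentage, truncated to whole watts. The sketch below reproduces that arithmetic. The formula is inferred from the numbers in this thread, not taken from the driver source, and the function names are made up.

```shell
# Assumption: max allowed = table limit * (100 + overdrive %) / 100,
# in integer watts, inferred from "max allowed 172" for a 150 W table.
max_power_limit() {
    echo $(( $1 * (100 + $2) / 100 ))
}

# Succeed if a requested limit (W) fits under that maximum.
within_power_limit() {   # within_power_limit <requested> <table_limit> <od_percent>
    [ "$1" -le "$(max_power_limit "$2" "$3")" ]
}
```

With 150 and 15 this gives 150 * 115 / 100 = 172, so a 233 W request falls outside the limit, consistent with the log above.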

Can you set the power cap manually with echo 233000000 | sudo tee /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap after getting this error? If that works, it would point to a timing issue: PowerUPP tries to set the sysfs value before the re-initialization of the powerplay table is complete. That would be fixable.
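To test the timing theory, the sketch below writes the cap and then re-reads it a few times, retrying until it sticks. The function name, the default of five tries and the one-second pause are arbitrary choices. The first argument should be the card's power1_cap file, for example the one used in the echo command above, and the write needs root.

```shell
# Write a power cap in watts, then re-read it until it sticks.
set_power_cap() {   # set_power_cap <power1_cap file> <watts> [tries]
    file=$1
    want=$(( $2 * 1000000 ))    # power1_cap is in microwatts
    tries=${3:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        echo "$want" > "$file" || :    # a rejected write is retried below
        if [ "$(cat "$file")" = "$want" ]; then
            return 0
        fi
        i=$(( i + 1 ))
        sleep 1
    done
    echo "power1_cap reads $(cat "$file"), wanted $want" >&2
    return 1
}
```

If the value only sticks on a later attempt, that points to the timing issue. If the kernel keeps rejecting it with "over the max allowed", retrying will not help.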

asiantuntija · 2020-12-21T19:16:14Z

> > After trying the command with sudo it worked and lowered the voltage and changed power limit. Any ideas how to get it work, root not a user of some group?
>
> It would appear that root is not allowed to issue sudo commands? Maybe try something like this?

That was it: that line is commented out by default in the Arch sudo package. Thanks.

quasd · 2020-12-21T20:13:38Z

> Interesting. For navi 10 this error message only seems to appear under kernel 5.10 and when not using the amdgpu.ppfeaturemask=0xffffffff flag, do you still have it set?

The test below is without the flag:

[root@quasd ~]# cat /proc/cmdline | grep amd
[root@quasd ~]#

> Can you set the power cap manually echo 233000000 | sudo tee /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap after getting this error? If it's possible it would seem like there's a timing issue (PowerUPP tries to set the sysfs value before the re-initialization of the powerplay table is complete), which would be fixable.

It still appears to be broken:

[root@quasd ~]# echo 233000000 | sudo tee /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap
233000000
[root@quasd ~]# cat /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap
150000000
[root@quasd ~]#

And in the syslog:

Dec 21 22:10:12 quasd kernel: amdgpu 0000:0c:00.0: amdgpu: New power limit (233) is over the max allowed 150

quasd · 2020-12-21T20:28:28Z

Some more findings, after updating to 5.10.2 and linux-firmware-git 20201218.646f159-1 and rebuilding upp and PowerUPP:

I can now change the power limit, and it really seems to change too. The behaviour is still weird: I have to bring it up slowly, step by step. But I am now seeing power usage > 250 W every now and then.
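The step-by-step workaround can be scripted. This sketch raises power1_cap in fixed increments with a pause between writes. The 10 W step and one-second delay are arbitrary defaults, not values reported here, and the function name is made up. Run it as root against the card's power1_cap file.

```shell
# Raise power1_cap from one wattage to another in small steps.
ramp_power_cap() {   # ramp_power_cap <power1_cap file> <from W> <to W> [step W] [delay s]
    file=$1; w=$2; to=$3; step=${4:-10}; delay=${5:-1}
    while [ "$w" -lt "$to" ]; do
        echo $(( w * 1000000 )) > "$file"    # power1_cap is in microwatts
        sleep "$delay"
        w=$(( w + step ))
    done
    echo $(( to * 1000000 )) > "$file"       # finish exactly on the target
}
```

For example, `ramp_power_cap <file> 200 233` writes 200, 210, 220 and 230 W, then 233 W.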

The main limiting factor, the GFX clock frequency (2475), still remains.

azeam · 2020-12-21T20:35:51Z

Perfect! I've noticed some similar things recently with my 5700 XT as well. The other day, setting the power limit had no effect, and today it suddenly works (not quite sure if/what I updated...). But it sometimes crashes after setting it, which never happened before. If the Gfx clock is a driver/firmware issue, it will hopefully be fixed soon as well.

@asiantuntija, have you tried changing the memory DPM 3 clock, and does it cause a crash for you as well?

quasd · 2020-12-21T20:38:27Z

Remarks regarding amdgpu.ppfeaturemask=0xffffffff:

Without it: really stuck at 203 W
With it: can now increase with no limits? (limited by temperature/MHz)
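To check which case applies, read the mask from the kernel command line and test the overdrive bit. This is a sketch that assumes the mask is passed as amdgpu.ppfeaturemask=<hex>. As I understand the kernel's amd_shared.h, 0x4000 is PP_OVERDRIVE_MASK. If the parameter is absent, the function simply reports overdrive as off rather than working out the driver's default mask. The function names are made up.

```shell
# Print the value of amdgpu.ppfeaturemask from a kernel command line file.
featuremask_from_cmdline() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^amdgpu\.ppfeaturemask=//p'
}

# Succeed if the mask on the command line has the overdrive bit (0x4000) set.
# An absent parameter is treated as "off" here (an assumption).
overdrive_enabled() {
    mask=$(featuremask_from_cmdline "$1")
    [ -n "$mask" ] || return 1
    [ $(( $mask & 0x4000 )) -ne 0 ]
}
```

With no argument, both functions read /proc/cmdline. `overdrive_enabled && echo on` is a quick interactive check.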

And the DPM3 settings still behave as follows, even with all the updates:

First time changing the DPM3 memory frequency: no errors, but the settings don't really change
Second time setting a DPM3 memory frequency: I get the errors mentioned earlier in this thread, and things start freezing/crashing

asiantuntija · 2020-12-21T20:57:29Z

> @asiantuntija have you tried to change the memory DPM 3 clock and does it cause a crash for you as well?

Yes, I tried it a couple of times and have the same symptoms as @quasd. I remember having the same sort of crashes with Navi 10 when it was fresh. If I had radeon-profile-daemon running in the background, it would freeze the PC shortly after booting. IIRC it had something to do with a race condition when requesting information from the GPU. I will try a bit more with the memory tomorrow. I've already rebooted a million times today because one of my RAM sticks died on me.

Fixed undervolting seems to be the way to go. I'm getting a stable 2575 MHz (the max possible) with -150 mV. The Unigine Superposition score went from 9119 at stock to 9619, with lower temperatures as well.

quasd · 2020-12-31T20:02:38Z

I got my hands on a 6800 XT, and it seems to behave the same way as the 6800. The same problems occur on the 6800 XT too.

The max core frequency is now 2577 MHz, compared to 2475 MHz.

Does anyone know what is currently limiting (or bugging out) and stopping us from setting higher frequencies? AFAIK these cards should be limited to 2.7 GHz, not 2.58 GHz.

quasd · 2021-01-02T14:59:15Z

Did a bit of poking around and found this:
Linux 5.12 To Support Radeon RX 6000 Series OverDrive Overclocking

I'm now on 5.11-rc1 plus a few patches from drm-next. I have working memory and core frequency control through pp_od_clk_voltage (up to 2.8 GHz/1075 MHz), plus a voltage offset.

cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2600Mhz
OD_MCLK:
0: 97Mhz
1: 1075MHz
OD_VDDGFX_OFFSET:
50mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz

PowerUPP/upp still don't work for the memory clock or for increasing the core frequency.
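Before writing clocks, a script can read the allowed range from the same file. This sketch prints the min and max for SCLK or MCLK from the OD_RANGE section. The parsing is based only on the pp_od_clk_voltage output shown above, so other kernels or cards may format it differently. card0 and the function name are assumptions.

```shell
# Print "<min> <max>" in MHz for one OD_RANGE entry (SCLK or MCLK).
# The parsing follows only the pp_od_clk_voltage output shown above.
od_range() {   # od_range SCLK|MCLK [file]
    awk -v key="$1:" '
        /^OD_RANGE:/ { in_range = 1; next }
        in_range && $1 == key {
            gsub(/[Mm][Hh][Zz]/, "", $2)
            gsub(/[Mm][Hh][Zz]/, "", $3)
            print $2, $3
            exit
        }
    ' "${2:-/sys/class/drm/card0/device/pp_od_clk_voltage}"
}
```

On the output above, `od_range SCLK` would print `500 2800` and `od_range MCLK` would print `674 1075`.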

moaalseiari · 2021-01-24T17:41:28Z

Hello brother, can you please share the patches?
I would like to use them myself.
I tried the patches from the link below, and one of them failed to compile.

https://cgit.freedesktop.org/~agd5f/linux/diff/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c?h=drm-next&id=37a58f691551dfdff4f1035ee119c9ebdb9eb119

Best regards.

quasd · 2021-01-24T18:25:50Z

A 5.11-rc kernel and these four:

https://cgit.freedesktop.org/~agd5f/linux/commit/?id=78d907e2b8ba89c936b7f0c3344261c653668a62
https://cgit.freedesktop.org/~agd5f/linux/commit/?id=aa75fa34e04c842d93a45087adac66ab3a2a7f33
https://cgit.freedesktop.org/~agd5f/linux/commit/?id=37a58f691551dfdff4f1035ee119c9ebdb9eb119
https://cgit.freedesktop.org/~agd5f/linux/commit/?id=a2b6df4fd6e3c0ba088b00fc00579dac263b0a64

~~However, I have now stopped using these due to random black screens after a match in Overwatch~~. (~~dunno if relevant, haven't reproduced it yet after removing the patches~~)
Seems like this was a problem with Wine, please ignore.

moaalseiari · 2021-01-27T16:19:14Z

Thanks for your reply.
Can you please explain how to manually modify the voltage and other settings like you did?
I will build the kernel with the patches now.

quasd · 2021-01-27T16:42:11Z

# Power limit in microwatts (400 W); the hwmonN index differs between systems
echo "400000000" > "/sys/class/drm/card0/device/hwmon/hwmon1/power1_cap"
# Voltage offset in mV
echo "vo -10" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Core (SCLK) max frequency in MHz
echo "s 1 2590" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Memory (MCLK) max frequency in MHz
echo "m 1 1075" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Commit the settings to make them active
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
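The sketch below runs the same sequence more defensively. It checks the requested core and memory clocks against OD_RANGE before writing anything and commits only if both fit. OD_FILE, the function names and the argument order are my own choices. The voltage offset is not range-checked because the pp_od_clk_voltage output earlier in the thread shows no range for it. The power limit is left to the separate power1_cap write. Run as root.

```shell
# Path to the OverDrive interface; card0 is an assumption.
OD_FILE=${OD_FILE:-/sys/class/drm/card0/device/pp_od_clk_voltage}

# Succeed if <mhz> lies inside the OD_RANGE line for SCLK or MCLK.
in_od_range() {   # in_od_range SCLK|MCLK <mhz>
    # Append "<min> <max>" (digits only) to the positional parameters.
    set -- "$1" "$2" $(awk -v key="$1:" '
        /^OD_RANGE:/ { r = 1; next }
        r && $1 == key { gsub(/[^0-9 ]/, ""); print; exit }
    ' "$OD_FILE")
    [ "$#" -eq 4 ] && [ "$2" -ge "$3" ] && [ "$2" -le "$4" ]
}

# Write one command to pp_od_clk_voltage.
od_write() {
    printf '%s\n' "$1" > "$OD_FILE"
}

# Apply core max clock, memory max clock and voltage offset, then commit.
apply_od() {   # apply_od <sclk MHz> <mclk MHz> <offset mV>
    in_od_range SCLK "$1" || { echo "core clock $1 MHz outside OD_RANGE" >&2; return 1; }
    in_od_range MCLK "$2" || { echo "memory clock $2 MHz outside OD_RANGE" >&2; return 1; }
    od_write "vo $3"
    od_write "s 1 $1"
    od_write "m 1 $2"
    od_write "c"
}
```

`apply_od 2590 1075 -10` issues the same commands as above. A clock outside OD_RANGE is refused before anything is written.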

moaalseiari · 2021-01-27T17:01:52Z

Thank you very much, it worked perfectly.

moaalseiari · 2021-01-27T19:13:15Z

Sorry to bother you here, but:

echo "vo -10" > /sys/class/drm/card0/device/pp_od_clk_voltage

is that a decrease of 10 mV? When I run sudo watch sensors, the voltage shows 1150 mV for the GPU.
Is this the only way to control the core frequency voltage?
I dropped the offset to -200 mV with echo "vo -200" > /sys/class/drm/card0/device/pp_od_clk_voltage
and the card crashed, which means it does use the offset voltage.

Could you please explain? Thanks so much for sharing your knowledge.

quasd · 2021-01-27T19:57:18Z

> echo "vo -10" > /sys/class/drm/card0/device/pp_od_clk_voltage
>
> is like a decrease of -10mv ? when i run sudo watch sensors, the voltage shows 1150mv for the gpu.

Yes.

> is this the only way to control core freq voltage ?

Yes, AFAIK.

koiakoia · 2021-05-20T16:52:59Z

quasd's pp_od_clk_voltage commands allowed me to at least get above the initial 1000 hurdle. However, once I have executed these commands, I can't execute them again: it says "invalid argument". So I am now running at 1075, which is at least better than stock.

I am not sure what the challenges are on the programming side, but I wanted to confirm that, for me at least, this allowed the overclock. PowerUPP still does not make any changes to these settings for me, though.

quasd changed the title ~~6800 crashing after applying anything~~ 6800 crashing after applying DPM3 settings Dec 20, 2020
