Seems like 5.10 is Breaking Powerupp's Ability to Raise Power Limits #18

Closed
gardotd426 opened this issue Sep 6, 2020 · 9 comments

@gardotd426

While trying to figure out the issues I was having (I'm still working on that, more on it later), I filed a bug report about flickering with multiple monitors connected, which happens even when OD is not enabled (others are apparently seeing it too). Alex from AMD suggested I try the amd-staging-drm-next branch, which holds the code they're queuing up for 5.10, including a complete overhaul of powerplay. For example, drivers/gpu/drm/amd/powerplay will no longer exist and is being replaced by drivers/gpu/drm/amd/pm. I checked, though, and most of the same files still seem to be in /sys/class/drm/card0/device and .../device/hwmon/hwmon*/.

However, raising the power limit no longer has any effect (with ppfeaturemask not enabled, at least).

I raised the power limit using powerupp on my 5700 XT from 190 to 220W. Usually, once I do that, power1_cap will show 220, and also radeon-profile will show "current power limit" as 220. That no longer happens. dmesg reports:

[33787.762241] amdgpu 0000:11:00.0: amdgpu: use vbios provided pptable
[33787.762244] amdgpu 0000:11:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[33787.762929] amdgpu 0000:11:00.0: amdgpu: SMU is initialized successfully!
[33787.782368] amdgpu 0000:11:00.0: amdgpu: New power limit (220) is over the max allowed 190
[33787.968963] amdgpu: manual fan speed control should be enabled first
[33810.130165] amdgpu 0000:11:00.0: amdgpu: use vbios provided pptable
[33810.130168] amdgpu 0000:11:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[33810.131740] amdgpu 0000:11:00.0: amdgpu: SMU is initialized successfully!
[33810.147654] amdgpu 0000:11:00.0: amdgpu: New power limit (220) is over the max allowed 190
[33810.970842] amdgpu: manual fan speed control should be enabled first

(this is from two attempts).

I'm unsure where the "manual fan speed control should be enabled first" message is coming from. I've never seen it before and have no idea how to enable manual fan speed control (I know how to control the fans, obviously, but I don't know how to "enable manual fan speed control" in the driver).

Anyway, those error messages show up in dmesg, and radeon-profile, power1_cap and power1_cap_max all still show 190 as the max. When I run stress tests or benchmarks, power usage doesn't go to 220, even though it used to go up to whatever I set. It sometimes jumps over 190 for an instant, but that happens with a 190 power limit too, as I'm sure you're aware. It doesn't stay above 190, and under load it sits in the 180-190 range, which shows that the power limit is still 190 (as everything else reports, too).
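The gap between the requested limit and the allowed maximum can be checked directly from sysfs. This is a minimal sketch, assuming the hwmon power files report microwatts (as the hwmon sysfs interface documents) and that the hwmon directory is passed in as an argument. The function names are made up for illustration and are not part of powerupp.

```shell
#!/bin/sh
# Hypothetical helpers; the hwmon directory is an argument so nothing is hard-coded.

# Print a hwmon power file converted from microwatts to whole watts.
read_watts() {
    # $1 = hwmon directory, $2 = file name (e.g. power1_cap)
    uw=$(cat "$1/$2") || return 1
    echo $(( uw / 1000000 ))
}

# Succeed only if the requested limit (in watts) does not exceed power1_cap_max.
limit_allowed() {
    # $1 = hwmon directory, $2 = requested limit in watts
    max=$(read_watts "$1" power1_cap_max) || return 1
    [ "$2" -le "$max" ]
}

# Example:
#   hwmon=$(echo /sys/class/drm/card0/device/hwmon/hwmon*)
#   limit_allowed "$hwmon" 220 || echo "220 W is over the reported max"
```

If `limit_allowed` fails for a value the card used to accept, the driver is reporting a lower ceiling than before, which matches the "over the max allowed" line in dmesg.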

It looks like you might have to do some tweaking once the new code lands, just wanted to give you a heads up.

As for the other issue I reported, it's really hard for me to do anything right now because only amd-staging-drm-next fixes the flickering and the SMU dmesg errors, but as I'm reporting here, I can't really use powerupp with that kernel. Changing the memory and core frequencies with powerupp DOES seem to work, so I guess I could try a persistent save with that and see if I still get the weird negative temp reading and loss of fans with powerupp, and we can go from there.

@azeam
Owner

azeam commented Sep 6, 2020

echo "1" | sudo tee /sys/class/drm/card0/device/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/pwm1_enable should enable manual fan speed control (not sure if that will help, but you can try).
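A small sketch of toggling the fan control mode through pwm1_enable. It assumes the amdgpu hwmon convention that 1 means manual and 2 means automatic control, and that it runs as root, since the sysfs file is only writable by root. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: set the fan control mode in an amdgpu hwmon directory.
# Assumed values (amdgpu hwmon convention): 1 = manual, 2 = automatic.
set_fan_mode() {
    # $1 = hwmon directory, $2 = "manual" or "auto"
    case "$2" in
        manual) value=1 ;;
        auto)   value=2 ;;
        *)      echo "usage: set_fan_mode <hwmon-dir> manual|auto" >&2; return 2 ;;
    esac
    # The write is done by this shell, so the whole script must run as root.
    echo "$value" > "$1/pwm1_enable"
}

# Example (as root):
#   set_fan_mode "$(echo /sys/class/drm/card0/device/hwmon/hwmon*)" manual
```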

Do try the persistent save as well if you can.

@gardotd426
Author

> echo "1" | sudo tee /sys/class/drm/card0/device/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/pwm1_enable should enable manual fan speed control (not sure if that will help, but you can try).

That's not the issue; I have it enabled already. There's some goofy stuff going on with 5.10 and the rework of all the powerplay options, as I mentioned. I'll try the persistent save and report back.

@gardotd426
Author

Persistent save causes all the same goofy shit that was happening before in issue #16. Temps at -0.002C, no fan control, and raising the power limit still doesn't apply.

You really might want to take a look at the amd-staging-drm-next repo because they told me they're planning to upstream all these changes to 5.10, it seems like a pretty big reworking of the powerplay/pm bits.

@azeam
Owner

azeam commented Sep 7, 2020

I will take a look. If AMD decides to set some hard limits, there is likely little I can do about it though... Can you change the power limit using OD?

As for #16, you could try reverting to this commit. It is the only change to the persistent save I have made recently. I really can't see how it would affect things the way you describe, but better to be sure.

@gardotd426
Author

I'll give both those things a shot (trying to raise power limits with OD and building powerupp without that commit) and report back.

@Smokolak

Any news on this? I'm using kernel 5.10.1 and the issue still appears.

@azeam
Owner

azeam commented Dec 18, 2020

Sorry, I made an attempt at testing the 5.10 kernel under Arch back in September but had some issues at the time and I wasn't able to install it.

Anyway, I've done some initial testing now. To get rid of the "New power limit (X) is over the max allowed Y" error and actually set the power limit, it now seems to be necessary to set the amdgpu.ppfeaturemask=0xffffffff boot flag.
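To see which mask the loaded driver is actually using, the module parameter can be read back from /sys/module/amdgpu/parameters/ppfeaturemask. The sketch below checks a single bit of that value. It assumes overdrive is bit 14 (0x4000, the kernel's PP_OVERDRIVE_MASK); this thread only confirms that the full 0xffffffff mask works, not that this one bit is sufficient. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the overdrive bit is set in a ppfeaturemask value.
# Assumption: overdrive is bit 14 (0x4000), as in the kernel's PP_OVERDRIVE_MASK.
overdrive_enabled() {
    # $1 = mask as decimal or 0x-prefixed hex
    [ $(( $1 & 0x4000 )) -ne 0 ]
}

# Example: inspect the mask the loaded amdgpu module is using.
#   mask=$(cat /sys/module/amdgpu/parameters/ppfeaturemask)
#   if overdrive_enabled "$mask"; then echo "overdrive bit set"; else echo "overdrive bit clear"; fi
```

The kernel may print the parameter in decimal or in hex depending on version; shell arithmetic accepts both, so the value can be passed through unchanged.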

The power limit has always been a bit weird. The way powerupp works (what has been required to make the power limit actually increase) is that it first sets the limit in the powerplay table and then writes it to sysfs at /sys/class/hwmon/$(ls -1 /sys/class/drm/card0/device/hwmon)/power1_cap. The latter is where the 5.10 changes seem to have occurred: it is no longer possible to change the sysfs value without the ppfeaturemask boot flag, but with the flag the limit can be raised to any value (in combination with the powerplay table, or just by using powerupp).
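A sketch of only the sysfs half of that sequence; the powerplay-table half is specific to powerupp and is not shown. It assumes power1_cap and power1_cap_max are in microwatts, that the script runs as root, and it mirrors the driver's own "over the max allowed" check before writing. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a power limit given in watts to power1_cap.
# Assumption: power1_cap and power1_cap_max are expressed in microwatts.
set_power_cap() {
    # $1 = hwmon directory, $2 = new limit in whole watts
    max_uw=$(cat "$1/power1_cap_max") || return 1
    new_uw=$(( $2 * 1000000 ))
    if [ "$new_uw" -gt "$max_uw" ]; then
        echo "requested $2 W exceeds reported max $(( max_uw / 1000000 )) W" >&2
        return 1
    fi
    # Must run as root for the real sysfs file.
    echo "$new_uw" > "$1/power1_cap"
}

# Example (as root):
#   set_power_cap "$(echo /sys/class/drm/card0/device/hwmon/hwmon*)" 220
```

Checking against power1_cap_max first gives a readable error in the terminal instead of a rejected write that only shows up in dmesg.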

Despite being reflected in sysfs as well as in the powerplay table, however, the new limit does not seem to actually take effect as far as I can tell. This has worked on and off depending on firmware version, but I have tested both the old https://archive.archlinux.org/packages/l/linux-firmware/linux-firmware-20191220.6871bff-1-any.pkg.tar.xz, which is known to work, and the latest firmware, and I don't seem to be able to push the card above the stock limit.

Let me know if you have other experiences and I will try to find some time for further testing.

@azeam
Owner

azeam commented Dec 21, 2020

Not quite sure what is going on, but I just re-tested this and increasing the power limit now works with both the latest firmware and the old one linked above (Ubuntu mainline kernel 5.10.1-051001-generic), with amdgpu.ppfeaturemask=0xffffffff turned on. Perhaps I was temperature limited during the last round of testing? Can you test if it works with the OD flag turned on?

@Smokolak

After adding amdgpu.ppfeaturemask=0xffffffff it works fine for me on kernel 5.10.1 (Solus). Now I can change the power cap without issues and it actually takes effect. Thanks!
