Add Graceful System-Suspend Support #35

Open
wants to merge 8 commits into
base: master


@arthurt arthurt commented Oct 18, 2023

This PR addresses the issue of canary false positives caused by system suspend by adding the new public methods Suspend() and Resume(), and by optionally connecting them to logind system-sleep handling.

The Bug

During a system suspend-resume (sleep) cycle, the canary thread often experiences a time jump which causes a starvation false-positive. rtkit takes action and demotes the realtime/high priority of all known threads.
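To make the failure mode concrete, here is a minimal sketch of a timeout-style starvation check. It is a simplified model and not rtkit's actual canary code: the function name, the 10-second threshold, and the timestamps are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog threshold; not read from rtkit's configuration. */
#define WATCHDOG_MSEC 10000u

/* Returns true if the canary has been silent for longer than the threshold.
 * The check only sees elapsed time; it cannot tell "starved by a realtime
 * thread" apart from "the whole machine was asleep". */
bool canary_looks_starved(uint64_t last_cheep_msec, uint64_t now_msec) {
    assert(now_msec >= last_cheep_msec);
    return now_msec - last_cheep_msec > WATCHDOG_MSEC;
}
```

With a last cheep at 1000 ms, a check at 2000 ms passes. If the system then sleeps for an hour and the clock observed by the watchdog has advanced accordingly, the very next check reports starvation even though no realtime thread misbehaved.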

Long-running realtime processes (Pipewire, Pulseaudio) generally request realtime/high priority only once. If a system goes to sleep, the realtime/high priority scheduling is lost until these long-running processes are next started, after logout and login. As users generally suspend their machines more often than they log in, rtkit is effectively non-functional for these processes, which are arguably the most important users of rtkit.

Even non-long-running processes may have lifecycles which span system suspend-resume cycles, and so operate in a degraded way for users.

See

Why

Given that the primary bug this change seeks to address is the canary false positives, it would seem far simpler to only stop and start the canary around suspend. However, doing so would degrade security for a controllable window. From a security perspective, one might as well just disable the canary altogether. To safely disable the canary, we need to first demote all threads.

Suspend/Resume Operation

Two new admin operations are added to rtkit.

  • Suspend: org.freedesktop.RealtimeKit1.Suspend(), rtkitctl --suspend
  • Resume: org.freedesktop.RealtimeKit1.Resume(), rtkitctl --resume

These temporarily demote and restore managed thread priorities, as well as stop and start the canary.

On Suspend(), all managed threads are demoted, and the canary stopped.

While suspended, new realtime/high priority requests are rejected. Managed thread states are still garbage collected if a thread exits, but are retained otherwise.

On Resume() the canary is restarted, and all managed threads are re-promoted. Current user burst limit timeouts are restarted, and the re-promotion of threads counts toward burst limiting, but the burst limit is not enforced on the re-promotion.

Calling ResetKnown() or ResetAll() while suspended removes all managed threads which lack realtime/high-priority, leaving no threads to re-promote later.

Calling either Suspend() or Resume() multiple times in a row is fine, but only the first call has an effect.
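The semantics described in this section can be modelled as a small state machine. The sketch below is illustrative only: the struct layout, the function names, and the fixed-size thread table are hypothetical and are not taken from rtkit-daemon.c, and the reset is simplified to forgetting every managed thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8 /* hypothetical fixed capacity, for the sketch only */

struct managed_thread {
    int granted_priority; /* remembered when the grant was made */
    bool promoted;        /* is the priority currently applied? */
};

struct daemon_state {
    bool suspended;
    bool canary_running;
    size_t n_threads;
    struct managed_thread threads[MAX_THREADS];
};

void daemon_init(struct daemon_state *d) {
    assert(d != NULL);
    d->suspended = false;
    d->canary_running = true;
    d->n_threads = 0;
}

/* New grants are refused while suspended. */
bool request_priority(struct daemon_state *d, int priority) {
    if (d->suspended || d->n_threads == MAX_THREADS)
        return false;
    d->threads[d->n_threads++] = (struct managed_thread){ priority, true };
    return true;
}

/* Demote every managed thread first, then stop the canary.
 * Returns false if already suspended (repeated calls are no-ops). */
bool daemon_suspend(struct daemon_state *d) {
    if (d->suspended)
        return false;
    for (size_t i = 0; i < d->n_threads; i++)
        d->threads[i].promoted = false; /* granted_priority is kept */
    d->canary_running = false;
    d->suspended = true;
    return true;
}

/* Restart the canary, then re-apply the remembered priorities.
 * Returns false if not suspended (repeated calls are no-ops). */
bool daemon_resume(struct daemon_state *d) {
    if (!d->suspended)
        return false;
    d->canary_running = true;
    for (size_t i = 0; i < d->n_threads; i++)
        d->threads[i].promoted = true;
    d->suspended = false;
    return true;
}

/* Simplified reset: forget all managed threads, so a later resume
 * has nothing to re-promote. */
void daemon_reset(struct daemon_state *d) {
    d->n_threads = 0;
}
```

In this model a second daemon_suspend() returns false without touching any thread, and a request_priority() made between suspend and resume is rejected rather than queued.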

Security Considerations

Suspend() and Resume() are only available to admin callers, preventing abuse. Notwithstanding, if a malicious user were able to call Suspend() and Resume() at will, they still could not circumvent the count or burst limits. No new thread promotions can be created while suspended. Further, while the user burst limit is not enforced on resume, it is still updated, and the burst timeout is restarted.

It may be safe to allow for new realtime/high priority grants while in suspended mode to take effect upon resume, but this is an unlikely case, so it's easier to just refuse.

logind Integration

This change also adds an optional runtime integration with logind's inhibitor locks for handling system-suspend.

If the logind dbus service is running and accessible, rtkit will register a "delay sleep inhibitor", and listen for signals from logind about when the system is going to sleep or has just woken up. Using the sleep inhibitor, logind will wait for rtkit to perform its Suspend() operation before letting the system suspend. On system resume, logind will again notify rtkit, which will perform Resume() and register a new inhibitor.

See https://www.freedesktop.org/wiki/Software/systemd/inhibit/
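The ordering of the inhibitor handshake can be sketched without any D-Bus code. In logind, a delay inhibitor is obtained from the Manager's Inhibit() method, which returns a file descriptor that is released by closing it, and the PrepareForSleep signal carries true before sleep and false after wake-up. Everything else below is a stand-in: the function names, the event log, and the fake descriptor numbers are hypothetical and exist only to show the sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Event log so the ordering can be inspected; purely for illustration. */
char event_log[128];

static void log_event(const char *name) {
    /* Room for the name, the ';' separator, and the terminating NUL. */
    assert(strlen(event_log) + strlen(name) + 2 <= sizeof event_log);
    strcat(event_log, name);
    strcat(event_log, ";");
}

/* Stand-ins for the real operations (all hypothetical). */
int next_fake_fd = 3;
int take_delay_inhibitor(void) { log_event("inhibit"); return next_fake_fd++; }
void release_inhibitor(int fd) { (void)fd; log_event("release"); }
void do_suspend(void) { log_event("suspend"); }
void do_resume(void) { log_event("resume"); }

int inhibitor_fd = -1; /* -1 means no inhibitor is currently held */

/* At startup: hold a delay inhibitor so logind waits for us before sleeping. */
void watcher_start(void) {
    inhibitor_fd = take_delay_inhibitor();
}

/* Handler for logind's PrepareForSleep(boolean) signal. */
void on_prepare_for_sleep(bool going_to_sleep) {
    if (going_to_sleep) {
        do_suspend(); /* finish our work first ... */
        if (inhibitor_fd >= 0) {
            release_inhibitor(inhibitor_fd); /* ... then let the system sleep */
            inhibitor_fd = -1;
        }
    } else {
        do_resume();
        if (inhibitor_fd < 0)
            inhibitor_fd = take_delay_inhibitor(); /* re-arm for next time */
    }
}
```

The key point is that the inhibitor is released only after the suspend work has completed, and a fresh one is taken on wake-up, which is what makes the sequence race-free from the daemon's point of view.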

Alternate Integrations

No alternate automatic system-suspend integration is provided, but rtkitctl --suspend and rtkitctl --resume should make this task easy.

Other Changes

  • Rename priority (dynamic) to nice_level inside of process_set_high_priority(). Helps differentiate it from priority (static) as used by process_set_realtime(). Also, it's called nice_level everywhere else in the code.

  • Reduce log spam by not printing a message for every handled dbus message, as that includes dbus introspection and properties related messages. Some programs (Firefox in my case) get rtkit properties more frequently than I would think necessary.
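A minimal sketch of the log filter described in the last bullet, assuming the decision is made on the message's interface name. The function name is hypothetical; the two interface strings are the standard D-Bus names for the properties and introspection interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if handling a message on this interface should print the
 * "number of monitored threads" debug line. Property and introspection
 * traffic cannot add threads, so it is skipped. */
bool should_log_thread_count(const char *interface) {
    assert(interface != NULL);
    return strcmp(interface, "org.freedesktop.DBus.Properties") != 0 &&
           strcmp(interface, "org.freedesktop.DBus.Introspectable") != 0;
}
```

Messages on org.freedesktop.RealtimeKit1 would still be logged, while frequent property reads from other bus clients would not.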

To avoid confusion, differentiate between the scheduling priority used by
non-realtime scheduling to calculate dynamic priority (aka nice value),
and the static scheduling priority used for realtime schedulers.

Renaming argument names in a DBus interface is fully compatible.

Also makes rtkit.c and rtkit-daemon.c agree.

Implement the suspend and resume functionality to temporarily demote and
restore managed thread priorities. Priorities of managed threads are
remembered when granted.

On suspend, all managed threads are demoted, and the canary stopped.

While suspended new promotion requests are rejected. Managed threads are
still garbage collected, but the lack of a promoted priority is ignored.
Reset removes all managed threads, leaving no threads to re-promote on
resume.

On resume the canary is restarted, and all managed threads are
re-promoted, heeding but not enforcing the user burst limit.

Suspend and Resume are only available to admin callers, preventing
abuse. Notwithstanding, if a malicious user were able to call suspend
and resume at will, they still could not circumvent the count or burst
limiting. No new thread promotions can be created while suspended.
Further, while the user burst limit is not enforced on resume, it is
still updated, and the burst time window is restarted on resume.

Handling org.freedesktop.DBus.Properties and
org.freedesktop.DBus.Introspectable messages causes a debug log message
about the number of monitored threads, despite these interfaces not
being able to add threads.

As these interfaces may be called frequently by other bus users, skip
printing the debug message in these cases.

Add support for logind system suspend delay and signalling using DBus.

See https://www.freedesktop.org/wiki/Software/systemd/inhibit/

This adds a race-free way to suspend priorities when the system is
going to sleep, and resume them when the system wakes up. The main
reason for suspending priorities is the canary can be stopped, as a
system suspend-resume cycle frequently causes a false-positive of
the canary.
@aviallon
aviallon commented Nov 9, 2024

I wonder if rtkit should just be forked into an organization somewhere on Freedesktop.org's GitLab, just so we can keep on using it.
