Should this crate intercept Interrupt errors? #162

notgull · 2023-10-18T15:49:56Z

The polling system call can be interrupted in some cases. However, this error is bubbled up to the user. async-io handles it explicitly, but other users may not expect to need to handle this error. Should we be handling it on the polling side instead?

cc psychon/x11rb#892

The text was updated successfully, but these errors were encountered:

psychon · 2023-10-18T15:54:36Z

Heh, I'm too slow.

Here is the reproducer I just prepared:

fn main() {
    let poller = polling::Poller::new().unwrap();
    let mut events = polling::Events::new();
    dbg!(poller.wait(&mut events, None));
}

Running this with cargo run and then running strace -p $(pidof foo) in another terminal results in:

[src/main.rs:4] poller.wait(&mut events, None) = Err(
    Os {
        code: 4,
        kind: Interrupted,
        message: "Interrupted system call",
    },
)

(This is on Debian Linux)

According to the docs for wait... well, they don't say much explicitly, but the following sounds to me like interruptions should be caught and handled internally:

This method will return with no new events if a notification is delivered by the notify() method, or the timeout is reached. Sometimes it may even return with no events spuriously.

psychon · 2023-10-18T16:56:11Z

I tried to produce a test case for this. It is not nice since it registers a signal handler that stays valid forever. And it uses libc directly to get access to pthread_kill. But at least it seems to reproduce the problem.

Dunno how this can be done in a nicer way. Just killing the whole process leads to the signal being handled in the wrong thread, so no interrupted syscall where we want it.

diff --git a/Cargo.toml b/Cargo.toml
index 3c3e8db..b939bd8 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,3 +44,5 @@ features = [
 [dev-dependencies]
 easy-parallel = "3.1.0"
 fastrand = "2.0.0"
+signal-hook = "0.3.17"
+libc = "*"  # TODO: Proper version
diff --git a/tests/concurrent_modification.rs b/tests/concurrent_modification.rs
index 0797b0f..81409e4 100644
--- a/tests/concurrent_modification.rs
+++ b/tests/concurrent_modification.rs
@@ -1,3 +1,4 @@
+use std::sync::mpsc::channel;
 use std::io::{self, Write};
 use std::net::{TcpListener, TcpStream};
 use std::thread;
@@ -76,6 +77,45 @@ fn concurrent_modify() -> io::Result<()> {
     Ok(())
 }
 
+#[test]
+fn concurrent_interruption() -> io::Result<()> {
+    let (reader, _writer) = tcp_pair()?;
+    let poller = Poller::new()?;
+    unsafe {
+        poller.add(&reader, Event::none(0))?;
+    }
+
+    let mut events = Events::new();
+    let events_borrow = &mut events;
+    let (sender, receiver) = channel();
+
+    Parallel::new()
+        .add(move || {
+            // Register a signal handler so that the syscall is actually interrupted. A signal that
+            // is ignored by default does not cause an interrupted syscall.
+            signal_hook::flag::register(signal_hook::consts::signal::SIGURG, Default::default())?;
+
+            // Signal to the other thread how to send a signal to us
+            sender.send(unsafe { libc::pthread_self() }).unwrap();
+
+            poller.wait(events_borrow, Some(Duration::from_secs(1)))?;
+            Ok(())
+        })
+        .add(move || {
+            let target_thread = receiver.recv().unwrap();
+            thread::sleep(Duration::from_millis(100));
+            assert_eq!(0, unsafe { libc::pthread_kill(target_thread, libc::SIGURG) });
+            Ok(())
+        })
+        .run()
+        .into_iter()
+        .collect::<io::Result<()>>()?;
+
+    assert_eq!(events.len(), 0);
+
+    Ok(())
+}
+
 fn tcp_pair() -> io::Result<(TcpStream, TcpStream)> {
     let listener = TcpListener::bind("127.0.0.1:0")?;
     let a = TcpStream::connect(listener.local_addr()?)?;

Output:

$ cargo test --test concurrent_modification
    Finished test [unoptimized + debuginfo] target(s) in 0.01s
     Running tests/concurrent_modification.rs (target/debug/deps/concurrent_modification-51efc33044e17d04)

running 3 tests
test concurrent_modify ... ok
test concurrent_add ... ok
test concurrent_interruption ... FAILED

failures:

---- concurrent_interruption stdout ----
Error: Os { code: 4, kind: Interrupted, message: "Interrupted system call" }


failures:
    concurrent_interruption

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.10s

error: test failed, to rerun pass `--test concurrent_modification`

ids1024 · 2023-10-18T19:25:50Z

I guess where you'd want polling to not handle EINTR is if you have an interrupt handler that sets a flag for the main loop (generally it's not safe to do much more than that in an interrupt handler), and then have EINTR stop a poll call (with no timeout, or a longer one than you'd want the signal handling to wait on)?

In https://github.com/ids1024/reis/blob/master/examples/reis-demo-server.rs I used a 100ms timeout just so the main loop could handle a signal using the signal-hook crate.

Though on Linux/BSD specifically, signalfd and kqueue may provide a better solution. Though those have their own subtle complications. Or maybe a pipe would be the traditional solution.

notgull · 2023-10-19T00:26:56Z

I guess where you'd want polling to not handle EINTR is if you have an interrupt handler that sets a flag for the main loop

Truth be told I don't find this use case too compelling. Especially since the "correct" way of handling signals asynchronously on Unix is to have a pipe that wakes up poll/epoll once your signal occurs. The way you describe there is a glorified spinloop, and even more modern signal handling mechanisms like signalfd has some surprising behavior that isn't desired in many cases.

So I think that being able to handle EINTR internally has benefits that outweigh the potential costs.

@psychon Did you already have a PR on the way? I see that you forked this repo and it seems like you already have a test set up. If not I can fix this.

psychon · 2023-10-20T15:57:30Z

No PR from me, nope. I only wanted to do #163 . That's why I created a fork.

What exactly do you mean with "handle internally"? I would propose to do something like:

if err.kind() == ErrorKind::Interrupted {
   events.clear();
   return Ok(());
}

So, do not re-try the poll, but just return to the caller. That should satisfy all use cases, shouldn't it?

That's also why I quoted the docs above, since they seem to allow this behaviour:

Sometimes it may even return with no events spuriously.

(It didn't occur to me until now that "retry" is also a way to handle this internally... but that might become rather complicated with the timeout parameter... but perhaps you already turn the timeout into a deadline anyway and have code to re-computed the timeout? Dunno.)

notgull mentioned this issue Oct 21, 2023

Handle interrupts while polling #164

Merged

notgull closed this as completed in #164 Oct 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should this crate intercept Interrupt errors? #162

Should this crate intercept Interrupt errors? #162

notgull commented Oct 18, 2023

psychon commented Oct 18, 2023

psychon commented Oct 18, 2023

ids1024 commented Oct 18, 2023

notgull commented Oct 19, 2023

psychon commented Oct 20, 2023

Should this crate intercept Interrupt errors? #162

Should this crate intercept Interrupt errors? #162

Comments

notgull commented Oct 18, 2023

psychon commented Oct 18, 2023

psychon commented Oct 18, 2023

ids1024 commented Oct 18, 2023

notgull commented Oct 19, 2023

psychon commented Oct 20, 2023