Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should this crate intercept Interrupt errors? #162

Closed
notgull opened this issue Oct 18, 2023 · 5 comments · Fixed by #164
Closed

Should this crate intercept Interrupt errors? #162

notgull opened this issue Oct 18, 2023 · 5 comments · Fixed by #164

Comments

@notgull
Copy link
Member

notgull commented Oct 18, 2023

The polling system call can be interrupted in some cases. However, this error is bubbled up to the user. async-io handles it explicitly, but other users may not expect to need to handle this error. Should we be handling it on the polling side instead?

cc psychon/x11rb#892

@psychon
Copy link
Contributor

psychon commented Oct 18, 2023

Heh, I'm too slow.

Here is the reproducer I just prepared:

fn main() {
    let poller = polling::Poller::new().unwrap();
    let mut events = polling::Events::new();
    dbg!(poller.wait(&mut events, None));
}

Running this with cargo run and then running strace -p $(pidof foo) in another terminal results in:

[src/main.rs:4] poller.wait(&mut events, None) = Err(
    Os {
        code: 4,
        kind: Interrupted,
        message: "Interrupted system call",
    },
)

(This is on Debian Linux)

According to the docs for wait... well, they don't say much explicitly, but the following sounds to me like interruptions should be caught and handled internally:

This method will return with no new events if a notification is delivered by the notify() method, or the timeout is reached. Sometimes it may even return with no events spuriously.

@psychon
Copy link
Contributor

psychon commented Oct 18, 2023

I tried to produce a test case for this. It is not nice since it registers a signal handler that stays valid forever. And it uses libc directly to get access to pthread_kill. But at least it seems to reproduce the problem.

Dunno how this can be done in a nicer way. Just killing the whole process leads to the signal being handled in the wrong thread, so no interrupted syscall where we want it.

diff --git a/Cargo.toml b/Cargo.toml
index 3c3e8db..b939bd8 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,3 +44,5 @@ features = [
 [dev-dependencies]
 easy-parallel = "3.1.0"
 fastrand = "2.0.0"
+signal-hook = "0.3.17"
+libc = "*"  # TODO: Proper version
diff --git a/tests/concurrent_modification.rs b/tests/concurrent_modification.rs
index 0797b0f..81409e4 100644
--- a/tests/concurrent_modification.rs
+++ b/tests/concurrent_modification.rs
@@ -1,3 +1,4 @@
+use std::sync::mpsc::channel;
 use std::io::{self, Write};
 use std::net::{TcpListener, TcpStream};
 use std::thread;
@@ -76,6 +77,45 @@ fn concurrent_modify() -> io::Result<()> {
     Ok(())
 }
 
+#[test]
+fn concurrent_interruption() -> io::Result<()> {
+    let (reader, _writer) = tcp_pair()?;
+    let poller = Poller::new()?;
+    unsafe {
+        poller.add(&reader, Event::none(0))?;
+    }
+
+    let mut events = Events::new();
+    let events_borrow = &mut events;
+    let (sender, receiver) = channel();
+
+    Parallel::new()
+        .add(move || {
+            // Register a signal handler so that the syscall is actually interrupted. A signal that
+            // is ignored by default does not cause an interrupted syscall.
+            signal_hook::flag::register(signal_hook::consts::signal::SIGURG, Default::default())?;
+
+            // Signal to the other thread how to send a signal to us
+            sender.send(unsafe { libc::pthread_self() }).unwrap();
+
+            poller.wait(events_borrow, Some(Duration::from_secs(1)))?;
+            Ok(())
+        })
+        .add(move || {
+            let target_thread = receiver.recv().unwrap();
+            thread::sleep(Duration::from_millis(100));
+            assert_eq!(0, unsafe { libc::pthread_kill(target_thread, libc::SIGURG) });
+            Ok(())
+        })
+        .run()
+        .into_iter()
+        .collect::<io::Result<()>>()?;
+
+    assert_eq!(events.len(), 0);
+
+    Ok(())
+}
+
 fn tcp_pair() -> io::Result<(TcpStream, TcpStream)> {
     let listener = TcpListener::bind("127.0.0.1:0")?;
     let a = TcpStream::connect(listener.local_addr()?)?;

Output:

$ cargo test --test concurrent_modification
    Finished test [unoptimized + debuginfo] target(s) in 0.01s
     Running tests/concurrent_modification.rs (target/debug/deps/concurrent_modification-51efc33044e17d04)

running 3 tests
test concurrent_modify ... ok
test concurrent_add ... ok
test concurrent_interruption ... FAILED

failures:

---- concurrent_interruption stdout ----
Error: Os { code: 4, kind: Interrupted, message: "Interrupted system call" }


failures:
    concurrent_interruption

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.10s

error: test failed, to rerun pass `--test concurrent_modification`

@ids1024
Copy link

ids1024 commented Oct 18, 2023

I guess where you'd want polling to not handle EINTR is if you have an interrupt handler that sets a flag for the main loop (generally it's not safe to do much more than that in an interrupt handler), and then have EINTR stop a poll call (with no timeout, or a longer one than you'd want the signal handling to wait on)?

In https://github.com/ids1024/reis/blob/master/examples/reis-demo-server.rs I used a 100ms timeout just so the main loop could handle a signal using the signal-hook crate.

Though on Linux/BSD specifically, signalfd and kqueue may provide a better solution. Though those have their own subtle complications. Or maybe a pipe would be the traditional solution.

@notgull
Copy link
Member Author

notgull commented Oct 19, 2023

I guess where you'd want polling to not handle EINTR is if you have an interrupt handler that sets a flag for the main loop

Truth be told I don't find this use case too compelling. Especially since the "correct" way of handling signals asynchronously on Unix is to have a pipe that wakes up poll/epoll once your signal occurs. The way you describe there is a glorified spinloop, and even more modern signal handling mechanisms like signalfd has some surprising behavior that isn't desired in many cases.

So I think that being able to handle EINTR internally has benefits that outweigh the potential costs.

@psychon Did you already have a PR on the way? I see that you forked this repo and it seems like you already have a test set up. If not I can fix this.

@psychon
Copy link
Contributor

psychon commented Oct 20, 2023

No PR from me, nope. I only wanted to do #163 . That's why I created a fork.


What exactly do you mean with "handle internally"? I would propose to do something like:

if err.kind() == ErrorKind::Interrupted {
   events.clear();
   return Ok(());
}

So, do not re-try the poll, but just return to the caller. That should satisfy all use cases, shouldn't it?

That's also why I quoted the docs above, since they seem to allow this behaviour:

Sometimes it may even return with no events spuriously.

(It didn't occur to me until now that "retry" is also a way to handle this internally... but that might become rather complicated with the timeout parameter... but perhaps you already turn the timeout into a deadline anyway and have code to re-computed the timeout? Dunno.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants