Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make Poller fork-safe #37

Open
acully-vmware opened this issue Feb 2, 2022 · 9 comments
Open

make Poller fork-safe #37

acully-vmware opened this issue Feb 2, 2022 · 9 comments

Comments

@acully-vmware
Copy link

We have an application in which we tried to use a polling::Poller in a parent process to accept a connection, then fork and handle the connection in the child process. (We aren't using exec, right now.) I discovered that this doesn't work well: after the fork, when we tear down the Poller structure, its Drop implementation removes the timer_fd and event_fd from the epoll_fd. But because the file-descriptors (such as the epoll file-descriptor that Poller uses internally) are shared between the parent process and the child process, this means that the epoll_fd (which remains open in the parent process) gets into an invalid state. I'd like to close the files in the child, they aren't needed there, and I don't want to leak, but I'd like to avoid modifying the epoll_fd while doing so.

With this change:

--- src/epoll.rs
+++ src/epoll.rs
@@ -235,10 +235,8 @@ impl Drop for Poller {
         );
         if let Some(timer_fd) = self.timer_fd {
-            let _ = self.delete(timer_fd);
             let _ = syscall!(close(timer_fd));
         }
-        let _ = self.delete(self.event_fd);
         let _ = syscall!(close(self.event_fd));
         let _ = syscall!(close(self.epoll_fd));
     }

I believe the implementation would be fork-safe, and slightly more performant.

@acully-vmware
Copy link
Author

If this is agreeable, I can make an MR with this change, and add a test that Poller becomes fork-safe (on platforms that use fork).

@dignifiedquire
Copy link

This would need to be something that the user can choose, as the default needs to stay as it is right now.

@acully-vmware
Copy link
Author

acully-vmware commented Feb 11, 2022

This would need to be something that the user can choose, as the default needs to stay as it is right now.

I don't believe the suggestion I'm making will change user-visible behavior, except when using fork, and the current behavior under fork breaks the parent process. Please correct me if I'm wrong, but:

  • The file-descriptors in question are private to the Poller instance;
  • The Poller structure does not implement Clone (that I can tell) -- there is no possibility of multiple Pollers aliasing the same kernel-side structures, except via fork (or, I suppose, clone without CLONE_FILES);
  • The Poller itself is no longer accessible to Rust code as the Drop function is being invoked.

The change I suggest is to stop performing unneeded clean-up on file-descriptors that are about to be destroyed: Since they are about to be destroyed, the clean-up is unnecessary in the common case. In case fork is called, though, clean-up in the child process affects the parent-process's epoll file-descriptor in a way that breaks behavior.

I know that fork is a problematic API for epoll. In principle, we should be able to handle the problems by closing the epoll file-descriptor (and the timer_fd and event_fd that are created along with it in the Poller structure) in the child, preventing interaction with the parent. As the code stands today, though, I'm forced to either modify our local version of polling, or forget the Poller instance (leaking file-descriptors in the process), or not use polling at all, and create our own implementation of the function.

@fogti
Copy link
Member

fogti commented Dec 30, 2022

this sounds like the epoll fd should be created using EPOLL_CLOEXEC...

@notgull
Copy link
Member

notgull commented Dec 30, 2022

this sounds like the epoll fd should be created using EPOLL_CLOEXEC...

epoll is already created with EPOLL_CLOEXEC.

@fogti
Copy link
Member

fogti commented Jan 1, 2023

oh. ah right, I think extracting the FD and forgetting the Poller instance would be the most sensible option. (which could be "packages" into a method on Poller to handle any edge cases and such, #[cfg(unix)] probably (I think *BSD should be handled similarly))

@notgull
Copy link
Member

notgull commented Jan 9, 2023

Interestingly, Boost ASIO has a notify_fork function, which allows the reactor to recreate itself if a fork has happened. I'm not sure if we'd like to have this function (nor am I implying that we should be taking any positive lessons from Boost), but I figure it's worth noting.

@fogti
Copy link
Member

fogti commented Jan 10, 2023

that sounds a bit appealing (such a function would be marked unsafe, unless we can make sure that it couldn't disrupt anything if it would be called without a previous fork, and could deal with the interactions from other threads)

@notgull
Copy link
Member

notgull commented Aug 15, 2023

Now that I think about it, a function like this would be far more relevant to async-io, because it already keeps track of sources there and forks hurt the reactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants