Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI Execution Plans that spawn threads panic #13851

Open
timsaucer opened this issue Dec 19, 2024 · 3 comments
Open

FFI Execution Plans that spawn threads panic #13851

timsaucer opened this issue Dec 19, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@timsaucer
Copy link
Contributor

Describe the bug

If you have an FFI Execution Plan that spawns it's own tasks, it has no reactor and causes a panic.

To Reproduce

This is most easily demonstrated in datafusion-python. Just create a table provider that spawns a thread. I have examples on my computer but they are non-trivial (and it's probably me working on this issue).

Expected behavior

We should never cause a panic across the ffi boundary.

Additional context

No response

@timsaucer timsaucer added the bug Something isn't working label Dec 19, 2024
@timsaucer timsaucer self-assigned this Dec 19, 2024
@alamb
Copy link
Contributor

alamb commented Dec 19, 2024

Seems like there are two problems:

  1. The panic is going across FFI boundaries
  2. The FFI code doesn't have a tokio rutime setup

Maybe we need to have some way to pass in a handle to the tokio runtime 🤔 Handle across the FFI boundaries

@timsaucer
Copy link
Contributor Author

There is only one point where we have ffi calls that are async, currently. This happens in the record batch stream. So I think what needs to happen is that within the private data for the RBS we also include an Option<Arc<Runtime>> and if it exists, we enter the runtime during the async calls.

We do pass a context. I worry about a possible problem where we could have deadlocks if we have two different runtimes each doing a blocking call. The only place that is immediately jumping out to me as a potential problem spot is RecordBatchReceiverStreamBuilder, but I haven't had time to thoroughly look through this.

I think the immediate next thing to do is to build up a good reproducible test, ideally that I could include as a unit test.

@timsaucer
Copy link
Contributor Author

Initial testing of passing in an Option<Arc<Runtime>> to the table provider and feeding it down through the execution plan and record batch stream was successful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants