Description
What happens?
When duckdb.connect() is called in a parent process (e.g., a FastAPI/uvicorn server), and a child process is later created via multiprocessing.Process using the default fork start method, calling duckdb.connect() in the child process raises:
RuntimeError: thread::join failed: No such process
or in some cases, depending on timing:
RuntimeError: Invalid argument
This appears to occur because fork() copies DuckDB's global state into the child process, where that state is invalid. fork() duplicates only the calling thread; the other threads in DuckDB's internal thread pool do not exist in the child, so joining them fails.
This does not seem to affect in-memory connections; it appears to be specific to MotherDuck connections.
This is related to duckdb/duckdb#13079
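The fork() behaviour described above can be seen in isolation, without DuckDB. The following is a minimal sketch using only the Python standard library on a POSIX system; the helper name `count_threads_after_fork` is made up for illustration, and it shows the general fork() semantics rather than DuckDB's internals.

```python
import os
import threading
import warnings


def count_threads_after_fork() -> tuple[int, int]:
    """Return (parent_thread_count, child_thread_count) around an os.fork()."""
    stop = threading.Event()
    # Background thread standing in for a library's internal worker thread.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    parent_count = threading.active_count()

    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning when a multi-threaded
        # process calls os.fork(); silence it for this demonstration.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the thread that called fork() was duplicated.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: read the count reported by the child, then clean up.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent, child = count_threads_after_fork()
    print(f"threads in parent: {parent}, threads in child: {child}")
```

The parent reports at least two threads, while the child reports only one. Any state inherited from the parent that still refers to the worker thread is stale in the child, which is consistent with the failed join described above.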
To Reproduce
import multiprocessing as mp
import sys
import duckdb

MD_PATH = "md:?motherduck_token=<your_token>"

def child():
    try:
        conn = duckdb.connect(MD_PATH)
        print(f"Child: {conn.execute('SELECT 1').fetchall()}")
        conn.close()
        print("Child: SUCCESS")
    except Exception as e:
        print(f"Child: FAILED - {type(e).__name__}: {e}")
        sys.exit(1)

if __name__ == "__main__":
    ctx = mp.get_context("fork")
    conn = duckdb.connect(MD_PATH)
    conn.execute("SELECT 1")
    conn.close()
    p = ctx.Process(target=child)
    p.start()
    p.join(timeout=30)
    print(f"Exit code: {p.exitcode}")

Note: This reproduces on Linux where fork is the default mp start method. On macOS, the default is spawn, which is not affected.
To reproduce on macOS, force fork:
mp.set_start_method("fork")
Workaround
Use the forkserver or spawn start method instead of fork:
ctx = mp.get_context("forkserver")  # or "spawn"
p = ctx.Process(target=child)
OS:
macOS 26.2 (25C56)
DuckDB Package Version:
1.4.4
Python Version:
3.12
Full Name:
Mohammed Hussain
Affiliation:
Querio
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have