Parallel Scan with multiple processes #63

mkaruza · 2024-07-02T07:27:59Z

We can now start multiple worker processes that read relation blocks and write buffers to shared memory, to be consumed by duckdb reader threads. The problem can be viewed as a variant of multi-producer / multi-consumer, where the producers are responsible for releasing buffers after they are read.

JelteF · 2024-09-30T12:34:09Z

@mkaruza what's the deal with this? I guess it's not critical for 0.1.0. Given the amount of merge conflicts it has now, I think it probably makes the most sense to simply close this and maybe create an issue for it instead.

mkaruza · 2024-09-30T14:53:33Z

@JelteF agreed, it's not critical, but it could be an approach to get better performance for heap table scans (specifically, to fetch table pages in parallel; currently there is a bottleneck there, as only one thread in the process can request page fetching).

Y--

High level looks reasonable (and perf improvements are promising).
I wonder if we could implement communication between the thread and the worker through signals rather than busy-spinning on atomic variables, and whether this would improve performance.

Y-- · 2024-11-05T13:05:15Z

src/scan/heap_reader_worker.cpp

+			LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+			/* is previous buffer done */
+			while (!pg_atomic_unlocked_test_flag(&thread_worker_shared_state->buffer_ready)) {


Did you try adding a very small "sleep" in the loop rather than busy-spinning? I wonder if it would help performance.
Also, could we use signals instead?
Should we have a timeout to handle the situation where the thread dies without ever resetting its thread_running or buffer_ready flags?
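
A minimal sketch of what a sleeping wait with a timeout could look like. It is an illustration only: it uses `std::atomic<bool>` instead of the `pg_atomic_flag` in the diff, and `WaitForFlag`, its parameters, and the backoff limits are hypothetical names and values, not part of pg_duckdb or PostgreSQL.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not pg_duckdb code): poll `flag` until it equals
// `expected`, sleeping between polls instead of busy-spinning, and give up
// once `timeout` has elapsed.
// Returns true if the flag reached `expected`, false on timeout.
inline bool
WaitForFlag(const std::atomic<bool> &flag, bool expected, std::chrono::milliseconds timeout) {
	using Clock = std::chrono::steady_clock;
	const auto deadline = Clock::now() + timeout;
	auto pause = std::chrono::microseconds(1);
	const auto max_pause = std::chrono::microseconds(1000); // assumed cap

	while (flag.load(std::memory_order_acquire) != expected) {
		if (Clock::now() >= deadline) {
			// The other side may have died without resetting the flag;
			// the caller decides how to recover.
			return false;
		}
		std::this_thread::sleep_for(pause);
		// Exponential backoff, capped so wake-up latency stays bounded.
		pause = std::min(pause * 2, max_pause);
	}
	return true;
}
```

Because the expected value is a parameter, the same helper could serve loops that wait for the flag to be set as well as loops that wait for it to be cleared.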

Y-- · 2024-11-05T13:08:36Z

src/scan/heap_reader_worker.cpp

+
+	if (thread_running) {
+		/* We are out of blocks fo reading so wait for last buffer to be done */
+		while (!pg_atomic_unlocked_test_flag(&thread_worker_shared_state->buffer_ready)) {


I guess the same applies here, and we could factor the two loops into a "wait" function?

Y-- · 2024-11-05T13:11:57Z

src/scan/heap_reader.cpp

+	}
+
+	/* Is buffer ready for reading */
+	while (pg_atomic_unlocked_test_flag(&m_thread_worker_shared_state->buffer_ready)) {


I wonder if we could use signals here rather than busy-spinning?
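
As a rough illustration of a blocking handoff instead of a spin loop, here is a single-slot exchange guarded by two semaphores. This is a sketch under assumptions: it runs the "worker" as a thread in the same process, whereas the PR uses a separate worker process, so a real version would need a process-shared primitive. `Semaphore`, `BufferSlot`, and `ScanBlocks` are hypothetical names, and block numbers stand in for real buffers.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
	explicit Semaphore(int initial) : count_(initial) {}

	void Post() {
		{
			std::lock_guard<std::mutex> lock(mutex_);
			++count_;
		}
		cv_.notify_one();
	}

	void Wait() {
		std::unique_lock<std::mutex> lock(mutex_);
		cv_.wait(lock, [this] { return count_ > 0; });
		--count_;
	}

private:
	std::mutex mutex_;
	std::condition_variable cv_;
	int count_;
};

// Hypothetical single-slot handoff between one worker and one reader.
struct BufferSlot {
	Semaphore empty{1}; // slot is free, the worker may publish
	Semaphore full{0};  // slot holds a block, the reader may consume
	int block = -1;
};

// The worker publishes block numbers [0, nblocks) followed by -1 as an end
// marker. The reader blocks until a block is ready, then releases the slot.
// Returns the blocks in the order the reader saw them.
inline std::vector<int>
ScanBlocks(int nblocks) {
	BufferSlot slot;
	std::thread worker([&slot, nblocks] {
		for (int b = 0; b <= nblocks; ++b) {
			slot.empty.Wait(); // previous buffer has been released
			slot.block = (b < nblocks) ? b : -1;
			slot.full.Post();
		}
	});

	std::vector<int> seen;
	for (;;) {
		slot.full.Wait(); // sleeps instead of spinning
		int block = slot.block;
		slot.empty.Post(); // done with the buffer, the worker may reuse the slot
		if (block < 0) {
			break;
		}
		seen.push_back(block);
	}
	worker.join();
	return seen;
}
```

Neither side burns CPU while waiting here; the cost is a mutex acquisition and a possible context switch per block, which would have to be measured against the spin loop.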

JelteF · 2024-11-05T13:29:47Z

test/regression/regression.conf

@@ -1,6 +1,6 @@
 # Configuration

-shared_preload_libraries = 'pg_duckdb'
+shared_preload_libraries = 'pg_duckdb.so'


This shouldn't be necessary, and actually probably breaks OSX

A sequential scan of a heap table with parallel threads is slower than with a single thread, because a global lock has to be taken for each buffer fetch and each tuple visibility check. To speed up execution, we start a parallel worker dedicated to a thread; the worker fetches buffers and passes them to the thread. The thread works with the page directly, and once the scan of a buffer is done the worker will release it. The HeapTupleSatisfiesVisibility call is also problematic, because in some situations it will try to use SetHintBits on the same page, which requires holding a lock on the page (and the thread does not hold one). For this purpose HeapTupleSatisfiesVisibilityNoHintBits was added, which has the same logic but doesn't use SetHintBits. Preliminary testing showed little difference between using 3, 4, or more parallel workers, so to simplify the logic we use a hardcoded rule: 1 parallel process is spawned if the number of blocks in the thread is bigger than 2024, and 2 parallel workers (threads) are created if it is bigger still.
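
The hardcoded rule could be sketched roughly as follows. Only the 2024-block threshold comes from the description above. The cutoff for the second worker is not stated there, so it is a caller-supplied parameter here. `WorkersForScan` and `kOneWorkerThreshold` are hypothetical names, and returning 0 workers for small scans is an assumption.

```cpp
#include <cstdint>

// Threshold taken from the description: more than 2024 blocks gets a worker.
constexpr std::uint32_t kOneWorkerThreshold = 2024;

// Hypothetical helper (not pg_duckdb code): how many parallel workers to
// start for a thread that will scan `nblocks` blocks.
// `two_worker_threshold` is NOT given in the description; the caller must
// supply it, and it is expected to be larger than kOneWorkerThreshold.
inline int
WorkersForScan(std::uint32_t nblocks, std::uint32_t two_worker_threshold) {
	if (nblocks <= kOneWorkerThreshold) {
		return 0; // assumption: small scans stay on the reader thread alone
	}
	if (nblocks <= two_worker_threshold) {
		return 1;
	}
	return 2; // capped at 2, since more workers showed little difference
}
```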

JelteF · 2024-12-09T09:47:59Z

@mkaruza do you want to close this one in favor of #477?

mkaruza · 2024-12-09T15:59:22Z

Closing

mkaruza force-pushed the parallel-scan branch from f02e558 to 2a8c461 Compare July 2, 2024 07:41

mkaruza marked this pull request as draft July 3, 2024 09:16

Base automatically changed from index-scan to main July 3, 2024 16:37

JelteF added the enhancement New feature or request label Sep 30, 2024

JelteF added performance We need more speed and removed enhancement New feature or request labels Sep 30, 2024

JelteF added this to the 0.2.0 milestone Sep 30, 2024

mkaruza force-pushed the parallel-scan branch from 2a8c461 to 48b9ae6 Compare November 4, 2024 11:44

mkaruza requested review from JelteF and Y-- and removed request for JelteF November 4, 2024 11:45

Y-- reviewed Nov 5, 2024

JelteF reviewed Nov 5, 2024

mkaruza added 2 commits November 11, 2024 09:12

Synchronize using semaphores

59f781d

mkaruza force-pushed the parallel-scan branch from 48b9ae6 to 59f781d Compare November 12, 2024 10:35

mkaruza closed this Dec 9, 2024

mkaruza deleted the parallel-scan branch December 14, 2024 16:56
