PgRowPublisherOperation should consider Subscriber demand #33

mp911de · 2018-12-04T11:58:00Z

PgRowPublisherOperation accepts Flow.Subscriber that requests a number of data signals n (typically rows) via Subscription.request(n). PgRowPublisherOperation should consider demand to:

Determine a reasonable fetch size (probably a good suggestion to introduce fetchSize(long rows) on RowPublisherOperation in ADBA spec)
Request only n rows from a cursor (when using cursored fetching)
Request nextn rows as soon as Subscriber requests additional rows

Typically, it's hard to fetch exactly n rows, but the point here is to fetch more or less n rows to satisfy demand and pre-buffer some additional rows (smart prefetching) to reduce latency between I/O and processing.

The text was updated successfully, but these errors were encountered:

alexanderkjall · 2018-12-04T12:00:14Z

Do you think there is a need to tune the fetch size per request, or could it be a SessionParameter?

mp911de · 2018-12-04T13:08:25Z

Both, assuming request maps to a statement execution and not to a single server req/resp exchange.

You want to be able to tune defaults for an application. Some queries require additional tuning (see JDBC's FetchSize) to optimize for e.g. large or small data sets.

alexanderkjall · 2018-12-08T19:10:20Z

I have thought some more on this:

Specifying a request limit will block the sending of other queries while this on is in flight, as adding a limit to the Execute message of the wire protocol puts the protocol in a "chatty" mode where you can't send and receive asynchronously, maybe not a bit problem but it might impact speed and will require some redesign of the network layer to expose this information to it.
The SubmissionPublisher class has an estimateMinimumDemand() function that looks like it would be suitable for this usecase.
There has been quite some debate about the need for backpressure on the jdbc-spec-discuss mailing list and my gut feeling is that there will be more feedback on that subject, the topic is quite complex.

Based on the above points I still thinks this is a valid feature request, but as it's nontrivial to implement and the specification might change again I will focus on lower hanging fruit for now.

mp911de · 2018-12-08T22:22:01Z

I'd like to elaborate a bit on your first point:

Specifying a request limit will block the sending of other queries while this one is in flight

That's exactly the case. No demand will stop traffic on the connection and unblock the connection once the command is either canceled or registers additional demand. That is basically what should happen on non-multiplexed connections. In the first moment, this seems an odd thought because: Why would one want this to happen?

If you do not share the connection but use it linearly (send a statement, consume results, send next statement, …), this pattern makes totally sense as you do not read more data at a time than necessary. Thus, you don't allocate more memory than needed (based on subscriber demand) to process incoming data – you leave all data outside your application/JVM.

If you do share the connection or even want to run interleaved queries, requirements might change a bit. I'm not entirely familiar with the Postgres wire protocol, but I know from older SQL server versions that a single transport connection was bound to single query execution and didn't allow for interleaved SQL batches (until MARS was implemented). In such a scenario, you would not want to prevent interleaved queries, and thus you want to unblock the connection (consume a response) to issue the next query while a cursor/result is not fully consumed. The price for this feature is paid with higher memory consumption as you need to consume a response and keep it in memory until its subscriber asks for more data.

At this point, a driver might want to consider the actual demand and allow for optimizations. You want to reduce the number of roundtrips while aligning the fetch size to the actual demand.

but as it's nontrivial to implement

I concur. This feature needs additional thought.

cretz · 2018-12-09T00:44:03Z

I'm not entirely familiar with the Postgres wire protocol, but I know from older SQL server versions that a single transport connection was bound to single query execution and didn't allow for interleaved SQL batches

This is the case with the Postgres protocol too, a connection cannot be multiplexed. See https://www.postgresql.org/docs/current/protocol.html.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PgRowPublisherOperation should consider Subscriber demand #33

PgRowPublisherOperation should consider Subscriber demand #33

mp911de commented Dec 4, 2018

alexanderkjall commented Dec 4, 2018

mp911de commented Dec 4, 2018

alexanderkjall commented Dec 8, 2018

mp911de commented Dec 8, 2018

cretz commented Dec 9, 2018

PgRowPublisherOperation should consider Subscriber demand #33

PgRowPublisherOperation should consider Subscriber demand #33

Comments

mp911de commented Dec 4, 2018

alexanderkjall commented Dec 4, 2018

mp911de commented Dec 4, 2018

alexanderkjall commented Dec 8, 2018

mp911de commented Dec 8, 2018

cretz commented Dec 9, 2018