-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NioLoop to share Selector (typical use case is HTTP selector) #18
Comments
As per PR #19 providing the following write up on my shallow understanding of ADBA but thoughts on driver architecture to get high performance through put. I see the architecture of ADBA at two levels:
Keeping these separate actually allows better focus on the problems, as network performance is all about keeping the socket buffers full for data out going and keeping the socket buffer clear for incoming data. I can talk you through the analogy with lanes of traffic I use regarding network performance (from OfficeFloor you can probably see I'm big on analogies - and why opening more lanes/connections of traffic does not help in throughput performance trying to get to the singleton city/database). Anyway, the top level is all about handling callbacks and actioning the next "submission" within callbacks. This to me is a different coding paradigm to getting ByteBuffers full to get them over the network and pulling data off socket buffers to keep responses coming back. From a visual point of view, I see it the following:
Then (order representing data coming back):
I see it the responsibility of the Connection to order the submissions. However, like JDBC the Connection is thread safe if threads re-use it to run a Statement.... hence the ADBA submissions could be sent very much one after each other as the application progresses through the ADBA layer promises (potentially with many submissions underway in parallel - though at network layer sequentially by above flow). Dependencies between the submissions should be managed at the Connection level (not the protocol level, which is about getting "SQL" to the database and results back) To achieve this, I would like to consider some refactoring to the ProtocolV3 layer to be something along the lines (at very high level and broad stroke level):
This will make the architecture a lot more modular with submissions being self contained (including the data required to be send and received for that submission). The only issue is that if submissions require chatty network communication (e.g. authentication handshake). I don't know the PostgreSql wire format too well, but I'm thinking this may be the only case of a submission requiring multiple messages back and forth. Note: I would expect a request for further data in a "ResultSet" to be another submission at the Network Layer. And possibly re-using the term "submission" from ADBA may be misleading. I don't know a better name at this point but will go with Network Submission. If, however, there is further cases of "submission chattiness", we can flag these at the Network Layer with some "halting further submissions" from being sent "flag" until that submission is complete. I'm thinking this would likely be adequate for the more complicated scenarios (given the ADBA top level "promise" layer is responsible for the more complicated callback triggers and further submissions etc). Anyway, for these I would be interested to work through them case by case (but I'm hoping they are a small number of edge cases - but happy to be corrected on this). Anyway, this is my thoughts (and experience of writing high performance HTTP server) being applied to my (and I admit) very shallow understanding of ADBA specification. Happy to discuss more and bounce ideas around. Given the changes, I did not want to go to much further without at least getting some discussion going on the ideas :) |
@sagenschneider couple of things off the top of my head. (I'm going to have to look at your PR to understand it more fully) The connection in JDBC is not threadsafe. The only way I can think of to deal with this currently is to have a number of connections and multiplex the submissions across the open connections. |
@davecramer Connection is thread safe if creating a statement per thread
Basically, the following will run fine: Connection conn = ...;
for (int i = 0; i < 100; i++) {
new Thread(() -> conn.createStatement().execute("SELECT * FROM TEST").start();
} Unfortunately, what I find is there is a synchronised method in drivers that basically blocks to only do one request at a time (so the above does perform poorly compared to a thread pool of connections). My understanding of PostgreSql (though again shallow without inspecting the code) is that it creates a process per connection. This is evident from reading the tuning guides (about memory size to specify against number of connections). Therefore, servicing a connection is by a single process that I'm guessing is not multi-threaded to process requests in parallel. My guess is that it would sequentially process the requests as they come in over the wire and send responses back as the process gets it (basically maintaining order - allowing pipelining). Again, this is unconfirmed however reading https://github.com/reactiverse/reactive-pg-client it claims half way down the list "command pipelining" (and likely reason it scores 150K times faster than the next driver - https://www.techempower.com/benchmarks/#section=data-r16&hw=ph&test=db ). Plus there have been discussions in the TechEmpower forums about allowing batching given Vertx (reactive-pg-client) pipelines (sadly you can't beat the through put of a flow of traffic when "your bridge only allows you to drive one car over it at a time"). In regards to session states in the connection, that is the responsibility of the ADBA "promise" layer to organise the order. Basically, for me asynchronous code is functional coding (not OO). Unless a dependency is created between the functions (callbacks) then the compiler is free to execute the methods as it sees fit. Hence the following code would be indeterminate about whether the update occurs within the transaction or not (excuse if syntax slightly off from ADBA): connection.beginTransaction().submit().outOperation("SELECT * FROM TEST").commit().submit(); Ideally, there might be an implicit then relationship created. However, if not... then establishing the transaction takes time and the Update statement may actually get into the transaction due to delays in callbacks actioning the next item in each chain. This is an inherent problem with asynchronous I/O and why it can actually get very hairy trying to debug (and possibly where ADBA will have a lot of troubles - in managing data changes). For me, when it comes to updating and transactions, JPA for ease of development wins hands down. JPA, unfortunately, expects blocking I/O. It is this reason I also smile when I here about JavaScript being great for REST applications. I always ask to see the relational database examples - seems to always be missing... but works great with NoSql solutions that have no transaction support ;) For me a typical application in the future might be:
So, I totally agree with you about state of the connection. But I don't believe that is a network problem. It is more a problem at the connection level to ensure chains of callbacks keep the connection in the appropriate state for operations. |
Well just because Oracle says it's threadsafe doesn't make it so.... As you found out Drivers will synchronize requests. As for PostgreSQL, yes it does create a process per connection, and many things are local to that session. Temporary tables, schemas, transaction, many user configurations...
To see all of the settings that can be changed in a session.
if the above allows the update to occur in the transaction then I fail to see how this is useful for anything other than a read only database ? I don't want to hinder this developmen, but I would like to see people think about this rationally, hence my bringing up some issues Cheers |
@davecramer yes, good question and one I have regarding the usefulness of ADBA for updating (especially as per my explanation of JPA advantages in #19 ). The above example, however, is basically poor code by the application developer. One key example of ADBA over JDBC is possibly in faster loading data. Take for example the following: var transaction = connection.beginTransaction().submit();
List<Submissions> inserts = new ArrayList<Submission>();
for (int i = 0; i < 100; i++) {
inserts.add(transaction.insertOperation("INSERT INTO TEST ( ID ) VALUES (" + i + ")").submit())
}
waitAll(inserts).commit().submit(); Note: the inserts is just one table. However, effectively you could chain together a lot of tables to have data inserted and allow them all to start firing off in parallel (as there is no blocking then between the insert chains). Then once all chains of inserts are complete, you complete the transaction. What I'm trying to say is keep the network layer and "network submissions" separate from the chaining of submissions within the Connection. The protocol/network layer is focused on efficient throughput of the submissions from the Connection layer to the database. The Connection layer is responsible for appropriate state transitions in it's chained handling. |
Just a short answer as I'm on mobile. But I feel that there is some
missunderstanding of the threading model that adba has.
As you have said there is a top level (the Connection) and a network layer,
but the connection is not threadsafe. It is async, so it will be very fast
to create Operation objects and stack them into the que that the network
layer reads from.
Regarding reordering, I think there is leeway in the standard for that,
with the parallell() call. But as the postgres server/wire format doesn't
support parallell queries i don't think there is that much performance to
be gained from it.
Den tor 2 aug. 2018 18:34Daniel <[email protected]> skrev:
… @davecramer <https://github.com/davecramer> yes, good question and one I
have regarding the usefulness of ADBA for updating (especially as per my
explanation of JPA advantages in #19
<#19> ).
The above example, however, is basically poor code by the application
developer.
One key example of ADBA over JDBC is possibly in faster loading data. Take
for example the following:
var transaction = connection.beginTransaction().submit();
List<Submissions> inserts = new ArrayList<Submission>();
for (int i = 0; i < 100; i++) {
inserts.add(transaction.insertOperation("INSERT INTO TEST ( ID ) VALUES (" + i + ")").submit())
}
waitAll(inserts).commit().submit();
Note: the inserts is just one table. However, effectively you could chain
together a lot of tables to have data inserted and allow them all to start
firing off in parallel (as there is no blocking then between the insert
chains). Then once all chains of inserts are complete, you complete the
transaction.
What I'm trying to say is keep the network layer and "network submissions"
separate from the chaining of submissions within the Connection. The
protocol/network layer is focused on efficient throughput of the
submissions from the Connection layer to the database. The Connection layer
is responsible for appropriate state transitions in it's chained handling.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#18 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAniHpIOunQXjjctohSSPuTuvesiX1Igks5uMyoqgaJpZM4Vp_hz>
.
|
What the pgjdbc driver does with your example above is the inserts are transformed to insert into foo (id) values (1),(2),(3), (4), ... (n) See pgjdbc/pgjdbc#491 for some more information. I'd like to see if your method is actually faster though... |
@alexanderkjall yes, async and threading are not similar. OfficeFloor actually provides a multi-threaded event loop (and it was not easy to write). Most asynchronous event loops are singled threaded - your GUI being a perfect example of asynchronous event loop :) NodeJS being another. Please excuse me if bringing up threading is confusing the matter regarding chaining asynchronous operations together to improve getting them over the wire to the database server. Totally agreeing with what you are saying 👍 @davecramer yes for this example it might do that, however a more realist example might be: var transaction = connection.beginTransaction().submit();
List<Submissions> inserts = new ArrayList<Submission>();
for (int i = 0; i < 100; i++) {
var user = transaction.insertOperation("INSERT INTO USER ( ID ) VALUES (" + i + ")"));
inserts.add(user.insertOperation("INSERT INTO ADDRESS ( USER_ID, ADDRESS) VALUES ( ?, '1 Acme Way')").param(1, user.subscribe("ID")));
inserts.add(user.insertOperation("INSERT INTO PAY ( USER_ID, AMOUNT) VALLUES ( ?, 'peanuts')").param(1, user.subscribe("ID)));
}
waitAll(inserts).commit().submit(); Basically, all the inserts are chained so order is maintained for adhering to foreign keys. However, throughput is increased as inserts can effectively be fired asynchronously in parallel. |
Rather than trying to code in the threading models and Selectors for a DataSource and it's Connections, allow them to be configured externally.
A primary use case for this is sharing the HTTP Selector. By allowing the NIO loop to be provided to the DataSource for it's connections it allows the HTTP Selector and threads to be used to service the communication with the database. This will allow a single thread to service HTTP requests, which is my understanding of the aim of NIO.
This is further work on #12 .
Example code (not yet working) available to demonstrate the concept:
https://github.com/sagenschneider/pgsql2/blob/281b7fcecd8c54a8041680d654ea9de9404427ff/src/main/java/org/postgresql/sql2/PGDataSource.java#L45 (shows how the NioLoop can be provided to the DataSource with a reasonable default)
https://github.com/sagenschneider/pgsql2/blob/281b7fcecd8c54a8041680d654ea9de9404427ff/src/main/java/org/postgresql/sql2/execution/NioLoop.java#L1 (NioLoop that may be implemented by the web server to share it's HTTP Selector with the DataSource)
https://github.com/sagenschneider/pgsql2/blob/281b7fcecd8c54a8041680d654ea9de9404427ff/src/main/java/org/postgresql/sql2/communication/ProtocolV3.java#L65 (using NioLoop for communication to database)
The text was updated successfully, but these errors were encountered: