pgapp:with_transaction() does not work as expected #22

Open
lukyanov opened this issue Aug 24, 2017 · 17 comments

@lukyanov

The README file contains this example:

    pgapp:with_transaction(fun() ->
                                 pgapp:squery("update ..."),
                                 pgapp:squery("delete from ..."),
                                 pgapp:equery("select ? from ?", ["*", Table])
                           end).

The problem with this approach is that "BEGIN" and "COMMIT" actually happen in one connection, but all the queries inside happen in another.

Look at the following code from pgapp_worker.erl:

equery(PoolName, Sql, Params, Timeout) ->
    middle_man_transaction(PoolName,
                           fun (W) ->
                                   gen_server:call(W, {equery, Sql, Params},
                                                   Timeout)
                           end, Timeout).
...
with_transaction(PoolName, Fun, Timeout) ->
    middle_man_transaction(PoolName,
                           fun (W) ->
                                   gen_server:call(W, {transaction, Fun},
                                                   Timeout)
                           end, Timeout).

middle_man_transaction() dedicates a separate poolboy worker to each call, including the transaction itself. So when with_transaction() is called, it checks out a poolboy worker and runs "BEGIN" on it (inside epgsql.erl), and that worker then sits busy waiting for the queries to be executed. But the queries inside the fun run as separate calls to poolboy, which means they get different workers.
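
In other words, each of those calls presumably boils down to something like the following (a simplified sketch, not the actual middle_man_transaction code): check a worker out of the pool, run the fun on it, and check it back in, so every call gets its own worker and connection.

middle_man_transaction(PoolName, Fun, _Timeout) ->
    %% Sketch: each call checks out its own worker for the duration of Fun.
    poolboy:transaction(PoolName, Fun).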

@lukyanov
Author

Ah, now I see. There is a trick:

squery(Sql) ->
    case get(?STATE_VAR) of
        undefined ->
            squery(epgsql_pool, Sql);
        Conn ->
            epgsql:squery(Conn, Sql)
    end.
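
And presumably the worker side puts its own connection into the process dictionary around the transaction, roughly like this (a sketch, not the exact pgapp_worker code; the #state{conn = Conn} record and the clause shape are assumptions):

handle_call({transaction, Fun}, _From, #state{conn = Conn} = State) ->
    %% Remember the connection so that pgapp:squery/1 and pgapp:equery/2
    %% called from inside Fun reuse it instead of going back to the pool.
    put(?STATE_VAR, Conn),
    Result = epgsql:with_transaction(Conn, fun (_C) -> Fun() end),
    erase(?STATE_VAR),
    {reply, Result, State};
...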

But the trick only works when you do not specify PoolName explicitly.
So this works:

    pgapp:with_transaction(fun() ->
                                 pgapp:squery("update ..."),
                                 pgapp:squery("delete from ..."),
                                 pgapp:equery("select ? from ?", ["*", Table])
                           end).

But this doesn't:

    pgapp:with_transaction(pool1, fun() ->
                                 pgapp:squery(pool1, "update ..."),
                                 pgapp:squery(pool1, "delete from ..."),
                                 pgapp:equery(pool1, "select ? from ?", ["*", Table])
                           end).

@davidw
Member

davidw commented Aug 24, 2017

I think @edhollandAL wrote some of this code - have time to take a look?

@lukyanov
Author

Just made a fix for the issue. Please have a look.

@davidw
Member

davidw commented Aug 24, 2017

Thinking out loud: rather than using the process dictionary, is there a way to, say, pass the Conn into the fun() or something?
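
Something along these lines, purely as an illustration (not an existing API):

    pgapp:with_transaction(fun(Conn) ->
                                 epgsql:squery(Conn, "update ..."),
                                 epgsql:squery(Conn, "delete from ..."),
                                 epgsql:equery(Conn, "select ? from ?", ["*", Table])
                           end).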

@danilagamma

@lukyanov it's intended behaviour: you shouldn't be able to specify Pool (and Timeout) in the squery/equery calls that run inside with_transaction. So yes, in your second example:

    pgapp:with_transaction(pool1, fun() ->
                                 pgapp:squery(pool1, "update ..."),
                                 pgapp:squery(pool1, "delete from ..."),
                                 pgapp:equery(pool1, "select ? from ?", ["*", Table])
                           end).

The queries inside fun() -> ... end will be executed outside of the transaction's scope.

I agree that the README is not comprehensive enough, and I suggest fixing it rather than changing the current behaviour.

@lukyanov
Author

lukyanov commented Aug 25, 2017

Ok, I see now. But then I would agree with @davidw that passing Conn would be a better approach here. One of the problems with the current approach is that it is not possible to have nested transactions.

doing_stuff1/0 below would not work as a transaction:

doing_stuff1() ->
    pgapp:with_transaction(pool1, fun() ->
                                 pgapp:squery("update ..."),
                                 doing_stuff2()
                           end).

doing_stuff2() ->
    pgapp:with_transaction(pool1, fun() ->
                                 pgapp:squery("update ...")
                           end).

You would need to explicitly communicate which connection is being used.

@lukyanov
Author

New variant of the changes: #23

@edholland

@lukyanov: postgres does not have a true subtransactions feature - so the example code you show would not work as intended anyway

@davidw
Member

davidw commented Aug 25, 2017

I like the new code without the process dictionary - it feels cleaner in any case. Other thoughts?

@lukyanov
Author

lukyanov commented Aug 27, 2017

I made another PR which is an alternative for #23: #24

@davidw While I agree with you that operating directly on the connection looks clearer and more comprehensible, the somewhat 'dirty' process-dictionary approach has the following advantages:

  1. It does not break backwards compatibility of the library (otherwise everyone who uses pgapp would need to rewrite their code around with_transaction calls when updating).

  2. with_transaction is more convenient to use when you are not thinking in terms of connections: it hides unnecessary details. Imagine the following code:

    register_user() ->
       pgapp:with_transaction(fun() ->
             users:create_user(),
             stats:update_stats()
       end).

    users:create_user() and stats:update_stats() may be independent functions which, in a different context, are called separately. Here they need to be called together inside a transaction. If you are forced to work with the connection directly, you would need to pass it to those functions, which may break the abstraction you intended to have when creating the functions.

@edholland

edholland commented Aug 27, 2017

Thanks for the updated PR @lukyanov; I agree that the benefits you cite above are important and should be maintained. The register_user example makes a compelling case for supporting pseudo-nested transactions. I'm less clear on the use case for supporting cross-connection transactions.

If we allow this, it is possible (likely) that a user will attempt to create some sort of distributed transaction mechanism on top of pgapp; without proper two-phase commit (which is only available as a postgres extension) any such implementation would be bound to be unreliable. I'm not sure that's something the project should be willing to support.

@lukyanov
Author

lukyanov commented Aug 27, 2017

@edholland Thanks for the response!
Let me be clearer about the second point, which you refer to as cross-connection transactions.
My point here is not to support complex multi-connection transactions. It is rather to make pgapp reuse the same connection inside the with_transaction callback: when you call with_transaction on the pool pool1 and also use pool1 inside the callback, I want pgapp to assume that your intention is to use the same connection.
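
One possible shape of that dispatch, as a rough sketch (assuming the pool name is stored alongside the connection in ?STATE_VAR, and that a squery/3 variant with a timeout exists, like the equery/4 quoted earlier; the ?TIMEOUT macro is also illustrative, and this is not the code in the PRs):

squery(PoolName, Sql) ->
    case get(?STATE_VAR) of
        %% Inside a transaction on the same pool: reuse its connection.
        {PoolName, Conn} ->
            epgsql:squery(Conn, Sql);
        %% Otherwise go through the pool as usual.
        _ ->
            squery(PoolName, Sql, ?TIMEOUT)
    end.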

By the way, the register_user example also supports my point well if we consider an app that uses multiple connection pools. Let me extend the example:

register_user() ->
   pgapp:with_transaction(pool1, fun() ->
         users:create_user(),
         stats:update_stats()
   end).

% module users.erl
create_user() ->
   pgapp:with_transaction(pool1, fun() ->
      pgapp:squery("insert ..."),
      pgapp:squery("update ...")
   end).

% module stats.erl
update_stats() ->
   pgapp:squery(pool1, "update ...").

My app uses different databases, so I need a connection pool for each database. This means I must specify which pool I want to use every time I run a query or start a transaction. In the example above all the queries go to pool1. Here is the point: users:create_user() and stats:update_stats() are dedicated functions which may be called separately in a different context, so we cannot omit pool1 as an argument to pgapp:with_transaction and pgapp:squery. But here they are called within a single transaction. The equivalent query sequence would look like this:

register_user() ->
   pgapp:with_transaction(pool1, fun() ->
      pgapp:squery("insert ..."),
      pgapp:squery("update ..."),
      pgapp:squery(pool1, "update ...")
   end).

In the current pgapp code this approach won't work, because the last query passes pool1 as its first argument: pgapp will use a separate poolboy worker for it, and the transaction will only contain the first two queries.

P.S.

If we allow this it is possible (likely) a user will attempt to create some sort of distributed transaction mechanism on top of pgapp

We cannot stop the user from doing something like this anyway, even in the current pgapp version. Anyone has always been able to specify a pool name inside the callback.

@edholland

In your examples you're using pool1 for all queries; if this is the behaviour you require, then you can just omit the pool name. Assuming that the inner pool should be pool2, as below, then it's possible that the update on pool2 succeeds while the SQL on pool1 is rolled back. We currently rule out this behaviour by serialising all calls within a transaction onto a single connection:

register_user() ->
   pgapp:with_transaction(pool1, fun() ->
      pgapp:squery("insert ..."),
      pgapp:squery("update ..."),
      pgapp:squery(pool2, "update ...")
   end).

@lukyanov
Author

@edholland For the same reason it is ok to allow specifying a timeout inside the callback. It must be ignored by pgapp, that's true, but it should not be prohibited, since the same function may work differently depending on whether it is called inside a transaction or standalone.
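
For example, a helper like this (illustrative; it assumes a pgapp:squery variant that takes a pool name and a timeout, in line with the pgapp_worker code quoted earlier) should be callable both on its own and from inside a with_transaction callback, with the pool and timeout simply ignored in the latter case:

    %% Hypothetical helper used both standalone and inside a transaction.
    purge_expired(Timeout) ->
        pgapp:squery(pool1, "delete from sessions where ...", Timeout).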

@lukyanov
Author

lukyanov commented Aug 27, 2017

@edholland My example is correct. All queries are using pool1. That is the intention. And I cannot omit pool1, because update_stats() is a standalone function in a different module.

@davidw
Member

davidw commented Aug 28, 2017

I'll have to think about this some, but I do think that, given the small number of users, if we need to change the API to improve the code, it's best to do so.

@tsloughter

Couldn't this issue be simplified by not running the queries in a worker? Instead, use the pool only to check out a connection, use it, and return it. This would also help solve #20.
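
Roughly something like this (a sketch, assuming the pool holds epgsql connections rather than gen_server workers; here Fun receives the connection, along the lines @davidw suggested earlier):

with_transaction(PoolName, Fun, Timeout) ->
    %% Check a connection out of the pool, run the whole transaction on it,
    %% and always check it back in - no per-query worker dispatch.
    Conn = poolboy:checkout(PoolName, true, Timeout),
    try
        epgsql:with_transaction(Conn, Fun)
    after
        poolboy:checkin(PoolName, Conn)
    end.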
