Present experience and feature proposal for using pgx in a serverless architecture #1370
Replies: 2 comments
-
With regards to your GraphQL example: I'm not sure how much control you have over how it issues queries to PostgreSQL, but it would be far more efficient if it were to use a batch request or pipelining instead of using multiple connections.

With regards to more efficient session resetting: DISCARD ALL would handle clearing the server side. The client side would require tracking the initial settings of the connection and exposing a method that would restore those settings. Another method could remove all prepared and cached statement data. I don't think it would be too difficult. This could be useful with RLS as well.
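For illustration, a minimal sketch of the server-side half of that reset, assuming a direct *pgx.Conn and a DATABASE_URL environment variable; note that DISCARD ALL clears only server-side state, while pgx's client-side statement cache is untouched, which is exactly the gap a restore/clear method would have to cover:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

// resetSession clears all server-side session state (temporary tables, session
// variables, server-side prepared statements) associated with the connection.
// pgx's client-side statement cache is NOT touched by this.
func resetSession(ctx context.Context, conn *pgx.Conn) error {
	_, err := conn.Exec(ctx, "DISCARD ALL")
	return err
}

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Close(ctx)

	if err := resetSession(ctx, conn); err != nil {
		log.Fatalf("reset session: %v", err)
	}
}
```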
-
Hi Jack, thank you for joining the discussion (and for developing and maintaining pgx)! I looked into your suggestions; deallocating the cached prepared statements from within the Lambda handler currently looks roughly like this:

```go
db, err := sql.Open("pgx", os.Getenv("DATABASE_URL"))
if err != nil {
	log.Fatalf("failed opening database connection: %s", err)
}
conn, err := db.Conn(ctx)
if err != nil {
	log.Fatalf("failed acquiring pgx connection from sql.DB: %s", err)
}
lambda.Start(func(ctx context.Context, event json.RawMessage) (res interface{}, err error) {
	res, err = resolveAppSyncEvent(ctx, event)
	if err != nil {
		return nil, fmt.Errorf("some error: %w", err)
	}
	conn.Raw(func(driverConn any) error {
		conn := driverConn.(*stdlib.Conn).Conn() // conn is a *pgx.Conn
		// we cannot look up the names of the prepared statements,
		// so we try the generated names until a Deallocate fails
		for i := 1; i < 10; i++ {
			if err := conn.Deallocate(ctx, "stmcache_"+strconv.Itoa(i)); err != nil {
				break
			}
		}
		return nil
	})
	return
})
```

I don't return the error of the deallocation step, so a failed cleanup does not fail the invocation. If you were to add a method for deallocating all prepared statements at once, that would make this workaround considerably simpler. Regarding your first suggestion of batching: AWS AppSync indeed supports a batch invoke, which is something I hadn't thought of but which would definitely be a huge performance boost!
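For illustration, a minimal sketch of what a batched resolver could look like on the pgx side, using pgx.Batch to send all per-item lookups in one round trip; the users table, the column names, and resolveUsersBatch are hypothetical:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

type User struct {
	ID   int64
	Name string
}

// resolveUsersBatch queues one query per requested user and sends them all in
// a single network round trip via pgx's batch (pipelining) support.
func resolveUsersBatch(ctx context.Context, conn *pgx.Conn, userIDs []int64) ([]User, error) {
	batch := &pgx.Batch{}
	for _, id := range userIDs {
		batch.Queue("select id, name from users where id = $1", id)
	}

	results := conn.SendBatch(ctx, batch)
	defer results.Close()

	users := make([]User, 0, len(userIDs))
	for range userIDs {
		var u User
		if err := results.QueryRow().Scan(&u.ID, &u.Name); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, nil
}

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Close(ctx)

	users, err := resolveUsersBatch(ctx, conn, []int64{1, 2, 3})
	if err != nil {
		log.Fatalf("resolve users: %v", err)
	}
	log.Printf("resolved %d users", len(users))
}
```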
-
In the present discussion, I would like to share our experience of using pgx in a serverless architecture. In particular, I describe the problems of connection scaling and session pinning and present strategies to address these challenges. Finally, I propose a new mode for pgx connection pooling, which has the potential to improve both the developer experience of using pgx in a serverless environment and its performance.
In the following, I refer to AWS services to describe the different components of an exemplary serverless architecture. However, the discussion is not limited to AWS and should generalize to alternative implementations.
Let us consider an architecture comprising AWS RDS Postgres, AWS RDS Proxy, AWS Lambda, and AWS AppSync.
AWS AppSync hosts a GraphQL server mapping GraphQL requests to invocations of AWS Lambda.
AWS Lambda hosts a Go function, which uses pgx to connect to AWS RDS Proxy on the initial "cold" start. AWS RDS Proxy multiplexes the limited connections of the AWS RDS Postgres database (similar to pg_bouncer).
Let us now consider a GraphQL request asking for the users commenting on the most recent blog posts. Assuming there are $N$ recent blog posts with $\leq M$ comments from $\leq J$ different users, AWS AppSync's parallel invocations of AWS Lambda can cause up to $NMJ$ parallel queries on Postgres. For instance, $N = 10$ posts with at most $M = 20$ comments from at most $J = 5$ distinct users already allow for up to $1000$ parallel queries. Given that an AWS RDS Postgres instance of class t3.micro has a predefined connection limit of 100, our setup would quickly yield connection limit errors without AWS RDS Proxy in between.

But even with AWS RDS Proxy in between, many users will, perhaps surprisingly, experience connection limit (and timeout) errors because of session pinning. Session pinning occurs whenever a Postgres client stores data temporarily on the Postgres server. For example, when the client sends a prepared statement, the server stores the statement for that client. To keep the client's data on the server separate, RDS Proxy pins the session to the client connection. Whenever this occurs, RDS Proxy can no longer use the underlying connection to RDS Postgres for multiplexing.
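For illustration, a minimal sketch of what triggers pinning on the pgx side, assuming a hypothetical posts table; in pgx v5's default QueryExecModeCacheStatement mode the query below is automatically prepared as a named server-side statement, which is exactly the state that causes RDS Proxy to pin the session:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Close(ctx)

	// First execution: pgx prepares a named server-side statement and caches
	// it client-side; from this point on, RDS Proxy pins the backend
	// connection to this session.
	rows, err := conn.Query(ctx, "select id, title from posts order by created_at desc limit $1", 10)
	if err != nil {
		log.Fatalf("query: %v", err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var title string
		if err := rows.Scan(&id, &title); err != nil {
			log.Fatalf("scan: %v", err)
		}
		log.Printf("post %d: %s", id, title)
	}
	if err := rows.Err(); err != nil {
		log.Fatalf("rows: %v", err)
	}
}
```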
One workaround, which has been proposed a couple of times in issues where this problem occurs with pg_bouncer, is to disable prepared statements. In v5 of pgx, we can disable prepared statements by appending default_query_exec_mode=exec or default_query_exec_mode=simple_protocol to the connection string (see also QueryExecMode for details). However, disabling prepared statements makes us more vulnerable to SQL injection and complicates the type mapping of the driver (we have seen quite some improvements on the type mapping when going from v4 to v5).
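For completeness, a short sketch of both ways to set the mode in pgx v5, assuming a DATABASE_URL environment variable:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Option 1: append the mode to the connection string, e.g.
	//   postgres://user:pass@host/db?default_query_exec_mode=exec
	config, err := pgx.ParseConfig(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("parse config: %v", err)
	}

	// Option 2: set the mode programmatically on the parsed config.
	// QueryExecModeExec still uses the extended protocol but without named
	// prepared statements; QueryExecModeSimpleProtocol avoids it entirely.
	config.DefaultQueryExecMode = pgx.QueryExecModeExec

	conn, err := pgx.ConnectConfig(ctx, config)
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer conn.Close(ctx)

	var one int
	if err := conn.QueryRow(ctx, "select 1").Scan(&one); err != nil {
		log.Fatalf("query: %v", err)
	}
}
```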
Another alternative is to create and close a connection within the lifecycle of a single AWS Lambda invocation. In this case, we allow session pinning for the duration of our Lambda invocation, hoping that the session is released fast enough before other invocations time out. One drawback of this approach is the substantial per-invocation overhead of first establishing the connection, but to me it seems that connecting on every invocation is the best you can do at the moment.
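A minimal sketch of this connect-per-invocation pattern, with resolveAppSyncEvent reduced to a hypothetical placeholder:

```go
package main

import (
	"context"
	"encoding/json"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/jackc/pgx/v5"
)

// resolveAppSyncEvent is a hypothetical placeholder for the application's
// resolver logic.
func resolveAppSyncEvent(ctx context.Context, conn *pgx.Conn, event json.RawMessage) (interface{}, error) {
	var one int
	err := conn.QueryRow(ctx, "select 1").Scan(&one)
	return one, err
}

func main() {
	lambda.Start(func(ctx context.Context, event json.RawMessage) (interface{}, error) {
		// A fresh connection per invocation: any session pinning lasts only
		// until Close releases the session back to RDS Proxy.
		conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
		if err != nil {
			return nil, err
		}
		defer conn.Close(ctx)

		return resolveAppSyncEvent(ctx, conn, event)
	})
}
```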
In the future, it would be nice to have either some ability to clear the session, i.e., remove all session variables and cached statements on both the server and the client side, or, alternatively, some one-way connection pool, which prepares a new "throw-away" connection before each invocation.
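To make the second idea more concrete, here is a purely hypothetical sketch of such a one-way pool; nothing in it is existing pgx API, it only illustrates handing out a freshly prepared, single-use connection per invocation:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

// throwAwayPool hands out each connection exactly once. The caller owns the
// connection and closes it after the invocation; meanwhile the pool is already
// establishing the next one in the background.
type throwAwayPool struct {
	connString string
	next       chan *pgx.Conn
}

func newThrowAwayPool(ctx context.Context, connString string) *throwAwayPool {
	p := &throwAwayPool{connString: connString, next: make(chan *pgx.Conn)}
	go p.warm(ctx)
	return p
}

// warm keeps one fresh connection established and waiting to be handed over.
func (p *throwAwayPool) warm(ctx context.Context) {
	for {
		conn, err := pgx.Connect(ctx, p.connString)
		if err != nil {
			log.Printf("warm connect failed: %v", err)
			return // a real implementation would back off and retry
		}
		select {
		case p.next <- conn:
			// handed over; prepare the next throw-away connection
		case <-ctx.Done():
			conn.Close(context.Background())
			return
		}
	}
}

// Acquire returns a never-before-used connection. Because it is never returned
// to the pool, session pinning cannot leak across invocations.
func (p *throwAwayPool) Acquire(ctx context.Context) (*pgx.Conn, error) {
	select {
	case conn := <-p.next:
		return conn, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func main() {
	ctx := context.Background()
	pool := newThrowAwayPool(ctx, os.Getenv("DATABASE_URL"))

	conn, err := pool.Acquire(ctx)
	if err != nil {
		log.Fatalf("acquire: %v", err)
	}
	defer conn.Close(ctx)

	var one int
	if err := conn.QueryRow(ctx, "select 1").Scan(&one); err != nil {
		log.Fatalf("query: %v", err)
	}
}
```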