Refactor Postgres type resolution #1797

demurgos · 2022-04-13T15:59:00Z

Hello!
I am opening an issue to discuss Postgres type resolution. I submitted some PRs related to this code, but these were incremental changes. I'd like to discuss a larger refactoring to address some of the existing issues.

Issues

This section describes some of the issues I see in the current implementation.

Expose the type cache

sqlx/sqlx-core/src/postgres/connection/describe.rs

Lines 176 to 181 in a46e370

    
           // we are not in a place that *can* run a query 
        
           // this generally means we are in the middle of another query 
        
           // this _should_ only happen for complex types sent through the TEXT protocol 
        
           // we're open to ideas to correct this.. but it'd probably be more efficient to figure 
        
           // out a way to "prime" the type cache for connections rather than make this 
        
           // fallback work correctly for complex user-defined types for the TEXT protocol

Currently, the type cache is fully managed by SQLx. It would be better if the user had a bit more control over it so it could be possible to preload types in the cache manually. We could also imagine building the cache with a single connection and then moving it to the pool so new connections are created with a populated cache.

Support self-referential types

The current type resolution code goes into an infinite loop when trying to resolve self-referential types (such as a custom recoded representing a linked-list node).

Statically ensure types are resolved before using them

This is the problem that motivated me to open this discussion. I want to fix #1672, the bug there is that a PgType is lazy and may either be an unresolved declaration or a resolved definition. The current representation makes it hard to statically ensure that a type is fully resolve before using it to read/write data.

Proposal

My idea would be to expose a PgTypeRegistry representing the retrieved information about the type-definition graph of Postgres. This would model the types as Postgres: with leaf nodes for primitives, and composite types represented with links to the inner types.

With this graph model, it is possible to explicitly insert some nodes, handle self-referential types, check if a type is partially or fully resolved. Once a type is fully resolved, it would be possible to return a cursor guaranteeing the invariant that traversing the type will never fail due to unresolved metadata.

Questions

My current approach is to ignore backward compatibility to quickly propose a PR with focus on providing a clean solution for Postgres. For example, I want to split the PgType enum to avoid mixing variants representing declarations (DeclareWithName/DeclareWithOid) with resolved types.

I know that SQLx 0.6 is planned so I assume this kind of breaking may be acceptable as long as there is a clear and easy migration path. Are there some concerns / comments regarding this approach?

I am also focusing on Postgres because it's the backend I use and I am familiar with the SQLx code handling it. Is it okay if the first version targets Postgres only or should I strive for a more general solution?

Finally, should I work on the master branch or another branch? I started working on master.

The text was updated successfully, but these errors were encountered:

abonander · 2022-04-13T19:19:39Z

The big issue with exposing an interface based on type OIDs is that they're not stable except for types that are in the default catalog. Otherwise, it's literally just an integer that's incremented on insert to pg_type whenever the extension is loaded or the CREATE TYPE statement is evaluated. That's why we dynamically resolve type names in the first place.

As for recursive definitions, we probably should implement some sort of cycle detection. I'm actually surprised we bother recursing at all but that's something @mehcode worked on, not me.

I could see some sort of "make sure this type resolves or die early" call that you could add to PoolOptions::after_connect() or maybe just make it part of PgConnectOptions. Maybe wrap it in a macro where you can just pass a list of types.

demurgos · 2022-04-13T19:46:44Z

I agree that the entry point for resolution should be a name and that SQLx should handle the resolution with the DB. I don´t think we should let the user manually associate an oid with typeinfo as it may be invalid due to the issues you described. Or this API should be clearly marked as a very low level one.

Having a more self-contained way to ensure full resolution (either explicitly or automatically) is what would help fixing #1672 IMO. Having a way to ask for full resolution given a type name is definitely something I'd like to support.

Exposing the local type registry / cache would also allow the user to clear the cache, so edge cases where types are deleted or pg_type is modified directly can be supported. Currently there may be issues when the schema is reset and stale oid are used for custom types as input for prepared statements.

abonander · 2022-04-13T19:49:47Z

I think to the last point, we probably could add something to Pool that closes all connections older than the current timestamp when they're returned. That could help with a bunch of potential issues.

Postgres arrays and records do not fully support custom types. When encountering an unknown OID, they currently default to using `PgTypeInfo::with_oid`. This is invalid as it breaks the invariant that decoding only uses resolved types, leading to panics. This commit returns an error instead of panicking. This is merely a mitigation: a proper fix would actually add full support for custom Postgres types. Full support involves more work, so it may still be useful to fix this immediate issue. Related issues: - launchbadge#1672 - launchbadge#1797

Postgres arrays and records do not fully support custom types. When encountering an unknown OID, they currently default to using `PgTypeInfo::with_oid`. This is invalid as it breaks the invariant that decoding only uses resolved types, leading to panics. This commit returns an error instead of panicking. This is merely a mitigation: a proper fix would actually add full support for custom Postgres types. Full support involves more work, so it may still be useful to fix this immediate issue. Related issues: - #1672 - #1797

demurgos mentioned this issue Apr 13, 2022

Can't decode PostgreSQL arrays of custom types #1672

Open

demurgos mentioned this issue Apr 15, 2022

Refactor Postgres types, guarantee resolution before usage #1810

Closed

demurgos mentioned this issue May 4, 2022

Fix panics on unknown Postgres type oid when decoding #1855

Merged

CommanderStorm mentioned this issue Nov 28, 2024

Performance Impact: Allow For Disabling SQLx Type Checking Queries, As They Add Significant Latency #3611

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Postgres type resolution #1797

Refactor Postgres type resolution #1797

demurgos commented Apr 13, 2022 •

edited

Loading

abonander commented Apr 13, 2022

demurgos commented Apr 13, 2022 •

edited

Loading

abonander commented Apr 13, 2022

Refactor Postgres type resolution #1797

Refactor Postgres type resolution #1797

Comments

demurgos commented Apr 13, 2022 • edited Loading

Issues

Expose the type cache

Support self-referential types

Statically ensure types are resolved before using them

Proposal

Questions

abonander commented Apr 13, 2022

demurgos commented Apr 13, 2022 • edited Loading

abonander commented Apr 13, 2022

demurgos commented Apr 13, 2022 •

edited

Loading

demurgos commented Apr 13, 2022 •

edited

Loading