Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Postgres type resolution #1797

Open
demurgos opened this issue Apr 13, 2022 · 3 comments
Open

Refactor Postgres type resolution #1797

demurgos opened this issue Apr 13, 2022 · 3 comments

Comments

@demurgos
Copy link
Contributor

demurgos commented Apr 13, 2022

Hello!
I am opening an issue to discuss Postgres type resolution. I submitted some PRs related to this code, but these were incremental changes. I'd like to discuss a larger refactoring to address some of the existing issues.

Issues

This section describes some of the issues I see in the current implementation.

Expose the type cache

// we are not in a place that *can* run a query
// this generally means we are in the middle of another query
// this _should_ only happen for complex types sent through the TEXT protocol
// we're open to ideas to correct this.. but it'd probably be more efficient to figure
// out a way to "prime" the type cache for connections rather than make this
// fallback work correctly for complex user-defined types for the TEXT protocol

Currently, the type cache is fully managed by SQLx. It would be better if the user had a bit more control over it so it could be possible to preload types in the cache manually. We could also imagine building the cache with a single connection and then moving it to the pool so new connections are created with a populated cache.

Support self-referential types

The current type resolution code goes into an infinite loop when trying to resolve self-referential types (such as a custom recoded representing a linked-list node).

Statically ensure types are resolved before using them

This is the problem that motivated me to open this discussion. I want to fix #1672, the bug there is that a PgType is lazy and may either be an unresolved declaration or a resolved definition. The current representation makes it hard to statically ensure that a type is fully resolve before using it to read/write data.

Proposal

My idea would be to expose a PgTypeRegistry representing the retrieved information about the type-definition graph of Postgres. This would model the types as Postgres: with leaf nodes for primitives, and composite types represented with links to the inner types.

With this graph model, it is possible to explicitly insert some nodes, handle self-referential types, check if a type is partially or fully resolved. Once a type is fully resolved, it would be possible to return a cursor guaranteeing the invariant that traversing the type will never fail due to unresolved metadata.

Questions

My current approach is to ignore backward compatibility to quickly propose a PR with focus on providing a clean solution for Postgres. For example, I want to split the PgType enum to avoid mixing variants representing declarations (DeclareWithName/DeclareWithOid) with resolved types.

I know that SQLx 0.6 is planned so I assume this kind of breaking may be acceptable as long as there is a clear and easy migration path. Are there some concerns / comments regarding this approach?

I am also focusing on Postgres because it's the backend I use and I am familiar with the SQLx code handling it. Is it okay if the first version targets Postgres only or should I strive for a more general solution?

Finally, should I work on the master branch or another branch? I started working on master.

@abonander
Copy link
Collaborator

The big issue with exposing an interface based on type OIDs is that they're not stable except for types that are in the default catalog. Otherwise, it's literally just an integer that's incremented on insert to pg_type whenever the extension is loaded or the CREATE TYPE statement is evaluated. That's why we dynamically resolve type names in the first place.

As for recursive definitions, we probably should implement some sort of cycle detection. I'm actually surprised we bother recursing at all but that's something @mehcode worked on, not me.

I could see some sort of "make sure this type resolves or die early" call that you could add to PoolOptions::after_connect() or maybe just make it part of PgConnectOptions. Maybe wrap it in a macro where you can just pass a list of types.

@demurgos
Copy link
Contributor Author

demurgos commented Apr 13, 2022

I agree that the entry point for resolution should be a name and that SQLx should handle the resolution with the DB. I don´t think we should let the user manually associate an oid with typeinfo as it may be invalid due to the issues you described. Or this API should be clearly marked as a very low level one.

Having a more self-contained way to ensure full resolution (either explicitly or automatically) is what would help fixing #1672 IMO. Having a way to ask for full resolution given a type name is definitely something I'd like to support.

Exposing the local type registry / cache would also allow the user to clear the cache, so edge cases where types are deleted or pg_type is modified directly can be supported. Currently there may be issues when the schema is reset and stale oid are used for custom types as input for prepared statements.

@abonander
Copy link
Collaborator

I think to the last point, we probably could add something to Pool that closes all connections older than the current timestamp when they're returned. That could help with a bunch of potential issues.

demurgos added a commit to demurgos/sqlx that referenced this issue May 4, 2022
Postgres arrays and records do not fully support custom types. When encountering an unknown OID, they currently default to using `PgTypeInfo::with_oid`. This is invalid as it breaks the invariant that decoding only uses resolved types, leading to panics.

This commit returns an error instead of panicking. This is merely a mitigation: a proper fix would actually add full support for custom Postgres types. Full support involves more work, so it may still be useful to fix this immediate issue.

Related issues:
- launchbadge#1672
- launchbadge#1797
demurgos added a commit to demurgos/sqlx that referenced this issue May 5, 2022
Postgres arrays and records do not fully support custom types. When encountering an unknown OID, they currently default to using `PgTypeInfo::with_oid`. This is invalid as it breaks the invariant that decoding only uses resolved types, leading to panics.

This commit returns an error instead of panicking. This is merely a mitigation: a proper fix would actually add full support for custom Postgres types. Full support involves more work, so it may still be useful to fix this immediate issue.

Related issues:
- launchbadge#1672
- launchbadge#1797
abonander pushed a commit that referenced this issue May 31, 2022
Postgres arrays and records do not fully support custom types. When encountering an unknown OID, they currently default to using `PgTypeInfo::with_oid`. This is invalid as it breaks the invariant that decoding only uses resolved types, leading to panics.

This commit returns an error instead of panicking. This is merely a mitigation: a proper fix would actually add full support for custom Postgres types. Full support involves more work, so it may still be useful to fix this immediate issue.

Related issues:
- #1672
- #1797
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants