Provide a jsonSchema when adding a mongodb catalog so schema won't be inferred incorrectly #20394
Unanswered
RoseGoldIsntGay asked this question in Q&A
Replies: 1 comment
-
Honestly, it'd also be interesting to be able to tell Trino not to take just one sample document, but to use multiple or even all documents in order to build its schema.
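The idea of building a schema from several documents rather than one can be sketched in a few lines of Python. This is only an illustration of the merging logic, not how Trino works internally: the function names (shape_of, merge_shapes, render, merged_fields), the sample documents, and the type mapping are all assumptions, and arrays and other BSON types are left out for brevity.

```python
def shape_of(value):
    """Describe a value as a nested dict (for objects) or a type name (for leaves)."""
    if isinstance(value, dict):
        return {key: shape_of(sub) for key, sub in value.items()}
    # bool must be tested before int, since bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "varchar"  # assumption: everything else is treated as text


def merge_shapes(left, right):
    """Union two shapes so that fields seen in either document are kept."""
    if isinstance(left, dict) and isinstance(right, dict):
        merged = dict(left)
        for key, sub in right.items():
            merged[key] = merge_shapes(merged[key], sub) if key in merged else sub
        return merged
    if left == right:
        return left
    return "varchar"  # assumption: conflicting types fall back to varchar


def render(shape):
    """Render a shape as a Trino-style type string."""
    if isinstance(shape, dict):
        inner = ", ".join(f"{name} {render(sub)}" for name, sub in shape.items())
        return f"row({inner})"
    return shape


def merged_fields(documents):
    """Merge the shapes of all sampled documents into one field-to-type mapping."""
    merged = {}
    for document in documents:
        merged = merge_shapes(merged, shape_of(document))
    return {name: render(shape) for name, shape in merged.items()}


# Hypothetical sample: a flat sheet of paper and a three-dimensional box.
sample = [
    {"item": "paper", "size": {"width": 8.5, "height": 11.0}},
    {"item": "box", "size": {"width": 10.0, "height": 5.0, "depth": 3.0}},
]
print(merged_fields(sample))
```

With a merge like this, a field that only appears in a later document still ends up in the resulting type, which is exactly what inference from a single document misses.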
-
Continuing the discussion on Slack:
As of today, Trino infers the schema for a MongoDB collection using its first document, as seen here:
https://github.com/trinodb/trino/blob/master/plugin/trino-mongodb/src/main/java/io/trino/plugin/mongodb/MongoSession.java#L789
This causes an issue when the documents in a collection do not all share the same schema, because you cannot query columns that exist only in other documents. It's possible to work around this by manually updating the _schema collection to match all documents in the collection; however, this process involves a non-native column format that is incredibly hard to work with. For example, let's say my _schema collection looks like this (redundant data truncated):

Now, the size of the first item in my inventory collection is two-dimensional, as it's a piece of paper, and therefore doesn't have a depth field. However, the second item is three-dimensional and therefore requires the _schema to be:

Now, I have a jsonSchema file for my collection that looks like this:

There is no "comfortable" way of converting a jsonSchema into Trino's row(..., ...) data type.

For the long term, I suggest adding a property to the MongoDB Trino plugin that will allow the user to supply a jsonSchema per collection, and the _schema document for said collection will be created from it.

Is there any other solution that I can implement right now to solve this issue?
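As a stopgap, the conversion from a jsonSchema to a _schema document could be scripted outside Trino. The sketch below is a minimal illustration, not an existing Trino feature: the function names, the primitive type mapping, and the example schema (including the width and height fields) are assumptions, and the table/fields/name/type/hidden layout follows the _schema format described in the Trino MongoDB connector documentation. Only the type and properties/items keywords are handled.

```python
import json

# Assumed mapping from JSON Schema primitive types to Trino types.
PRIMITIVES = {
    "string": "varchar",
    "integer": "bigint",
    "number": "double",
    "boolean": "boolean",
}


def to_trino_type(schema):
    """Recursively translate a JSON Schema node into a Trino type string."""
    kind = schema.get("type")
    properties = schema.get("properties", {})
    if kind == "object" and properties:
        inner = ", ".join(
            f"{name} {to_trino_type(sub)}" for name, sub in properties.items()
        )
        return f"row({inner})"
    if kind == "array":
        return f"array({to_trino_type(schema.get('items', {}))})"
    # Assumption: unknown or unspecified types fall back to varchar.
    return PRIMITIVES.get(kind, "varchar")


def to_schema_document(collection, schema):
    """Build a _schema-style document for one collection from its JSON Schema."""
    return {
        "table": collection,
        "fields": [
            {"name": name, "type": to_trino_type(sub), "hidden": False}
            for name, sub in schema.get("properties", {}).items()
        ],
    }


# Hypothetical JSON Schema for the inventory collection.
inventory_schema = {
    "type": "object",
    "properties": {
        "item": {"type": "string"},
        "size": {
            "type": "object",
            "properties": {
                "width": {"type": "number"},
                "height": {"type": "number"},
                "depth": {"type": "number"},
            },
        },
    },
}

print(json.dumps(to_schema_document("inventory", inventory_schema), indent=2))
```

The resulting document would still need to be written into the _schema collection by hand or with a MongoDB client, and the field names are emitted unquoted, so names containing spaces or special characters would need extra handling.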