Reject CREATE TABLE/VIEW with duplicate column names #13517
DFSchema checks columns for uniqueness, but allows duplicate column names when they are qualified differently. This is because DFSchema plays a central role during query planning as an identifier resolution scope. Those checks in their current form should not be there, since they prevent execution of queries with duplicate column aliases, which is legal in SQL. But even with these checks present, they are not sufficient to ensure that CREATE TABLE/VIEW is well structured. Table or view columns need to have unique names, and there is no qualification involved. This commit adds the necessary checks in the CREATE TABLE/VIEW DDL structs, ensuring that CREATE TABLE/VIEW logical plans are valid in that regard.
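To make the distinction concrete, here is a minimal, std-only sketch of the two kinds of uniqueness discussed above. `QualifiedColumn`, `first_duplicate_qualified`, and `first_duplicate_unqualified` are hypothetical names invented for illustration; they are not DataFusion's actual types or the code in this PR.

```rust
use std::collections::HashSet;

/// A column as a planner might see it: an optional table qualifier plus a name.
/// Hypothetical stand-in, not DataFusion's actual field type.
#[derive(Debug, Clone, PartialEq)]
pub struct QualifiedColumn {
    pub qualifier: Option<String>,
    pub name: String,
}

impl QualifiedColumn {
    pub fn new(qualifier: Option<&str>, name: &str) -> Self {
        Self {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Returns the first column whose (qualifier, name) pair was already seen.
/// This mirrors a uniqueness check that treats `t1.c1` and `t2.c1` as distinct.
pub fn first_duplicate_qualified(cols: &[QualifiedColumn]) -> Option<&QualifiedColumn> {
    let mut seen = HashSet::new();
    for col in cols {
        if !seen.insert((col.qualifier.as_deref(), col.name.as_str())) {
            return Some(col);
        }
    }
    None
}

/// Returns the first column whose unqualified name was already seen.
/// This is the stricter rule a table or view definition needs, since its
/// columns carry no qualifier.
pub fn first_duplicate_unqualified(cols: &[QualifiedColumn]) -> Option<&QualifiedColumn> {
    let mut seen = HashSet::new();
    for col in cols {
        if !seen.insert(col.name.as_str()) {
            return Some(col);
        }
    }
    None
}
```

Under these assumptions, a schema holding `t1.c1` and `t2.c1` passes the qualified check but fails the unqualified one, which is the gap the PR description points at for CREATE TABLE/VIEW.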
@@ -150,6 +150,11 @@ pub enum SchemaError {
qualifier: Box<TableReference>,
name: String,
},
/// Schema duplicate qualified fields with duplicate unqualified names
Could we add a small text example here showing what is considered a duplicate?
/// Whether the table is an infinite streams
pub unbounded: bool,
/// Table(provider) specific options
pub options: HashMap<String, String>,
Maybe we can add comments explaining what the key and the value are in the hashmaps?
]))
)?)
.unwrap_err().strip_backtrace().to_string(),
"Schema error: Schema contains qualified fields with duplicate unqualified names t1.c1"
🤔 I feel lower case col names are not easy to read in the error message. Will tweak the schema_err! output.
This PR includes:
- Encapsulating create table/view construction (to be able to add these checks)
- Checks in create table/view construction that validate the schema has no duplicate names

Fixes #13487 (CREATE TABLE succeeds when schema has duplicate names, resulting in a table that cannot be selected from)
Extracted from #13489 (Support duplicate column aliases in queries)
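The encapsulation idea listed above can be sketched roughly as follows: keep the struct's fields private so that the only way to build it is a fallible constructor that runs the duplicate-name check. `CreateTableSketch`, `DuplicateColumnName`, and `try_new` are hypothetical names used only for illustration, and the real DDL structs carry far more than a list of column names.

```rust
use std::collections::HashSet;
use std::fmt;

/// Error returned when a column name appears more than once.
/// Hypothetical type; the PR reports this through DataFusion's SchemaError.
#[derive(Debug, PartialEq)]
pub struct DuplicateColumnName(pub String);

impl fmt::Display for DuplicateColumnName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "duplicate column name: {}", self.0)
    }
}

impl std::error::Error for DuplicateColumnName {}

/// Simplified stand-in for a CREATE TABLE plan node.
/// Fields are private, so every instance must go through `try_new`.
#[derive(Debug)]
pub struct CreateTableSketch {
    name: String,
    columns: Vec<String>,
}

impl CreateTableSketch {
    /// Builds the node, rejecting column lists with repeated names.
    pub fn try_new(
        name: impl Into<String>,
        columns: Vec<String>,
    ) -> Result<Self, DuplicateColumnName> {
        {
            // Scope the set so its borrow of `columns` ends before the move below.
            let mut seen = HashSet::new();
            for column in &columns {
                if !seen.insert(column.as_str()) {
                    return Err(DuplicateColumnName(column.clone()));
                }
            }
        }
        Ok(Self {
            name: name.into(),
            columns,
        })
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn columns(&self) -> &[String] {
        &self.columns
    }
}
```

With construction funneled through one function, code elsewhere cannot build an invalid node by filling in fields directly, which is presumably why the encapsulation step comes first in the list.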