handling transpiling partial ddls #1212
base: main
Conversation
try:
    transpiled_sql = transpile(sql, read=self.read_dialect, write=write_dialect, pretty=True, error_level=None)
    parsed_expressions = parse(sql, read=self.read_dialect, error_level=ErrorLevel.WARN)
Why does this change require altering the error level, and is all the expression iteration needed?
@@ -102,3 +102,20 @@ def refactor_hexadecimal_chars(input_string: str) -> str:
     for key, value in highlight.items():
         output_string = output_string.replace(key, value)
     return output_string


 def format_error_message(error_type: str, error_message: Exception, error_sql: str) -> str:
This is technically a string utility; can you move it to a new file, string_utils.py?
Done. As discussed, also moved def remove_bom and def refactor_hexadecimal_chars to string_utils.py.
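A hypothetical sketch of what the resulting string_utils.py could look like after the move. The function names come from the discussion above; the bodies (BOM stripping, the ANSI-escape highlight mapping, and the message layout) are illustrative assumptions, not the repo's actual implementation.

```python
def remove_bom(input_string: str) -> str:
    # Strip a leading UTF-8 byte-order mark if the file began with one.
    return input_string.lstrip("\ufeff")


def refactor_hexadecimal_chars(input_string: str) -> str:
    # Replace raw ANSI escape fragments with readable markers.
    # The mapping here is illustrative, not the repo's actual table.
    highlight = {"\x1b[4m": "--> ", "\x1b[0m": " <--"}
    output_string = input_string
    for key, value in highlight.items():
        output_string = output_string.replace(key, value)
    return output_string


def format_error_message(error_type: str, error_message: Exception, error_sql: str) -> str:
    # Bundle the error kind, the exception text, and the offending SQL
    # into one annotated block that can be embedded in the output file.
    return f"{error_type} Exception:\n {error_message}\nProcessing with Errors:\n {error_sql}"
```

Keeping these pure string-in/string-out helpers in one module makes them trivial to unit test in isolation.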
class SqlglotEngine:
    def __init__(self, read_dialect: Dialect):
        self.read_dialect = read_dialect

    def partial_transpile(
Make this method private.
Done.
        transpiled_sql_statements = []
        parsed_expressions, errors = self.safe_parse(statements=sql, read=self.read_dialect)
        for expression in parsed_expressions:
            if expression is not None:
Can you add a comment explaining why we expect expressions never to be empty?
I added it only to check that we are not transpiling an empty expression, but I agree it is redundant. Removed the if condition.
            if expression is not None:
                transpiled_sql = write_dialect.generate(expression, pretty=True)
                transpiled_sql_statements.append(transpiled_sql)
        for error in errors:
I will move this to a private function and just make a function call here.
Suggested change:
-        for error in errors:
+        _handle_errors(errors)
Done.
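The overall shape of partial_transpile after this review — transpile statement by statement, keep going past failures, and hand the accumulated errors to a private helper — can be sketched with a toy stand-in. Everything here is an assumption for illustration: transpile_one stands in for sqlglot's dialect.generate, and _handle_errors mirrors the name suggested in the review.

```python
from dataclasses import dataclass, field


@dataclass
class TranspileResult:
    transpiled_sql: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def transpile_one(statement: str) -> str:
    # Toy stand-in for generating SQL from a parsed expression:
    # it "fails" on a marker token instead of on real parse errors.
    if "BAD" in statement:
        raise ValueError(f"cannot transpile: {statement}")
    return statement.upper()


def _handle_errors(errors: list[str]) -> None:
    # Private helper, as suggested in review: report without raising.
    for error in errors:
        print(f"error: {error}")


def partial_transpile(statements: list[str]) -> TranspileResult:
    result = TranspileResult()
    for statement in statements:
        try:
            result.transpiled_sql.append(transpile_one(statement))
        except ValueError as e:
            # Keep going: one bad DDL should not abort the whole file.
            result.transpiled_sql.append("")
            result.errors.append(str(e))
    _handle_errors(result.errors)
    return result
```

The point of the pattern is the empty-string placeholder plus collected error: the output keeps positional alignment with the input statements while the errors are reported separately.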
            transpiled_sql = [""]
            error_list.append(ParserError(file_name, refactor_hexadecimal_chars(str(e))))

        transpiled_sql = transpile(
Add tests inside test_execute.py to exercise the error types.
        # Need to define the separator in Class Tokenizer
        for i, token in enumerate(tokens):
            current_sql_chunk.append(token.text)
            if token.token_type in {TokenType.SEMICOLON}:
Should we also check that they are followed by a space or "\n", to be safe?
Adding that check would restrict the input: the file would have to contain a newline or space after each and every query.
Suggested change:
-        if token.token_type in {TokenType.SEMICOLON}:
+        keyword_set = {TokenType.COMMAND, TokenType.CREATE, TokenType.ALTER, TokenType.GRANT, TokenType.INSERT, TokenType.WITH}

Can you create a set of statement-starting token types and check the next token against it? It is not exhaustive, but we can keep adding to it as we find bugs.
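One plausible reading of this suggestion, sketched with plain tuples standing in for sqlglot's Token objects: treat a semicolon as a statement boundary only when it is the last token or the token after it can start a new statement. The STATEMENT_STARTERS set mirrors the suggested keyword set and is deliberately non-exhaustive; the token representation here is an assumption for illustration only.

```python
# Minimal stand-in tokens: (token_type, text) pairs instead of sqlglot Tokens.
SEMICOLON = "SEMICOLON"

# Non-exhaustive set of token types that can begin a new statement,
# mirroring the reviewer's suggested keyword set (extend as bugs surface).
STATEMENT_STARTERS = {"COMMAND", "CREATE", "ALTER", "GRANT", "INSERT", "WITH", "SELECT"}


def split_statements(tokens: list[tuple[str, str]]) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    for i, (token_type, text) in enumerate(tokens):
        current.append(text)
        if token_type == SEMICOLON:
            next_type = tokens[i + 1][0] if i + 1 < len(tokens) else None
            # Boundary only at end of input or when the next token
            # begins a new statement; otherwise keep accumulating.
            if next_type is None or next_type in STATEMENT_STARTERS:
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The guard on the next token is what keeps semicolons inside a compound construct from prematurely ending the current chunk.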
@@ -25,7 +25,7 @@ def test_transpile_exception(transpiler, write_dialect):
     transpiler_result = transpiler.transpile(
         write_dialect, "SELECT TRY_TO_NUMBER(COLUMN, $99.99, 27) FROM table", "file.sql", []
     )
-    assert transpiler_result.transpiled_sql[0] == ""
+    assert len(transpiler_result.transpiled_sql[0])
What are you asserting here?
Done. Changed the assertion statement.
@@ -64,8 +64,7 @@ def test_parse_invalid_query(transpiler):


 def test_tokenizer_exception(transpiler, write_dialect):
     transpiler_result = transpiler.transpile(write_dialect, "1SELECT ~v\ud83d' ", "file.sql", [])

-    assert transpiler_result.transpiled_sql == [""]
+    assert len(transpiler_result.transpiled_sql[0])
Same as above.
Done. Changed the assertion statement.
@@ -33,7 +33,8 @@ def safe_remove_file(file_path: Path):


 def write_data_to_file(path: Path, content: str):
     with path.open("w") as writable:
-        writable.write(content)
+        # writable.write(content)
+        writable.write(content.encode("utf-8", "ignore").decode("utf-8"))
Why is this necessary?
Can you add a comment?
I added this because I was not able to write the query 1SELECT ~v\ud83d' to the file for the TOKEN ERROR scenario.
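For context, the encode/decode round-trip works because a lone surrogate such as "\ud83d" cannot be encoded as UTF-8: writing it through a text-mode file raises UnicodeEncodeError, while encoding with errors="ignore" silently drops the offending code points. A minimal sketch of the sanitizing step (the helper name is hypothetical; this is exactly the round-trip the reviewer asked to have commented, since it looks like a no-op at first glance):

```python
def sanitize_for_utf8(content: str) -> str:
    # A lone surrogate (e.g. "\ud83d") is not encodable as UTF-8, so a
    # plain text-mode write would raise UnicodeEncodeError. Encoding with
    # errors="ignore" drops the unencodable code points; decoding back
    # yields a clean str that is safe to write to the file.
    return content.encode("utf-8", "ignore").decode("utf-8")
```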
closes #1185

Added an exception handling block for transpiling partial DDLs in a file.