Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handling transpiling partial ddls #1212

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

aman-db
Copy link
Contributor

@aman-db aman-db commented Nov 15, 2024

closes #1185
added exception handling block for transpiling partial ddls in a file

@aman-db aman-db added the bug Something isn't working label Nov 15, 2024
Copy link

github-actions bot commented Nov 15, 2024

Coverage tests results

464 tests  ±0   427 ✅ ±0   4s ⏱️ ±0s
  6 suites ±0    37 💤 ±0 
  6 files   ±0     0 ❌ ±0 

Results for commit ee9ce83. ± Comparison against base commit a82a394.

♻️ This comment has been updated with latest results.

try:
transpiled_sql = transpile(sql, read=self.read_dialect, write=write_dialect, pretty=True, error_level=None)
parsed_expressions = parse(sql, read=self.read_dialect, error_level=ErrorLevel.WARN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the change in error level requires changing and is all the expression iteration needed?

@aman-db aman-db marked this pull request as ready for review November 29, 2024 13:19
@aman-db aman-db requested a review from a team as a code owner November 29, 2024 13:19
@jimidle jimidle added the transpile/legacy related to prototype implementation in sqlglot label Nov 29, 2024
@@ -102,3 +102,20 @@ def refactor_hexadecimal_chars(input_string: str) -> str:
for key, value in highlight.items():
output_string = output_string.replace(key, value)
return output_string


def format_error_message(error_type: str, error_message: Exception, error_sql: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is technically string utils can you move it to a new file string_utils.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
as discussed moved functions def remove_bom and def refactor_hexadecimal_chars to string_utils.py also


class SqlglotEngine:
def __init__(self, read_dialect: Dialect):
self.read_dialect = read_dialect

def partial_transpile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this method private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

transpiled_sql_statements = []
parsed_expressions, errors = self.safe_parse(statements=sql, read=self.read_dialect)
for expression in parsed_expressions:
if expression is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment why we are expecting expressions to be never empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just adding it to check whether we are not transpiling an empty expression, but I think this is redundant . Removed if condition

if expression is not None:
transpiled_sql = write_dialect.generate(expression, pretty=True)
transpiled_sql_statements.append(transpiled_sql)
for error in errors:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move this to a private function. just make a function call

Suggested change
for error in errors:
_handle_errors(errors)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

transpiled_sql = [""]
error_list.append(ParserError(file_name, refactor_hexadecimal_chars(str(e))))

transpiled_sql = transpile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests inside test_execute.py to test out the error types.

# Need to define the separator in Class Tokenizer
for i, token in enumerate(tokens):
current_sql_chunk.append(token.text)
if token.token_type in {TokenType.SEMICOLON}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We Should check if the are followed by space or "\n" as well? to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by adding this check we will be restricting input file should contain new line character or space after each and every query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if token.token_type in {TokenType.SEMICOLON}:
keyword_dict = { TokenType.COMMAND, TokenType.CREATE, TokenType.ALTER, TokenType.GRANT, TokenType.INSERT, TokenType.With}

Can you create a dictionary and check the next token is not in this list. it is not exhaustive, but we can keep adding if we find bugs.

@@ -25,7 +25,7 @@ def test_transpile_exception(transpiler, write_dialect):
transpiler_result = transpiler.transpile(
write_dialect, "SELECT TRY_TO_NUMBER(COLUMN, $99.99, 27) FROM table", "file.sql", []
)
assert transpiler_result.transpiled_sql[0] == ""
assert len(transpiler_result.transpiled_sql[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you asserting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. changed assertion statement

@@ -64,8 +64,7 @@ def test_parse_invalid_query(transpiler):

def test_tokenizer_exception(transpiler, write_dialect):
transpiler_result = transpiler.transpile(write_dialect, "1SELECT ~v\ud83d' ", "file.sql", [])

assert transpiler_result.transpiled_sql == [""]
assert len(transpiler_result.transpiled_sql[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. changed assertion statement

@aman-db aman-db marked this pull request as draft December 2, 2024 18:21
@aman-db aman-db marked this pull request as ready for review December 3, 2024 08:18
@@ -33,7 +33,8 @@ def safe_remove_file(file_path: Path):

def write_data_to_file(path: Path, content: str):
with path.open("w") as writable:
writable.write(content)
# writable.write(content)
writable.write(content.encode("utf-8", "ignore").decode("utf-8"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment?

Copy link
Contributor Author

@aman-db aman-db Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this because I was not able to write the query - 1SELECT ~v\ud83d' to the file for TOKEN ERROR scenario

# Need to define the separator in Class Tokenizer
for i, token in enumerate(tokens):
current_sql_chunk.append(token.text)
if token.token_type in {TokenType.SEMICOLON}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if token.token_type in {TokenType.SEMICOLON}:
keyword_dict = { TokenType.COMMAND, TokenType.CREATE, TokenType.ALTER, TokenType.GRANT, TokenType.INSERT, TokenType.With}

Can you create a dictionary and check the next token is not in this list. it is not exhaustive, but we can keep adding if we find bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working transpile/legacy related to prototype implementation in sqlglot
Projects
None yet
3 participants