handling transpiling partial ddls #1212

aman-db · 2024-11-15T17:15:55Z

closes #1185
Added an exception-handling block for transpiling partial DDLs in a file.

github-actions · 2024-11-15T17:19:43Z

Coverage tests results

464 tests ±0 | 427 ✅ ±0 | 4s ⏱️ ±0s
6 suites ±0 | 37 💤 ±0
6 files ±0 | 0 ❌ ±0

Results for commit ee9ce83. ± Comparison against base commit a82a394.

♻️ This comment has been updated with latest results.

sundarshankar89 · 2024-11-18T04:16:59Z

src/databricks/labs/remorph/snow/sql_transpiler.py

        try:
-            transpiled_sql = transpile(sql, read=self.read_dialect, write=write_dialect, pretty=True, error_level=None)
+            parsed_expressions = parse(sql, read=self.read_dialect, error_level=ErrorLevel.WARN)


Why does the error level need to change, and is iterating over all the expressions necessary?

sundarshankar89 · 2024-12-02T10:48:09Z

src/databricks/labs/remorph/helpers/file_utils.py

@@ -102,3 +102,20 @@ def refactor_hexadecimal_chars(input_string: str) -> str:
    for key, value in highlight.items():
        output_string = output_string.replace(key, value)
    return output_string
+
+
+def format_error_message(error_type: str, error_message: Exception, error_sql: str) -> str:


This is technically a string utility. Can you move it to a new file, string_utils.py?

Done.
As discussed, I also moved the functions remove_bom and refactor_hexadecimal_chars to string_utils.py.
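
For readers following the thread, here is a minimal sketch of what such a helper in string_utils.py might look like. Only the signature is taken from the diff above; the body and the output layout are assumptions made for illustration, not the code that was merged.

```python
def format_error_message(error_type: str, error_message: Exception, error_sql: str) -> str:
    """Build a readable report for one SQL statement that failed to transpile.

    The signature matches the diff; the layout below is an assumed one.
    """
    return (
        f"-- {error_type} start\n"
        f"/*\n{error_message}\n*/\n"  # keep the error inside a SQL comment
        f"{error_sql}\n"              # keep the offending SQL so it can be fixed by hand
        f"-- {error_type} end"
    )
```

Wrapping the error text in a SQL comment would keep the output file loadable by SQL tooling while leaving the untranspiled statement visible next to the reason it failed.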

sundarshankar89 · 2024-12-02T10:49:02Z

src/databricks/labs/remorph/snow/sql_transpiler.py


 class SqlglotEngine:
    def __init__(self, read_dialect: Dialect):
        self.read_dialect = read_dialect

+    def partial_transpile(


Make this method private (prefix its name with an underscore).

sundarshankar89 · 2024-12-02T10:54:05Z

src/databricks/labs/remorph/snow/sql_transpiler.py

+        transpiled_sql_statements = []
+        parsed_expressions, errors = self.safe_parse(statements=sql, read=self.read_dialect)
+        for expression in parsed_expressions:
+            if expression is not None:


Can you add a comment explaining why we expect expressions to never be empty?

I only added it to make sure we are not transpiling an empty expression, but I think it is redundant. Removed the if condition.

sundarshankar89 · 2024-12-02T10:55:04Z

src/databricks/labs/remorph/snow/sql_transpiler.py

+            if expression is not None:
+                transpiled_sql = write_dialect.generate(expression, pretty=True)
+                transpiled_sql_statements.append(transpiled_sql)
+        for error in errors:


I would move this into a private function and just make a function call here.

Suggested change

-        for error in errors:
+        _handle_errors(errors)
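
As an illustration of the shape being asked for, the sketch below keeps the transpile loop short and delegates error bookkeeping to a private helper. It does not use sqlglot: `convert` is a hypothetical per-statement function, and the `ParserError` fields are assumed rather than copied from the project.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParserError:
    # Stand-in for the project's ParserError; the field names are assumed.
    file_name: str
    exception: str


@dataclass
class PartialTranspiler:
    # `convert` is a hypothetical function that transpiles one statement
    # and raises ValueError when it cannot.
    convert: Callable[[str], str]
    error_list: list[ParserError] = field(default_factory=list)

    def transpile(self, statements: list[str], file_name: str) -> list[str]:
        transpiled: list[str] = []
        errors: list[tuple[str, Exception]] = []
        for statement in statements:
            try:
                transpiled.append(self.convert(statement))
            except ValueError as exc:
                # One bad statement should not discard the rest of the file.
                errors.append((statement, exc))
        self._handle_errors(errors, file_name)
        return transpiled

    def _handle_errors(self, errors: list[tuple[str, Exception]], file_name: str) -> None:
        # Private helper: record each failure against the file it came from.
        for statement, exc in errors:
            self.error_list.append(ParserError(file_name, f"{exc}: {statement}"))
```

With this split, the public method reads as "transpile what you can, then report the rest", and the error format can change without touching the loop.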

sundarshankar89 · 2024-12-02T10:56:55Z

src/databricks/labs/remorph/snow/sql_transpiler.py

-            transpiled_sql = [""]
-            error_list.append(ParserError(file_name, refactor_hexadecimal_chars(str(e))))
-
+            transpiled_sql = transpile(


Add tests inside test_execute.py to test out the error types.

sundarshankar89 · 2024-12-02T10:57:45Z

src/databricks/labs/remorph/snow/sql_transpiler.py

+        # Need to define the separator in Class Tokenizer
+        for i, token in enumerate(tokens):
+            current_sql_chunk.append(token.text)
+            if token.token_type in {TokenType.SEMICOLON}:


Should we also check that the semicolon is followed by a space or "\n", to be safe?

Adding this check would require the input file to have a newline or space after every query.

Suggested change

-            if token.token_type in {TokenType.SEMICOLON}:
+            keyword_dict = {TokenType.COMMAND, TokenType.CREATE, TokenType.ALTER, TokenType.GRANT, TokenType.INSERT, TokenType.WITH}

Can you create a dictionary (or set) of these token types and check that the next token is not in this list? It is not exhaustive, but we can keep adding to it if we find bugs.
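
One possible reading of this suggestion is to treat a semicolon as a statement boundary only when the input ends there or the next meaningful token starts a new statement. The sketch below shows that reading with a tiny regex tokenizer from the standard library instead of sqlglot's tokenizer; the names `STATEMENT_STARTERS` and `split_statements` and the keyword list are assumptions for illustration.

```python
import re

# Assumed, non-exhaustive set of keywords that can start a new statement.
STATEMENT_STARTERS = {"CREATE", "ALTER", "GRANT", "INSERT", "WITH", "SELECT"}

# Tiny tokenizer: quoted strings, words, semicolons, whitespace runs, anything else.
_TOKEN = re.compile(r"'(?:[^']|'')*'|\w+|;|\s+|.", re.DOTALL)


def split_statements(sql: str) -> list[str]:
    tokens = _TOKEN.findall(sql)
    chunks: list[str] = []
    current: list[str] = []
    for i, token in enumerate(tokens):
        current.append(token)
        if token != ";":
            continue
        # Look ahead to the next non-whitespace token, if there is one.
        upcoming = next((t for t in tokens[i + 1:] if not t.isspace()), None)
        if upcoming is None or upcoming.upper() in STATEMENT_STARTERS:
            chunks.append("".join(current).strip())
            current = []
    remainder = "".join(current).strip()
    if remainder:
        chunks.append(remainder)
    return chunks
```

Because the lookahead skips whitespace rather than requiring it, the input does not need a space or newline after each query, and a semicolon inside a quoted string is never seen as a separate token. The cost is that a statement starting with a keyword missing from the set stays glued to the previous chunk until the set is extended.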

sundarshankar89 · 2024-12-02T10:58:53Z

tests/unit/snow/test_sql_transpiler.py

@@ -25,7 +25,7 @@ def test_transpile_exception(transpiler, write_dialect):
    transpiler_result = transpiler.transpile(
        write_dialect, "SELECT TRY_TO_NUMBER(COLUMN, $99.99, 27) FROM table", "file.sql", []
    )
-    assert transpiler_result.transpiled_sql[0] == ""
+    assert len(transpiler_result.transpiled_sql[0])


What are you asserting here?

Done. Changed the assertion statement.

sundarshankar89 · 2024-12-02T10:59:19Z

tests/unit/snow/test_sql_transpiler.py

@@ -64,8 +64,7 @@ def test_parse_invalid_query(transpiler):

 def test_tokenizer_exception(transpiler, write_dialect):
    transpiler_result = transpiler.transpile(write_dialect, "1SELECT ~v\ud83d' ", "file.sql", [])
-
-    assert transpiler_result.transpiled_sql == [""]
+    assert len(transpiler_result.transpiled_sql[0])


Same as above.

Done. Changed the assertion statement.

sundarshankar89 · 2024-12-03T08:31:40Z

tests/unit/transpiler/test_execute.py

@@ -33,7 +33,8 @@ def safe_remove_file(file_path: Path):

 def write_data_to_file(path: Path, content: str):
    with path.open("w") as writable:
-        writable.write(content)
+        # writable.write(content)
+        writable.write(content.encode("utf-8", "ignore").decode("utf-8"))


Why is this necessary?

Can you add a comment?

I added this because I could not write the query 1SELECT ~v\ud83d' to the file for the TOKEN ERROR scenario.
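
The behaviour described in this reply can be reproduced with the standard library alone. The snippet below is a small illustration, not part of the PR: "\ud83d" is a lone surrogate, which cannot be encoded as UTF-8 under the default strict error handler, while the "ignore" handler drops it.

```python
# The query from the tokenizer test; "\ud83d" is half of a surrogate pair.
query = "1SELECT ~v\ud83d' "

try:
    query.encode("utf-8")  # strict error handling is the default
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# errors="ignore" silently drops the characters that cannot be encoded.
cleaned = query.encode("utf-8", "ignore").decode("utf-8")
```

One consequence worth stating in the requested code comment: the file written by the test no longer contains the surrogate, so the on-disk input differs from the in-memory string used in test_tokenizer_exception.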

sundarshankar89 · 2024-12-03T08:40:55Z

src/databricks/labs/remorph/snow/sql_transpiler.py

+        # Need to define the separator in Class Tokenizer
+        for i, token in enumerate(tokens):
+            current_sql_chunk.append(token.text)
+            if token.token_type in {TokenType.SEMICOLON}:


Suggested change

-            if token.token_type in {TokenType.SEMICOLON}:
+            keyword_dict = {TokenType.COMMAND, TokenType.CREATE, TokenType.ALTER, TokenType.GRANT, TokenType.INSERT, TokenType.WITH}

Can you create a dictionary (or set) of these token types and check that the next token is not in this list? It is not exhaustive, but we can keep adding to it if we find bugs.

adding exception handling for transpile partial ddls

35667d0

aman-db added the bug Something isn't working label Nov 15, 2024

formating changes

59601e6

aman-db requested review from bishwajit-db and sundarshankar89 November 15, 2024 17:29

sundarshankar89 requested changes Nov 18, 2024

sundarshankar89 and others added 3 commits November 19, 2024 09:18

Merge branch 'main' into bug/transpile_partial_ddls

52597e3

adding code for partial transpile

0dfc8d2

Merge branch 'main' into bug/transpile_partial_ddls

149882c

aman-db marked this pull request as ready for review November 29, 2024 13:19

aman-db requested a review from a team as a code owner November 29, 2024 13:19

aman-db requested a review from sundarshankar89 November 29, 2024 14:29

jimidle assigned aman-db Nov 29, 2024

jimidle added the transpile/legacy related to prototype implementation in sqlglot label Nov 29, 2024

sundarshankar89 requested changes Dec 2, 2024

aman-db and others added 2 commits December 2, 2024 17:33

Merge branch 'main' into bug/transpile_partial_ddls

69acba3

adding string_utils and code refactor

f1b2a0c

aman-db marked this pull request as draft December 2, 2024 18:21

aman-db and others added 2 commits December 3, 2024 12:48

Merge branch 'main' into bug/transpile_partial_ddls

7c62979

adding test cases

ee9ce83

aman-db requested a review from sundarshankar89 December 3, 2024 08:18

aman-db marked this pull request as ready for review December 3, 2024 08:18

sundarshankar89 requested changes Dec 3, 2024
