Skip to content

Conversation

srielau
Copy link
Contributor

@srielau srielau commented Aug 29, 2025

What changes were proposed in this pull request?

This PR significantly reworks the existing EXECUTE IMMEDIATE SQL statement implementation in to improve separation of concerns, robustness, and functionality:

  1. Architecture Redesign:

    • Refactored from direct execution to a nested invocation model using sparkSession.sql()
    • Improved separation between analysis-time validation and execution-time processing
    • Enhanced isolation between outer and inner query contexts
  2. Enhanced Query String Support:

    • NEW: Support for constant expressions as SQL statement arguments (previously only string literals)
  3. Improved Parameter Handling:

    • Redesigned parameter processing with unified parameter arrays
    • Enhanced validation for mixing positional and named parameters
    • Better error messages with UNSUPPORTED_EXPR_FOR_PARAMETER for invalid expressions
  4. Robust SQL Scripting Integration:

    • NEW: Complete isolation of local variables from nested EXECUTE IMMEDIATE calls
    • Preserved session variable functionality within EXECUTE IMMEDIATE
    • NEW: Proper command vs. query result distinction (commands don't produce result sets)
  5. Enhanced Error Handling:

    • Removed unnecessary exception wrapping for cleaner error propagation
    • NEW: Multi-level context error reporting showing both outer and inner query context
    • Comprehensive validation with appropriate error classes
  6. Code Quality Improvements:

    • Eliminated code duplication in parameter validation logic
    • Consolidated imports and cleaned up unused code
    • Improved method naming and documentation

Open issues:
The code testing whether a query uses named or positional parameter markers is not robust yet and needs more work.

Why are the changes needed?

The existing implementation has been brittle and could not handle transitive closure (e.g. nested EXECUTE IMMEDIATE and SQL Scripting in EXECUTE IMMEDIATE.
There is also a follow on PR dealing with improved parameter substitution which is being blocked from supporting EXECUTE IMMEDIATE until this PR is delivered.
Lastly with this change we lay the foundation to support nesting/chaining of errors contexts to SQL level stacks.

Does this PR introduce any user-facing change?

Enhanced Functionality (Backward Compatible):

-- NEW: Expression concatenation
EXECUTE IMMEDIATE 'SELECT * FROM ' || table_name || ' WHERE active = true';

-- EXISTING: All previous syntax continues to work
EXECUTE IMMEDIATE 'SELECT * FROM table_name';
EXECUTE IMMEDIATE 'SELECT * FROM table_name WHERE id = ?' USING 123;
EXECUTE IMMEDIATE 'SELECT name FROM users WHERE id = ?' INTO user_name USING 123;

A couple of error conditions now return more general codes due to improved orthogonality.

All existing valid EXECUTE IMMEDIATE statements continue to work as before.

How was this patch tested?

  1. Regression Testing:

    • All existing EXECUTE IMMEDIATE tests continue to pass
    • Verified backward compatibility with existing syntax and behavior
  2. Enhanced Test Coverage:

    • NEW: Tests for variable references and expression concatenation in query strings
    • NEW: Tests for proper local variable isolation in SQL scripting contexts
    • NEW: Tests for command vs. query result handling
    • EXPANDED: Error condition testing with new validation logic
  3. Integration Testing:

    • SqlScriptingExecutionSuite tests for SQL scripting integration
    • Golden file tests (execute-immediate.sql) for analyzer and execution results
    • Cross-feature testing with variables, parameters, and nested contexts
  4. Error Handling Validation:

    • Comprehensive testing of all error classes and messages
    • Validation that exceptions propagate naturally without wrapping
    • Multi-level error context testing

The reworked implementation passes all existing tests plus extensive new test coverage, ensuring both backward compatibility and enhanced functionality.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude 3.5 Sonnet (Anthropic) - Used for code review, refactoring suggestions, and implementation guidance. All core architectural decisions, SQL semantics, and implementation logic were designed and developed by human developers.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft August 30, 2025 20:18
@srielau srielau force-pushed the rework-execute-immediate branch from 2d1615a to 65b31dd Compare August 30, 2025 23:54
@srielau srielau changed the title [WIP] Rework execute immediate [SPARK-53444] Rework execute immediate Aug 31, 2025
Comment on lines 118 to 125
// Check for duplicate variable names (same logic as ResolveSetVariable)
val dups = finalTargetVars.groupBy(_.identifier).filter(kv => kv._2.length > 1)
if (dups.nonEmpty) {
throw new AnalysisException(
errorClass = "DUPLICATE_ASSIGNMENTS",
messageParameters = Map("nameList" ->
dups.keys.map(key => toSQLId(key.name())).mkString(", ")))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave this check to the generated SetVariable? If so we would avoid this code duplication

val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
val parsedPlan = sessionState.sqlParser.parsePlan(sqlText)
if (args.nonEmpty) {
if (parsedPlan.isInstanceOf[CompoundBody]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is fixed. When we call this method, we need to check whether positional parameters are supplied to a SQL script. Here we only check args.nonEmpty, which will be true whether we have positional or named paremeters. In effect, SQL scripts will regress to no longer support named parameters, as this error will be thrown every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outdated. Please check again. Sorry about that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants