[WIP][SPARK-53455][CONNECT] Add CloneSession RPC #52200
base: master
optional string client_type = 3;

// (Optional)
// The session_id for the new cloned session. If not provided, a new UUID will be generated.
Is the server allowed to return a different session id if you provide this id?
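The rule in the proto comment above (use the provided `session_id`, otherwise generate a UUID) could be sketched roughly as follows. `TargetSessionId` and `resolve` are hypothetical names, not taken from the PR, and the sketch assumes the server returns a provided ID unchanged, which is exactly what the question above asks the author to confirm.

```scala
import java.util.UUID

// Hypothetical helper (not from the PR): picks the ID for the cloned session.
object TargetSessionId {
  // Assumption: a provided, non-empty ID is used as-is; an absent or empty
  // ID falls back to a freshly generated random UUID.
  def resolve(requested: Option[String]): String =
    requested.filter(_.nonEmpty).getOrElse(UUID.randomUUID().toString)
}
```

If the server were allowed to substitute a different ID, the client would have to read the new ID from the response instead of relying on the value it sent.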
* Create a clone of this Spark Connect session on the server side with a custom session ID.
* The server-side session is cloned with all its current state (SQL configurations, temporary
* views, registered functions, catalog state) copied over to a new independent session with
* the specified session ID. The returned cloned session will remain isolated from this session.
'remain' is a bit weird since the session is new.
@@ -1139,4 +1187,8 @@ service SparkConnectService {

// FetchErrorDetails retrieves the matched exception with details based on a provided error id.
rpc FetchErrorDetails(FetchErrorDetailsRequest) returns (FetchErrorDetailsResponse) {}

// Clone a session. Creates a new session that shares the same configuration and state
Can we make this as descriptive as the Scala API doc?
* @note This creates a new server-side session with the specified session ID while preserving
* the current session's configuration and state.
*/
def cloneSession(sessionId: String): SparkSession = {
Should we make this part of the public API? Please annotate this with @DeveloperAPI.

// Next ID: 5
message CloneSessionResponse {
// Session id of the original session that was cloned.
Do we need the old ids?
// configuration, catalog, session state, temporary views, and registered functions
val clonedSparkSession = sourceSessionHolder.session.cloneSession()

val newHolder = SessionHolder(newKey.userId, newKey.sessionId, clonedSparkSession)
Shouldn't we clone more state here? For example MLCache, or DataFrameCache?
assert(ex.getMessage.contains("Session not found"))
}

test("successful clone session creates new session") {
Should we test session independence?
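A session-independence check of the kind suggested here could follow this shape. The sketch uses a toy stand-in (`ToySession`, a hypothetical class holding only a config map) rather than the real Spark Connect `SparkSession`, so it illustrates the assertions such a test might make, not the PR's actual test code.

```scala
import scala.collection.mutable

// Toy stand-in for a session; NOT the Spark Connect SparkSession API.
final class ToySession(val sessionId: String, initialConf: Map[String, String]) {
  private val conf = mutable.Map(initialConf.toSeq: _*)

  def set(key: String, value: String): Unit = conf(key) = value
  def get(key: String): Option[String] = conf.get(key)

  // Copies the current state into a new, independent session.
  def cloneSession(newId: String): ToySession = new ToySession(newId, conf.toMap)
}

object IndependenceCheck {
  // Returns true when state is copied at clone time and later changes
  // on either side do not leak into the other session.
  def run(): Boolean = {
    val original = new ToySession("original", Map("spark.sql.shuffle.partitions" -> "200"))
    val cloned = original.cloneSession("clone")

    cloned.set("spark.sql.shuffle.partitions", "8") // change only the clone
    original.set("spark.sql.ansi.enabled", "true")  // change only the original

    original.get("spark.sql.shuffle.partitions").contains("200") &&
    cloned.get("spark.sql.shuffle.partitions").contains("8") &&
    cloned.get("spark.sql.ansi.enabled").isEmpty
  }
}
```

A real test would presumably make the same three assertions against SQL configurations, temporary views, and registered functions on the two server-side sessions.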
},
"TARGET_SESSION_ID_FORMAT" : {
"message" : [
"Target session ID <targetSessionId> for clone operation must be an UUID string of the format '00112233-4455-6677-8899-aabbccddeeff'."
Isn't there a different error for this as well?
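For context, a format check matching the message above might look like the following. This is a minimal sketch under stated assumptions: `SessionIdFormat.validate` is a hypothetical name, and the PR may well reuse an existing Spark Connect validation and error class instead, which is the point of the question above. Because `java.util.UUID.fromString` also accepts shortened forms such as `1-1-1-1-1`, the sketch compares the parsed value's canonical string with the input.

```scala
import java.util.UUID
import scala.util.Try

// Hypothetical validator (not from the PR) for the target session ID.
object SessionIdFormat {
  // Right(id) when `id` is a canonical 36-character UUID string;
  // Left(message) otherwise, with text modeled on TARGET_SESSION_ID_FORMAT.
  def validate(id: String): Either[String, String] = {
    val canonical = Try(UUID.fromString(id)).toOption.map(_.toString)
    if (canonical.exists(_.equalsIgnoreCase(id))) Right(id)
    else Left(
      s"Target session ID $id for clone operation must be an UUID string " +
        "of the format '00112233-4455-6677-8899-aabbccddeeff'.")
  }
}
```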
What changes were proposed in this pull request?
Adds a new RPC, CloneSession, to the SparkConnectService.
Why are the changes needed?
Spark Connect introduced the concept of resource isolation (via ArtifactManager, which has been ported to classic Spark), so jars/pyfiles/artifacts added to each session are isolated from other sessions.
A slight rough edge is that if a user wishes to fork the state of a session but maintain independence, the only possible way is to create a new session and re-upload/re-initialize all base jars/artifacts/pyfiles, etc.
Support for cloning through the API helps address the rough edge while maintaining all the benefits of session resource isolation.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
New individual unit tests along with new test suites.
Was this patch authored or co-authored using generative AI tooling?
Co-authored with assistance from Claude Code.