Bump minimum Spark version to 3.2.0 and improve AutoTuner unit tests for multiple Spark versions #1482
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
Thanks @parthosa !
Just wondering why we are bumping it to an old version when we could leap to a more recent one.
@@ -68,7 +68,10 @@ class AppInfoProviderMockTest(val maxInput: Double,
   */
 abstract class BaseAutoTunerSuite extends FunSuite with BeforeAndAfterEach with Logging {

-  val defaultSparkVersion = "3.1.1"
+  // Default Spark version
+  val defaultSparkVersion = "3.2.0"
Should we just bump it to the most recent? Something like 3.5.x?
That makes sense. I have updated the default Spark version in the AutoTuner tests to 3.5.1.
This also brings up an idea (if needed in the future): we could test against both the lower and upper bounds to ensure compatibility across the entire supported range.
Thanks @parthosa. That's a good idea.
We can actually read the Spark version at runtime. Doing it dynamically based on the pom will achieve what you just suggested; the test will then run for each Spark version in the GitHub workflow.
We can get the Spark version using two methods:
- from the pom file, by pulling the build.spark.version property; or
- the easier way, by getting ToolUtils.sparkRuntimeVersion (see the sketch below).
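As a rough illustration of the second option, here is a minimal sketch that reads the Spark version at runtime; it assumes ToolUtils.sparkRuntimeVersion ultimately surfaces the same string Spark exposes as org.apache.spark.SPARK_VERSION, and the helper object is hypothetical rather than code from this repo.

```scala
// Illustrative sketch only: read the Spark version at runtime instead of
// hard-coding it, so the same suite adapts to whichever Spark build the
// CI matrix puts on the classpath.
import org.apache.spark.SPARK_VERSION

object SparkVersionAtRuntime {
  // "3.5.1" -> (3, 5); assumes a standard "major.minor.patch" version string.
  def majorMinor(version: String): (Int, Int) = {
    val parts = version.split("\\.")
    (parts(0).toInt, parts(1).toInt)
  }

  def main(args: Array[String]): Unit = {
    val (major, minor) = majorMinor(SPARK_VERSION)
    println(s"Running against Spark $SPARK_VERSION")
    // Tests can branch on the detected version, e.g. only exercise
    // settings that were introduced in 3.2.0 when the runtime has them.
    val atLeast320 = major > 3 || (major == 3 && minor >= 2)
    println(s"Spark is 3.2.0 or newer: $atLeast320")
  }
}
```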
Thanks @amahussein for the suggestion. I think that would be the right thing to do. In that case, we should drop the Maven profiles for Spark versions below 3.2.0. There is a similar issue, NVIDIA/spark-rapids#11160, on the plugin side to drop tests for versions <3.2.0.
Summary from offline discussion:
- Remove the Maven build profiles for Spark versions 3.1.1, 3.1.3, and 3.1.4.
- Update the AutoTuner unit tests to dynamically test against multiple Spark versions (a sketch of one possible pattern follows).
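Since the AutoTuner takes the application's Spark version as an input, one hypothetical way to cover several versions in a single suite is to register the same test once per version string. The version list and the recommendedShufflePartitions helper below are placeholders, not the repo's actual API, and AnyFunSuite (recent ScalaTest) is assumed even though the repo's suites extend FunSuite.

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical sketch of parameterizing AutoTuner-style checks over a list
// of Spark version strings fed to the logic under test.
class MultiVersionAutoTunerSketch extends AnyFunSuite {
  private val sparkVersions = Seq("3.2.0", "3.3.4", "3.5.1")

  // Placeholder for the logic under test; a real suite would call the AutoTuner.
  private def recommendedShufflePartitions(sparkVersion: String): Int = 200

  // Register one test per version at suite construction time.
  sparkVersions.foreach { version =>
    test(s"AutoTuner recommendations are sane for Spark $version") {
      assert(recommendedShufflePartitions(version) > 0)
    }
  }
}
```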
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
@@ -25,7 +25,7 @@ jobs:
   strategy:
     matrix:
       java-version: [8, 11]
-      spark-version: ['313', '324', '334', '350']
+      spark-version: ['324', '334', '350', '400']
Is this working? I thought that we cannot use Spark 4.0.0 yet, as it is based on Scala 2.13.
- The Maven repository has only spark-4.0.0-preview artifacts: https://mvnrepository.com/artifact/org.apache.spark/spark-core
- The archive links also do not have Scala 2.12 binaries for Spark 4.0.0: https://archive.apache.org/dist/spark/spark-4.0.0-preview2/
We added support for Spark 4.0.0 in #537. This change adds it to the GitHub workflow. All tests are passing on Spark 4.0.0.
Yes, you are right. We added it a long time ago.
Then we removed it from the workflow when Scala 2.12 support was dropped.
I believe this success is bogus because it fetches an outdated snapshot of Spark 4.0.0 that was compatible with Scala 2.12; it is dated September 2023:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.12/
But if you look closely at the Scala 2.13 snapshots, you will see that the up-to-date artifacts are from January 2025:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.13/
We should remove that '400' entry from the GitHub workflow because it is not really testing Spark 4.0.0.
Ahh. I see. Thank you for this catch. I have removed it from the workflow.
Signed-off-by: Partho Sarthi <[email protected]>
Thanks @parthosa !
LGTM
Fixes #1480
This PR removes the Maven build profiles for Spark versions 3.1.1, 3.1.3, and 3.1.4, excludes Spark 3.1.1 from the GitHub workflow, and updates the AutoTuner unit tests to dynamically test against multiple Spark versions.
Detailed Changes
Build and Workflow Cleanup:
- Remove the Maven build profiles for Spark versions 3.1.1, 3.1.3, and 3.1.4.
- Exclude Spark 3.1.1 from the GitHub workflow.
AutoTuner Unit Tests:
- Read the Spark version under test at runtime via ToolUtils.sparkRuntimeVersion.
- Use minPartitionSize for Spark 3.2.0+ instead of the deprecated minPartitionNum (a sketch of this switch follows).
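A hedged sketch of the version switch the last bullet describes, assuming the properties in question are the standard Spark SQL coalesce-partitions settings; the helper is illustrative and not the AutoTuner's actual implementation.

```scala
// Illustrative only: pick the coalesce-partitions property by Spark version,
// assuming the full names are the standard Spark SQL configuration keys.
object CoalesceConfigPicker {
  def coalesceConfigKey(sparkVersion: String): String = {
    val parts = sparkVersion.split("\\.")
    val (major, minor) = (parts(0).toInt, parts(1).toInt)
    if (major > 3 || (major == 3 && minor >= 2)) {
      "spark.sql.adaptive.coalescePartitions.minPartitionSize" // Spark 3.2.0+
    } else {
      "spark.sql.adaptive.coalescePartitions.minPartitionNum" // older releases
    }
  }

  def main(args: Array[String]): Unit = {
    println(coalesceConfigKey("3.5.1")) // minPartitionSize
    println(coalesceConfigKey("3.1.1")) // minPartitionNum
  }
}
```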