
Bump minimum Spark version to 3.2.0 and improve AutoTuner unit tests for multiple Spark versions #1482

Merged: 6 commits merged into NVIDIA:dev on Jan 2, 2025

Conversation

@parthosa (Collaborator) commented Dec 31, 2024

Fixes #1480

This PR removes the Maven build profiles for Spark versions 3.1.1, 3.1.3, and 3.1.4, excludes Spark 3.1.1 from the GitHub workflow, and updates the AutoTuner unit tests to dynamically test against multiple Spark versions.

Detailed Changes

  1. Build and Workflow Cleanup:

    • Removed outdated Maven build profiles for Spark 3.1.1, 3.1.3, and 3.1.4.
    • Excluded Spark 3.1.1 from the GitHub workflow.
  2. AutoTuner Unit Tests:

    • Updated to dynamically determine the Spark runtime version using ToolUtils.sparkRuntimeVersion.
    • Enables testing across different Spark versions in GitHub workflows without hardcoding specific versions.
    • Modified tests to expect a recommendation for minPartitionSize on Spark 3.2.0+ instead of the deprecated minPartitionNum (see the sketch after this list).
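
A minimal sketch (not the repository's actual test code; the object and helper names are hypothetical) of how a test can branch on the version string that ToolUtils.sparkRuntimeVersion provides when deciding which AQE coalesce property to expect:

// Minimal sketch: pick the expected AQE coalesce property from the Spark version the
// tests are running against. In the real suite the version string would come from
// ToolUtils.sparkRuntimeVersion; here it falls back to a fixed default for illustration.
object AutoTunerVersionSketch {

  val sparkRuntimeVersion: String = sys.props.getOrElse("spark.version", "3.5.1")

  // Numeric "major.minor" comparison, so that e.g. "3.10" >= "3.2" holds.
  private def atLeast(version: String, minimum: String): Boolean = {
    def majorMinor(v: String): (Int, Int) = {
      val nums = v.split("\\.").take(2).map(_.toInt)
      (nums.headOption.getOrElse(0), if (nums.length > 1) nums(1) else 0)
    }
    val (maj, min) = majorMinor(version)
    val (reqMaj, reqMin) = majorMinor(minimum)
    maj > reqMaj || (maj == reqMaj && min >= reqMin)
  }

  // minPartitionNum is deprecated as of Spark 3.2.0 in favor of minPartitionSize,
  // so the expected recommendation flips at that version boundary.
  def expectedCoalesceProperty: String =
    if (atLeast(sparkRuntimeVersion, "3.2")) {
      "spark.sql.adaptive.coalescePartitions.minPartitionSize"
    } else {
      "spark.sql.adaptive.coalescePartitions.minPartitionNum"
    }

  def main(args: Array[String]): Unit =
    println(s"Spark $sparkRuntimeVersion -> expect $expectedCoalesceProperty")
}

Because the version is resolved at run time, the same assertion can pass for every spark-version entry in the GitHub workflow matrix without hardcoding a version in the test.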

@parthosa added labels bug (Something isn't working) and core_tools (Scope the core module (scala)) on Dec 31, 2024
@parthosa requested a review from amahussein on December 31, 2024 21:27
@parthosa self-assigned this on Dec 31, 2024
@parthosa marked this pull request as ready for review on December 31, 2024 21:46
@amahussein (Collaborator) left a comment


Thanks @parthosa!
Just wondering why we are bumping it to an old version when we could leap to a more recent one.

@@ -68,7 +68,10 @@ class AppInfoProviderMockTest(val maxInput: Double,
   */
 abstract class BaseAutoTunerSuite extends FunSuite with BeforeAndAfterEach with Logging {
 
-  val defaultSparkVersion = "3.1.1"
+  // Default Spark version
+  val defaultSparkVersion = "3.2.0"
@amahussein (Collaborator) commented:

Should we just bump it to the most recent, something like 3.5.x?

@parthosa (Collaborator, Author) commented Jan 2, 2025

That makes sense. I have updated the default Spark version in the AutoTuner tests to 3.5.1.

This also brings up an idea (if needed in future): we could test for both the lower and upper bounds to ensure compatibility across the entire supported range.

@amahussein (Collaborator) commented:

Thanks @parthosa. That's a good idea.
We can actually read the Spark version at runtime. Determining it dynamically based on the pom would achieve what you just suggested; the test would then run against each Spark version in the GitHub workflow.
We can get the Spark version in two ways (sketched below):

  • from the pom file, by pulling the build.spark.version property; or, the easier way,
  • by calling ToolUtils.sparkRuntimeVersion.
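
A hedged sketch of the two options (the object and method names are mine; the assumption that the pom's build.spark.version is forwarded to the test JVM as a system property is also mine and would need build-plugin wiring not shown here):

// Option 1 reads a build property; option 2 reads the version of the Spark artifacts on
// the classpath, which is roughly the kind of runtime lookup ToolUtils.sparkRuntimeVersion
// performs.
import org.apache.spark.SPARK_VERSION

object SparkVersionLookupSketch {

  // Option 1: from the pom, via build.spark.version (only if the build forwards it).
  def versionFromBuild: Option[String] = sys.props.get("build.spark.version")

  // Option 2: from the Spark runtime on the classpath.
  def versionFromRuntime: String = SPARK_VERSION

  def main(args: Array[String]): Unit = {
    println(s"build.spark.version = ${versionFromBuild.getOrElse("<not set>")}")
    println(s"runtime Spark version = $versionFromRuntime")
  }
}

The runtime route is the simpler one because it needs no extra build configuration and automatically reflects whichever Spark profile the workflow selected.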

@parthosa (Collaborator, Author) commented Jan 2, 2025

Thanks @amahussein for the suggestion. I think that would be the right thing to do. In that case, we should drop the Maven profiles for Spark versions below 3.2.0. There is a similar issue, NVIDIA/spark-rapids#11160, on the plugin side to drop tests for versions below 3.2.0.

@parthosa (Collaborator, Author) commented:

Summary from offline discussion:

  • We should remove the Maven build profiles for Spark versions 3.1.1, 3.1.3, and 3.1.4
  • Update the AutoTuner unit tests to dynamically test against multiple Spark versions.

@parthosa changed the title from "Update AutoTuner unit tests to minimum supported Spark and RapidsShuffleManager versions" to "Bump minimum Spark version to 3.2.0 and improve AutoTuner unit tests for multiple Spark versions" on Jan 2, 2025
@parthosa requested a review from amahussein on January 2, 2025 19:26
@@ -25,7 +25,7 @@ jobs:
   strategy:
     matrix:
       java-version: [8, 11]
-      spark-version: ['313', '324', '334', '350']
+      spark-version: ['324', '334', '350', '400']
@amahussein (Collaborator) commented:

Is this working? I thought we could not use Spark 4.0.0 yet, as it is based on Scala 2.13.

@parthosa (Collaborator, Author) commented:

We added support for Spark 4.0.0 in #537. This change adds it to the GitHub workflow. All tests are passing on Spark 4.0.0.

@amahussein (Collaborator) commented:

Yes, you are right. We added it a long time ago.
Then we removed it from the workflow when Scala 2.12 support was dropped.
I believe this success is bogus because it fetches an outdated snapshot of Spark 4.0.0 that was compatible with Scala 2.12. It is dated September 2023:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.12/

But if you look closely at the Scala 2.13 snapshots, you will see that the up-to-date artifacts are from January 2025:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.13/

We should remove 400 from the GitHub workflow because it is not really testing Spark 4.0.0.

@parthosa (Collaborator, Author) commented:

Ahh. I see. Thank you for this catch. I have removed it from the workflow.

@parthosa requested a review from amahussein on January 2, 2025 21:04
@amahussein (Collaborator) left a comment

Thanks @parthosa!
LGTM

@parthosa merged commit 1857fe2 into NVIDIA:dev on Jan 2, 2025 (13 checks passed)
@parthosa deleted the spark-rapids-tools-1480 branch on January 2, 2025 22:33
Linked issue: [BUG] Bump minimum supported Spark runtime version to 3.2.0 and update unit tests (#1480)