
Bump minimum Spark version to 3.2.0 and improve AutoTuner unit tests for multiple Spark versions #1482

Merged on Jan 2, 2025 (6 commits)
Changes from 5 commits
4 changes: 2 additions & 2 deletions .github/workflows/mvn-verify-check.yml
@@ -1,4 +1,4 @@
-# Copyright (c) 2023-2024, NVIDIA CORPORATION.
+# Copyright (c) 2023-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -25,7 +25,7 @@ jobs:
strategy:
matrix:
java-version: [8, 11]
-spark-version: ['313', '324', '334', '350']
+spark-version: ['324', '334', '350', '400']
steps:
Collaborator:
Is this working? I thought we cannot use Spark 4.0.0 yet, as it is based on Scala 2.13.

Collaborator Author:
We added support for Spark 4.0.0 in #537. This change adds it to the GitHub workflow. All tests are passing on Spark 4.0.0.

Collaborator:
Yes, you are right. We added it a long time ago, then removed it from the workflow when Scala 2.12 support was dropped.
I believe this success is bogus because it fetches an outdated snapshot of Spark 4.0.0 that was still compatible with Scala 2.12; it is dated September 2023:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.12/

But if you look closely at the Scala 2.13 artifacts, you will see that the up-to-date ones are from January 2025:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.13/

We should remove that 400 from the GitHub workflow because it is not really testing Spark 4.0.0.
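(For anyone re-checking this later: the publish dates are visible in each snapshot line's maven-metadata.xml. Below is a minimal Scala sketch that prints the <lastUpdated> stamp for the two artifact lines linked above; the exact 4.0.0-SNAPSHOT path segments are an assumption for illustration, not taken from this thread.)

```scala
import scala.io.Source

// Sanity check: print each snapshot line's <lastUpdated> timestamp
// (format yyyyMMddHHmmss). The "4.0.0-SNAPSHOT" directories below are
// assumed paths, not confirmed by this PR.
object SnapshotDateCheck extends App {
  val base = "https://repository.apache.org/content/repositories/snapshots/org/apache/spark"
  Seq(
    s"$base/spark-core_2.12/4.0.0-SNAPSHOT/maven-metadata.xml",
    s"$base/spark-sql_2.13/4.0.0-SNAPSHOT/maven-metadata.xml"
  ).foreach { url =>
    // Crude regex extraction is fine for a one-off check of the metadata file.
    val xml = Source.fromURL(url).mkString
    val stamp = "<lastUpdated>(\\d+)</lastUpdated>".r
      .findFirstMatchIn(xml).map(_.group(1)).getOrElse("not found")
    println(s"$url -> $stamp")
  }
}
```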

Collaborator Author:
Ah, I see. Thank you for catching this. I have removed it from the workflow.

- uses: actions/checkout@v4

62 changes: 1 addition & 61 deletions core/pom.xml
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
-Copyright (c) 2021-2024, NVIDIA CORPORATION.
+Copyright (c) 2021-2025, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -70,66 +70,6 @@
</developer>
</developers>
<profiles>
-<profile>
-<id>release311</id>
-<activation>
-<property>
-<name>buildver</name>
-<value>311</value>
-</property>
-</activation>
-<properties>
-<buildver>311</buildver>
-<spark.version>${spark311.version}</spark.version>
-<delta.core.version>${delta10x.version}</delta.core.version>
-<hadoop.version>3.3.6</hadoop.version>
-</properties>
-</profile>
-<profile>
-<id>release312</id>
-<activation>
-<property>
-<name>buildver</name>
-<value>312</value>
-</property>
-</activation>
-<properties>
-<buildver>312</buildver>
-<spark.version>${spark312.version}</spark.version>
-<delta.core.version>${delta10x.version}</delta.core.version>
-<hadoop.version>3.3.6</hadoop.version>
-</properties>
-</profile>
-<profile>
-<id>release313</id>
-<activation>
-<property>
-<name>buildver</name>
-<value>313</value>
-</property>
-</activation>
-<properties>
-<buildver>313</buildver>
-<spark.version>${spark313.version}</spark.version>
-<delta.core.version>${delta10x.version}</delta.core.version>
-<hadoop.version>3.3.6</hadoop.version>
-</properties>
-</profile>
-<profile>
-<id>release314</id>
-<activation>
-<property>
-<name>buildver</name>
-<value>314</value>
-</property>
-</activation>
-<properties>
-<buildver>314</buildver>
-<spark.version>${spark314.version}</spark.version>
-<delta.core.version>${delta10x.version}</delta.core.version>
-<hadoop.version>3.3.6</hadoop.version>
-</properties>
-</profile>
<profile>
<id>release320</id>
<activation>
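Note: with the 311-314 profiles removed, release320 becomes the lowest buildver profile, so builds that previously selected Spark 3.1.x (for example via -Dbuildver=313) must now target 3.2.0 or later, in line with the version bump in this PR's title.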
core/src/main/scala/org/apache/spark/sql/rapids/tool/ToolUtils.scala
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2021-2024, NVIDIA CORPORATION.
+* Copyright (c) 2021-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -46,18 +46,17 @@ object ToolUtils extends Logging {

// Add more entries to this lookup table as necessary.
// There is no need to list all supported versions.
-private val lookupVersions = Map(
-"311" -> new ComparableVersion("3.1.1"), // default build version
+private val lookupVersionsMap = Map(
"320" -> new ComparableVersion("3.2.0"), // introduced reusedExchange
"330" -> new ComparableVersion("3.3.0"), // used to check for memoryOverheadFactor
"331" -> new ComparableVersion("3.3.1"),
-"340" -> new ComparableVersion("3.4.0"), // introduces jsonProtocolChanges
-"350" -> new ComparableVersion("3.5.0") // introduces windowGroupLimit
+"340" -> new ComparableVersion("3.4.0"), // introduces jsonProtocolChanges
+"350" -> new ComparableVersion("3.5.0") // default build version, introduces windowGroupLimit
)

// Property to check the spark runtime version. We need this outside of test module as we
// extend the support runtime for different platforms such as Databricks.
-lazy val sparkRuntimeVersion = {
+lazy val sparkRuntimeVersion: String = {
org.apache.spark.SPARK_VERSION
}

@@ -78,8 +77,8 @@ object ToolUtils extends Logging {
compareVersions(refVersion, sparkRuntimeVersion) == 0
}

-def compareToSparkVersion(currVersion: String, lookupVersion: String): Int = {
-val lookupVersionObj = lookupVersions.get(lookupVersion).get
+private def compareToSparkVersion(currVersion: String, lookupVersion: String): Int = {
+val lookupVersionObj = lookupVersionsMap(lookupVersion)
val currVersionObj = new ComparableVersion(currVersion)
currVersionObj.compareTo(lookupVersionObj)
}
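For context, the ComparableVersion used here is presumably Maven's org.apache.maven.artifact.versioning.ComparableVersion (from the maven-artifact library; the import is not shown in this hunk), which defines the ordering the helper relies on. A small standalone sketch of those comparison semantics, not part of this PR's diff:

```scala
import org.apache.maven.artifact.versioning.ComparableVersion

object VersionCompareDemo extends App {
  val spark320 = new ComparableVersion("3.2.0")

  // Negative: the runtime predates the lookup version.
  println(new ComparableVersion("3.1.1").compareTo(spark320)) // < 0
  // Zero: exact match with the lookup version.
  println(new ComparableVersion("3.2.0").compareTo(spark320)) // == 0
  // Positive: the runtime is newer than the lookup version.
  println(new ComparableVersion("3.5.0").compareTo(spark320)) // > 0
}
```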
(AutoTuner test suite source file)
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2024, NVIDIA CORPORATION.
+* Copyright (c) 2024-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -27,6 +27,7 @@ import org.scalatest.{BeforeAndAfterEach, FunSuite}
import org.yaml.snakeyaml.{DumperOptions, Yaml}

import org.apache.spark.internal.Logging
+import org.apache.spark.sql.rapids.tool.ToolUtils


case class DriverInfoProviderMockTest(unsupportedOps: Seq[DriverLogUnsupportedOperators])
@@ -68,7 +69,10 @@ class AppInfoProviderMockTest(val maxInput: Double,
*/
abstract class BaseAutoTunerSuite extends FunSuite with BeforeAndAfterEach with Logging {

val defaultSparkVersion = "3.1.1"
// Spark runtime version used for testing
def testSparkVersion: String = ToolUtils.sparkRuntimeVersion
// RapidsShuffleManager version used for testing
def testSmVersion: String = testSparkVersion.filterNot(_ == '.')

val defaultDataprocProps: mutable.Map[String, String] = {
mutable.LinkedHashMap[String, String](
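The new testSmVersion simply strips the dots from the runtime Spark version (for example, "3.5.0" becomes "350"), matching the numeric suffix used for per-Spark-version builds such as the RapidsShuffleManager. A minimal sketch of that mapping; the shuffle-manager class-name pattern shown is an assumption for illustration, not taken from this diff:

```scala
object SmVersionDemo extends App {
  // Mirrors testSparkVersion.filterNot(_ == '.') from the suite above.
  def toSmVersion(sparkVersion: String): String = sparkVersion.filterNot(_ == '.')

  assert(toSmVersion("3.2.0") == "320")
  assert(toSmVersion("3.5.0") == "350")

  // Hypothetical per-version class name built from the suffix; the package
  // pattern below is an assumption, not confirmed by this PR.
  println(s"com.nvidia.spark.rapids.spark${toSmVersion("3.5.0")}.RapidsShuffleManager")
}
```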