parthosa
Collaborator

Fixes #1873

Problem Statement

Currently, the AutoTuner's target cluster configuration supports only an "enforced" section for overriding property values. In some cases, users need additional flexibility to:

  1. Preserve Source Values: Maintain certain Spark property values exactly as they were in the original CPU job
  2. Exclude Properties: Exclude certain Spark properties from recommendations due to environment constraints or incompatibilities

Changes

This PR introduces two new sections to the target cluster:

1. "preserve" Section

  • Type: List of property names
  • Behavior: Uses values from the source application and treats them as enforced
  • Properties are included in final recommendations with their original source values
  • Generates comment: '<property>' was preserved from source application properties as specified in target cluster.

2. "exclude" Section

  • Type: List of property names
  • Behavior: Excludes properties from tuning recommendations
  • Properties are removed from the tuning table and will not appear in recommendations
  • Generates comment: '<property>' was excluded from tuning recommendations as specified in target cluster.

Example Target Cluster (also included in the PR).

sparkProperties:
  enforced:
    spark.rapids.sql.concurrentGpuTasks: 2
  preserve:
    - spark.sql.shuffle.partitions
  exclude:
    - spark.rapids.shuffle.multiThreaded.reader.threads
    - spark.rapids.shuffle.multiThreaded.writer.threads
    - spark.kryo.registrator
    - spark.rapids.sql.multiThreadedRead.numThreads

Key Changes

Core Implementation

  • Platform.scala: Added isPropertyPreserved() and isPropertyExcluded() methods
  • AutoTuner.scala:
    • Enhanced finalTuningTable creation to handle preserve/exclude lists
    • Updated recommendation logic to preserve source values for preserved properties
    • Added comment generation for preserved/excluded properties

Configuration Management

  • TargetClusterProps.scala:
    • Extended SparkProperties class with preserve and exclude fields
    • Added validation to prevent overlapping keys between preserve, exclude, and enforced properties
    • Updated documentation with examples
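The overlap validation can be pictured roughly as follows. This is a hedged sketch rather than the actual code in TargetClusterProps.scala: `validateNoOverlap` and its parameters are hypothetical, and the only behavior taken from this PR is that overlapping keys result in an IllegalArgumentException (here raised through `require`).

```scala
// Illustrative sketch only: validateNoOverlap is a hypothetical helper,
// not the method added in TargetClusterProps.scala.
object OverlapValidationSketch {
  def validateNoOverlap(
      enforced: Map[String, String],
      preserve: Seq[String],
      exclude: Seq[String]): Unit = {
    val enforcedKeys: Set[String] = enforced.keySet
    val preserveKeys: Set[String] = preserve.toSet
    val excludeKeys: Set[String] = exclude.toSet
    // Collect every pair of sections that share at least one key.
    val overlaps = Seq(
      "enforced/preserve" -> enforcedKeys.intersect(preserveKeys),
      "enforced/exclude" -> enforcedKeys.intersect(excludeKeys),
      "preserve/exclude" -> preserveKeys.intersect(excludeKeys)
    ).filter { case (_, keys) => keys.nonEmpty }
    // require throws IllegalArgumentException when the condition is false.
    require(
      overlaps.isEmpty,
      overlaps
        .map { case (pair, keys) => s"$pair: ${keys.toSeq.sorted.mkString(", ")}" }
        .mkString("Overlapping keys between sections -> ", "; ", ""))
  }
}
```

Reporting all conflicting pairs in one message, instead of failing on the first, lets a user fix the target cluster file in a single pass.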

Testing

  • QualificationAutoTunerSuite.scala:
    • Handling of preserved properties not found in source
    • Integration with enforced properties
    • Validation of overlapping keys (throws IllegalArgumentException)

@parthosa parthosa self-assigned this Aug 26, 2025
@github-actions github-actions bot added the core_tools Scope the core module (scala) label Aug 26, 2025
@parthosa parthosa requested a review from Copilot August 26, 2025 23:07
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds "preserve" and "exclude" sections to the target cluster configuration for AutoTuner, providing greater flexibility in property management beyond the existing "enforced" section. The preserve list maintains original CPU job property values, while the exclude list removes properties from tuning recommendations.

Key changes:

  • Extended SparkProperties class to support preserve and exclude lists with validation
  • Enhanced AutoTuner logic to handle preserved properties and exclude unwanted ones
  • Added comprehensive test coverage for overlapping properties and edge cases

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File                               Description
QualificationAutoTunerSuite.scala  Added comprehensive test cases for preserve/exclude functionality and validation
ToolTestUtils.scala                Extended utility methods to support preserve/exclude parameters in test setup
TargetClusterProps.scala           Extended SparkProperties with preserve/exclude fields and overlap validation
AutoTuner.scala                    Core implementation for handling preserved/excluded properties in recommendations
Platform.scala                     Added helper methods to check if properties are preserved or excluded
tuningTable.yaml                   Added new tuning definition for incompatible date formats property
targetCluster07.yaml               Added example configuration demonstrating preserve/exclude usage

@parthosa parthosa marked this pull request as ready for review August 27, 2025 17:44
@parthosa parthosa requested a review from GaryShen2008 August 27, 2025 17:44
Collaborator

@sayedbilalbari sayedbilalbari left a comment

Thanks @parthosa, overall looks good! Just some questions regarding the overall flow.

Collaborator

@amahussein amahussein left a comment

We have had this issue for some time, where the union of the recommendations shows properties that are provided by the CSP (#1257).
Two main points I see here:

  • Just curious: does this implementation allow us to later define an automatic excluded list for each CSP?
  • excluded/preserve flags: we may want to consider allowing properties that are not necessarily handled by the AutoTuner. For example, does the property name need to be defined in tuningTable.yaml in order to be considered a valid excluded/preserve entry?

@parthosa
Collaborator Author

@amahussein

Just curious: does this implementation allow us to later define an automatic excluded list for each CSP?

Yes. There is a CSP/platform-specific list, recommendationsToExclude, which contains properties that must be skipped.

excluded/preserve flags: we may want to consider allowing properties that are not necessarily handled by the AutoTuner. For example, does the property name need to be defined in tuningTable.yaml in order to be considered a valid excluded/preserve entry?

This PR supports this. I have added an E2E test that covers the following:

val excludeProperties = List(
  // Exclude a property that is set in event log
  "spark.master",
  // Exclude a property that is recommended by AutoTuner
  "spark.rapids.sql.concurrentGpuTasks"
)
val preserveProperties = List(
  // Preserve a property that is not present in event log
  "spark.task.resource.gpu.amount",
  // Preserve a property that is present and recommended by AutoTuner
  "spark.executor.memory",
  // Preserve a property that is present but not recommended by AutoTuner
  "spark.dataproc.engine"
)
val enforcedSparkProperties = Map(
  "spark.sql.shuffle.partitions" -> "800"
)

The current implementation leverages:

  • skippedRecommendations - To store a combined list of properties that must be skipped, whether Tools-specific, platform-specific, or user-specified in the exclude section.
  • limitedLogicList - To store a combined list of properties whose calculation is disabled, so that only values from the source are used.
    • However, if the source properties do not contain the value, AutoTuner informs the user through a comment and continues with its recommendation.
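A rough sketch of how these two lists could drive a per-property decision is shown below. It is an assumption-laden illustration, not the AutoTuner code: the `Decision` type, `decide`, the `compute` callback and the wording of the "not found" comment are all hypothetical. Only the list names skippedRecommendations and limitedLogicList, and the fallback to a normal recommendation when the source has no value, come from the description above.

```scala
// Illustrative sketch only: Decision, decide and compute are hypothetical.
object TuningListsSketch {
  sealed trait Decision
  case object Skipped extends Decision
  final case class FromSource(value: String) extends Decision
  final case class Computed(value: String, comment: Option[String]) extends Decision

  def decide(
      key: String,
      skippedRecommendations: Set[String],
      limitedLogicList: Set[String],
      source: Map[String, String],
      compute: String => String): Decision = {
    if (skippedRecommendations.contains(key)) {
      // Tools-specific, platform-specific or user-excluded: no recommendation.
      Skipped
    } else if (limitedLogicList.contains(key)) {
      source.get(key) match {
        // Calculation is disabled; reuse the value from the source application.
        case Some(value) => FromSource(value)
        // No source value: tell the user and fall back to the usual recommendation.
        // The comment wording here is made up for this sketch.
        case None =>
          Computed(compute(key), Some(s"'$key' was not found in source application properties."))
      }
    } else {
      Computed(compute(key), None)
    }
  }
}
```

Checking the skip list before the limited-logic list keeps an excluded property out of the output even if it is present in the source application.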

Collaborator

@amahussein amahussein left a comment

LGTM.
Thanks @parthosa

Collaborator

@sayedbilalbari sayedbilalbari left a comment

Thanks @parthosa. LGTM

@parthosa parthosa merged commit 3196292 into NVIDIA:dev Sep 3, 2025
14 checks passed
@parthosa parthosa deleted the rapids-tools-1873 branch September 3, 2025 20:55
Labels
autotuner core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add "preserve" and "exclude" sections to target cluster
3 participants