Add Arguments for Distributed Mode in Qualification Tool CLI #1428
This PR adds arguments to enable distributed tools for the qualification tool.
Changes
RapidsJob: Introduced two subclasses, RapidsDistributedJob and RapidsLocalJob, and a concrete class for the OnPrem platform.
JarCmdArgs class to encapsulate all arguments needed to construct the JAR command.
DistributedToolsConfig class, allowing configurations for distributed tools (like Spark properties) to be specified via the --tools_config_file option.
Example
CMD:
Sample Config File:
Details:
user_tools/src/spark_rapids_pytools/cloud_api/onprem.py: Added a new class OnPremDistributedRapidsJob and a method create_distributed_submission_job to support distributed RAPIDS jobs.
user_tools/src/spark_rapids_pytools/rapids/rapids_job.py: Introduced the RapidsDistributedJob class and updated methods to handle distributed tool configurations.
user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py: Added methods to get distributed tools configurations and submit distributed jobs.
Enhancements to argument processing:
user_tools/src/spark_rapids_pytools/rapids/qualification.py: Added methods to process distributed tools arguments.
user_tools/src/spark_rapids_tools/cmdli/argprocessor.py: Updated QualifyUserArgModel and build_tools_args to include distributed_tools_enabled.
Platform class updates:
user_tools/src/spark_rapids_pytools/cloud_api/databricks_aws.py, databricks_azure.py, dataproc.py, dataproc_gke.py, emr.py: Disabled pylint warnings for abstract methods.
Other improvements:
user_tools/src/spark_rapids_pytools/rapids/qualification.py: Added a check to ensure the DataFrame is not empty before accessing it.
user_tools/src/spark_rapids_tools/cmdli/tools_cli.py: Added a new parameter distributed to the qualification function.
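
For readers who want a picture of the job-class split listed under Changes, here is a minimal sketch of the pattern. It is not the code from this PR. The class names RapidsJob, RapidsLocalJob, RapidsDistributedJob and JarCmdArgs come from the description, but every field, the build_command method, and the choice of a spark-submit style launch for the distributed case are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JarCmdArgs:
    """Bundle of arguments for the JAR command (hypothetical field layout)."""
    jvm_args: List[str]
    jar_path: str
    main_class: str
    tool_args: List[str] = field(default_factory=list)


class RapidsJob(ABC):
    """Base job; each subclass decides how the tool is launched."""

    def __init__(self, jar_args: JarCmdArgs) -> None:
        self.jar_args = jar_args

    @abstractmethod
    def build_command(self) -> List[str]:
        """Return the command line as a list of tokens (hypothetical method)."""


class RapidsLocalJob(RapidsJob):
    """Runs the tool as a plain local JVM process."""

    def build_command(self) -> List[str]:
        a = self.jar_args
        return ["java", *a.jvm_args, "-cp", a.jar_path, a.main_class, *a.tool_args]


class RapidsDistributedJob(RapidsJob):
    """Carries extra Spark properties, e.g. loaded from a tools config file.

    Assumption: the distributed launch is modelled here as spark-submit;
    the PR description does not say how the real job is submitted.
    """

    def __init__(self, jar_args: JarCmdArgs, spark_properties: Dict[str, str]) -> None:
        super().__init__(jar_args)
        self.spark_properties = spark_properties

    def build_command(self) -> List[str]:
        a = self.jar_args
        conf: List[str] = []
        # Sort for a deterministic command line.
        for key, value in sorted(self.spark_properties.items()):
            conf += ["--conf", f"{key}={value}"]
        return ["spark-submit", *conf, "--class", a.main_class, a.jar_path, *a.tool_args]
```

In this sketch the caller builds one JarCmdArgs object and hands it to whichever job class matches the requested mode, so the local and distributed paths share the same argument bundle and differ only in how the command is assembled.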