Add qualification support for Photon jobs in the Python Tool #1409
Signed-off-by: Partho Sarthi <[email protected]>
Thanks @parthosa!
Just for the sake of confirmation:
- Is there another follow-up PR to change the QualX module to read app_meta.json to decide whether an app is Photon or not? If so, the PR description is not accurate because it gives the impression that it adds end-to-end support.
- I am concerned about how we can troubleshoot and validate app_meta.json. The wrapper reads the autotuner's output and copies some of the fields to that file in the upper level. With this PR, we are adding a new field derived from Python logic. Later, we will hit the question "Where does each field come from?" (this becomes even more challenging if fields might be overridden by the Python wrapper). CC: @tgravescs
upperBound: 1000000.0
- columnName: 'Unsupported Operators Stage Duration Percent'
lowerBound: 0.0
upperBound: 25.0
This needs some thought about the design impact.
It introduces a platform configuration inside the tool's conf, even though we already have a configuration file per platform.
I was thinking that, since all platforms would have the same value for the Spark case, we would be duplicating the configuration in each platform. In the future, if different platforms need different values, we could put these in separate platform config files.
It is a valid point that there are some common settings between platforms.
In the future, we can improve our config structure to have a common parent or something shared between all the platforms.
The other way around, specifying the platform behavior inside the tool's config, will cause design inconsistency going forward, especially since every contributor has their own preference for where a newly added config should go.
I guess it is okay for now to keep that in order to unblock the photon feature.
Later, we can revisit this.
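For reference, the "common parent" idea raised in this thread could look roughly like the sketch below: shared defaults are defined once and each platform file only overrides what differs. This is a minimal illustration, not the tool's actual config loader; `merge_config` and every key and value shown are hypothetical.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict where `override` is layered on top of `base`.

    Nested dicts are merged recursively; any other value in `override`
    replaces the one in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared defaults (the "common parent").
common_conf = {
    "categories": {"small": {"lowerBound": 1.3, "upperBound": 2.0}},
    "metricsSubFolder": "metrics",
}

# Hypothetical platform file that only overrides one bound.
platform_conf = {"categories": {"small": {"lowerBound": 1.0}}}

effective_conf = merge_config(common_conf, platform_conf)
```

With this layout, a setting that is identical across platforms lives in one place, and a platform-specific value has exactly one obvious home, which addresses the "where should a new config go?" concern.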
Following offline discussions with @amahussein and @leewyang, we are moving the detection of the runtime (Spark/Photon/Velox) to Scala. This PR will be refactored afterwards.
This reverts commit 8921a85 Signed-off-by: Partho Sarthi <[email protected]>
Thanks @parthosa!
Thanks @parthosa
Just add a comment in the config file to explain why we picked those new thresholds for the Photon categories.
Thanks @parthosa
Thanks @parthosa! LGTM.
Issue #251.
This PR introduces support for recommending Photon applications, using a separate strategy for categorizing them:
Additionally, the Small category for Photon applications is different from that of Spark-based applications:
Note
Output
sparkRuntime in app_metadata.json
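As an illustration of the engine-specific categorization this PR describes, here is a minimal Python sketch. Only the names ExecutionEngine and SpeedupStrategy come from the PR; their fields, the category names other than Small, the fallback label, and every threshold below are assumptions made up for the example and do not reflect the values in qualification-conf.yaml.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ExecutionEngine(Enum):
    # Engine names follow the PR discussion; the string values are illustrative.
    SPARK = "spark"
    PHOTON = "photon"


@dataclass(frozen=True)
class SpeedupStrategy:
    # Hypothetical shape: (category name, lower bound, upper bound) triples.
    categories: Tuple[Tuple[str, float, float], ...]
    fallback: str = "Not Recommended"

    def categorize(self, speedup: float) -> str:
        """Return the first category whose [lower, upper) range contains speedup."""
        for name, lower, upper in self.categories:
            if lower <= speedup < upper:
                return name
        return self.fallback


# Placeholder thresholds: Photon gets a different "Small" range than Spark,
# mirroring the idea in the description, not the real numbers.
STRATEGIES = {
    ExecutionEngine.SPARK: SpeedupStrategy(
        (("Small", 1.3, 2.0), ("Medium", 2.0, 3.0), ("Large", 3.0, 1000000.0))
    ),
    ExecutionEngine.PHOTON: SpeedupStrategy(
        (("Small", 1.0, 2.0), ("Medium", 2.0, 3.0), ("Large", 3.0, 1000000.0))
    ),
}


def categorize_app(engine: ExecutionEngine, speedup: float) -> str:
    """Pick the strategy for the app's engine and categorize its speedup."""
    return STRATEGIES[engine].categorize(speedup)
```

Keeping one strategy object per engine means the categorization code never branches on the engine itself; supporting another runtime would only require another entry in the mapping.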
Changes
Enhancements and New Features:
- tool_ctxt.py: Introduced a new method get_metrics_output_folder to fetch the metrics output directory.
- qualification-conf.yaml: Updated configuration to include new metrics subfolder and execution engine settings.
- enums.py: Added a new ExecutionEngine class to represent different execution engines.
- speedup_category.py: Introduced SpeedupStrategy class and refactored methods to accommodate execution engine-specific speedup strategies.
Refactoring and Utility Improvements:
- qualification.py: Added a helper method _read_qualification_metric_file to read metric files and _assign_execution_engine_to_apps to assign execution engines to applications.
- util.py: Added a utility method convert_df_to_dict to convert DataFrames to dictionaries.
Tests:
- event_log_processing.feature: Added new test scenarios to validate the execution engine assignment.
- e2e_utils.py and test_steps.py: Updated end-to-end test utilities to support new features.
Follow Up