Skip to content

Conversation

amahussein
Copy link
Collaborator

Signed-off-by: Ahmed Hussein (amahussein) [email protected]

Fixes #1887

Simplifies the interface in order to use the Tools-API more efficiently.

This pull request refactors the API v1 report handler system in the RAPIDS PyTools codebase to simplify and modernize the way core report handlers are created and used. The main change is the removal of the legacy APIHelpers builder pattern in favor of direct class-based instantiation for report handlers (e.g., QualCore, ProfCore). This results in cleaner, more readable code and a more consistent interface for handling reports. Additionally, the CombinedDFBuilder is replaced with a new CombinedCSVBuilder class and metaclass for combining CSV reports.

Key changes include:

API Handler Refactor and Simplification:

  • Removed the APIHelpers builder class and associated indirection. Now, core report handlers like QualCore and ProfCore are instantiated directly, leading to more straightforward and maintainable code. [1] [2]
  • Updated imports and usages throughout the codebase to use the new handler classes directly instead of via APIHelpers. [1] [2] [3] [4] [5] [6] [7]

Core Handler Property Changes:

  • Changed the core_handler property in RapidsJarTool and related classes to return the new APIResHandler-based handler types, and updated instantiation logic accordingly. [1] [2] [3] [4]

Report and Combiner Usage Updates:

  • Replaced all usages of APIHelpers.CombinedDFBuilder with the new CombinedCSVBuilder class and updated the logic to use the new combiner interface. [1] [2] [3] [4] [5]
  • Updated report reading logic to use new handler methods like self.core_handler.csv(...) and self.core_handler.txt(...) instead of the old report classes. [1] [2] [3]

Cleanup and Documentation:

  • Removed the now-unused APIResultHandler and related builder logic, and updated documentation and examples to reflect the new usage patterns. [1] [2]

These changes make the codebase easier to maintain, reduce boilerplate, and provide a more Pythonic interface for working with core report handlers and CSV combiners.

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#1887

Simplifies the interface in order to use the Tools-API more efficiently.
@amahussein amahussein self-assigned this Aug 29, 2025
@Copilot Copilot AI review requested due to automatic review settings August 29, 2025 18:03
@amahussein amahussein added the api_change A change affecting the output (add/remove/rename files, add/remove/rename columns) label Aug 29, 2025
@github-actions github-actions bot added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Aug 29, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors the API v1 report handler system to simplify and modernize the interface for using the Tools-API. The main change removes the legacy APIHelpers builder pattern in favor of direct class-based instantiation for core report handlers like QualCore and ProfCore, resulting in cleaner and more maintainable code.

Key changes:

  • Removed APIHelpers builder class and replaced with direct instantiation of core handlers
  • Introduced new CombinedCSVBuilder class with metaclass to replace CombinedDFBuilder
  • Updated all imports and usages throughout the codebase to use the new handler classes directly

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test_result_handler_steps.py Updated test imports and handler instantiation to use new API
qualx/util.py Replaced APIHelpers with direct QualCore/QualWrapper usage
qualx/qualx_main.py Updated imports and CombinedDFBuilder usage to new CombinedCSVBuilder
qualx/preprocess.py Changed ProfWrapper import from APIHelpers to direct import
qualification_stats_report.py Simplified CSV builder usage with new handler methods
result_handler.py Added get_raw_metrics_path method to base handler
qual_handler.py Removed duplicate get_raw_metrics_path method
builder.py Major refactor: removed APIHelpers, added new builder classes and metaclasses
init.py Updated exports to include new handler classes
rapids_tool.py Updated core_handler property to return APIResHandler type
prediction.py Updated qual_handler property to use direct QualCore instantiation
qualification_stats.py Updated QualCore import and instantiation
qualification_core.py Updated type annotations for new QualCore class
qualification.py Replaced APIHelpers usage with direct handler methods
profiling_core.py Updated type annotations for new ProfCore class
profiling.py Updated TXT report usage to use new handler methods

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Copy link
Collaborator

@leewyang leewyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One spelling nitpick, otherwise LGTM. API looks much cleaner!

) as c_builder:
with self.core_handler.csv_combiner(
'clusterInfoJSONReport'
).supress_failure() as c_builder:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "suppress_failure"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thanks @leewyang

Copy link
Collaborator

@sayedbilalbari sayedbilalbari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amahussein for this neat implementation !
Just some doubts regarding an edge case and the usage of Meta functions.
Makes sense from a code segregation point of view. Highlighting it through comments.

"""
A generic internal class for building API result handlers.
Metaclass for CombinedCSVBuilder to register subclasses.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Metaclass here extracts the ToolResultHandler in case a generic APIResultHandler is passed.
I don't see it registering any subclasses. The comment could be updated

*args,
**kwargs
) -> 'CombinedCSVBuilder':
if isinstance(handlers, list):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic assumes that the input is either a list of ToolsResultHandlerT or a list of ApiHandler[ToolsResultHandlerT] hence trying to extract the handler when it sees a list.
The CombinedCsvBuilder natively accepts ToolsResultHandlerT or a list[ToolsResultHandlerT] in which case I am assuming a dev can resort to passing list[QualCoreHandler, ProfCoreHandler.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docstrings with more details about the usage of the case of the list 'APIResHandler[ToolResultHandlerT]'].

I am assuming a dev can resort to passing list[QualCoreHandler, ProfCoreHandler.

This is correct if he is calling the init of the object itself (not the metaClass__call__)

handlers_arg = [r.handler for r in handlers]
else:
handlers_arg = handlers
instance: 'CombinedCSVBuilder' = super(CombinedCSVBuilderMeta, cls).__call__(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions in my head, since the Meta function adds handler extraction, what's the benefit of adding a layer of complexity by the use of Meta class vs keeping this functions simple and doing this in either the init or call of the CombinedCsvBuilder itself.
I get the part of how this keeps preprocessing separate from object initialization. Just feel a bit of an overkill for a preprocessing logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason in doing this is to provide an initializer wrapper that hides the other fields in each subclass.
Then, the metaClass can use the class registry in checking the instances and handlem them on the fly.
I believe that part of the confusion here is because I haven't put the "class registry" yet. This is okay because the class registry should not affect how the user is using the API.


Parameters:
table (str): The table label to load from each handler.
handlers (ToolResultHandlerT or List[ToolResultHandlerT]): One or more result handlers to aggregate.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment here in reference to the comment about passing a list of ToolsResultHandlerT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I regenerated the docstings

detail_reason = 'a None DataFrame' if final_res.data is None else 'an empty DataFrame'
raise ValueError(
f'Loading report {self.table} on dataset returned {detail_reason}.')
# If we reach this point, then there is no raise on exceptions, just proceed with
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comments explaining the flow of build

"""
REPORT_LABEL: ClassVar[str] = ''
_out_path: Optional[Union[str, BoundedCspPath]] = None
_report_id: Optional[str] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused by the usage of REPORT_LABEL and _report_id here.
The LABEL is used to decide to build the Handler or not which internally does the same thing of storing the LABEL value as the _report_id.
I see the usage of ClassVar here assuming this helps in case of using multiple handlers ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_report_id is a field captured in the object itself. While REPORT_LABEL is the classVar.
This redundancy is intended to allow separate representation between internal representation and what the users are facing.

# call the metaclass constructor to create the instance
instance = type(cls).__call__(cls, out_path=out_path)
# set the report ID and build the handler
instance = instance.report(report_id).build()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to what we are doing in the Meta Functions call. Hence the confusion regarding the usage of REPORT_LABEL and report_id

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is a special case to unlock the unit-tests.

  • users are typically going to use the metaClass to instantiate their objects. This more convenient so they don't have to deal with the labels.
  • In unit tests, to make it easier for scenarios, the report_id is based as an argument. Therefore, I created this backdoor to pick the right object. Another approach was to loop on all the classes to figure out which one has the valid label. However, it will cause ambiguity because the string argument passed to the constructor in both cases is a string (out_path Vs, report_label)

Copy link
Collaborator Author

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sayedbilalbari for the feedback.

# call the metaclass constructor to create the instance
instance = type(cls).__call__(cls, out_path=out_path)
# set the report ID and build the handler
instance = instance.report(report_id).build()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is a special case to unlock the unit-tests.

  • users are typically going to use the metaClass to instantiate their objects. This more convenient so they don't have to deal with the labels.
  • In unit tests, to make it easier for scenarios, the report_id is based as an argument. Therefore, I created this backdoor to pick the right object. Another approach was to loop on all the classes to figure out which one has the valid label. However, it will cause ambiguity because the string argument passed to the constructor in both cases is a string (out_path Vs, report_label)

"""
REPORT_LABEL: ClassVar[str] = ''
_out_path: Optional[Union[str, BoundedCspPath]] = None
_report_id: Optional[str] = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_report_id is a field captured in the object itself. While REPORT_LABEL is the classVar.
This redundancy is intended to allow separate representation between internal representation and what the users are facing.


Parameters:
table (str): The table label to load from each handler.
handlers (ToolResultHandlerT or List[ToolResultHandlerT]): One or more result handlers to aggregate.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I regenerated the docstings

handlers_arg = [r.handler for r in handlers]
else:
handlers_arg = handlers
instance: 'CombinedCSVBuilder' = super(CombinedCSVBuilderMeta, cls).__call__(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason in doing this is to provide an initializer wrapper that hides the other fields in each subclass.
Then, the metaClass can use the class registry in checking the instances and handlem them on the fly.
I believe that part of the confusion here is because I haven't put the "class registry" yet. This is okay because the class registry should not affect how the user is using the API.

*args,
**kwargs
) -> 'CombinedCSVBuilder':
if isinstance(handlers, list):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docstrings with more details about the usage of the case of the list 'APIResHandler[ToolResultHandlerT]'].

I am assuming a dev can resort to passing list[QualCoreHandler, ProfCoreHandler.

This is correct if he is calling the init of the object itself (not the metaClass__call__)

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
@amahussein amahussein merged commit 1ec7e7a into NVIDIA:dev Sep 3, 2025
14 checks passed
@amahussein amahussein deleted the rapids-tools-1887 branch September 3, 2025 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api_change A change affecting the output (add/remove/rename files, add/remove/rename columns) user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finalize the ToolsAPI helper methods and classes
3 participants