[FEA] Qualification tool: Add more information in output for Execs if there are unsupported Expressions #626

Closed
nartal1 opened this issue Oct 23, 2023 · 1 comment · Fixed by #680
Labels
core_tools Scope the core module (scala) feature request New feature or request

Comments

@nartal1
Collaborator

nartal1 commented Oct 23, 2023

Is your feature request related to a problem? Please describe.
The qualification CSV output reports an Exec as unsupported if any expression within that Exec is not supported. In some instances, an Exec appears in a row with no reason given. We need to improve the output by providing more information in the rows that contain Execs.

Example:

"Exec" "Filter" ""
"Exec" "Filter" "Filter Exec is not supported as expressions are not supported -  `AtLeastNNulls`"
"Exec" "LocalTableScan" ""
"Exec" "Project" ""
"Exec" "Project" "Project Exec is not supported as expressions are not supported -  `to_date`"

In the example above, Filter and Project are unsupported because expressions within them are not supported. However, there are also rows that list Filter and Project as unsupported without giving any reason, which can be confusing. We need to deduplicate the rows that give no reason for an Exec and combine the output.
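
To make the requested deduplication concrete, here is a minimal sketch in Python. It is not the tool's actual implementation (the core module is Scala); the function name `dedupe_exec_rows` and the `(kind, name, reason)` tuple layout are assumptions made for illustration. A row with an empty reason is dropped whenever the same Exec also has a row with a reason, and it is kept only when no reason exists at all.

```python
def dedupe_exec_rows(rows):
    """Collapse rows per (kind, name), preferring rows that carry a reason.

    `rows` is an iterable of (kind, name, reason) tuples. This layout is a
    hypothetical stand-in for the three columns shown in the example above.
    """
    reasons_by_key = {}  # dicts keep insertion order, so first-seen order is preserved
    for kind, name, reason in rows:
        bucket = reasons_by_key.setdefault((kind, name), [])
        reason = reason.strip()
        if reason and reason not in bucket:
            bucket.append(reason)

    result = []
    for (kind, name), reasons in reasons_by_key.items():
        if reasons:
            # Emit one row per distinct reason; the empty-reason duplicates vanish.
            result.extend((kind, name, r) for r in reasons)
        else:
            # No reason was ever reported for this Exec: keep a single empty row.
            result.append((kind, name, ""))
    return result


example_rows = [
    ("Exec", "Filter", ""),
    ("Exec", "Filter", "Filter Exec is not supported as expressions are not supported -  `AtLeastNNulls`"),
    ("Exec", "LocalTableScan", ""),
    ("Exec", "Project", ""),
    ("Exec", "Project", "Project Exec is not supported as expressions are not supported -  `to_date`"),
]

for row in dedupe_exec_rows(example_rows):
    print(row)
```

With the five example rows, this leaves three: Filter and Project each with their reason, and LocalTableScan with an empty reason, since nothing more specific was reported for it.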

@nartal1 nartal1 added feature request New feature or request ? - Needs Triage core_tools Scope the core module (scala) labels Oct 23, 2023
@mattahrens
Collaborator

Idea: could we add a new column in the output that represents the duration % for the stage that the exec/expression was in? It wouldn't represent the exact duration for the individual unsupported operation, but knowing how much the stage with the unsupported operation took up compared to the overall job would be useful for prioritization.

Logic = stage duration (wall clock) / app duration (wall clock)
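
A minimal sketch of that ratio in Python, expressed as a percentage. The function name `stage_duration_pct`, the millisecond units, and the zero-duration guard are assumptions for illustration, not part of the tool.

```python
def stage_duration_pct(stage_duration_ms, app_duration_ms):
    """Return the stage's wall-clock share of the app's wall-clock duration, in percent.

    Both arguments are assumed to be wall-clock durations in milliseconds.
    """
    if app_duration_ms <= 0:
        # Guard against missing or zero app duration (assumed fallback value).
        return 0.0
    return round(100.0 * stage_duration_ms / app_duration_ms, 2)


# A 30 s stage inside a 120 s application accounts for a quarter of the run.
print(stage_duration_pct(30_000, 120_000))  # 25.0
```

Because the value describes the whole stage, every unsupported Exec or expression in that stage would share the same percentage; it serves as a prioritization hint rather than a per-operator cost.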
