Implement accumulated improvements to Data Explorer backend protocol, add batch profile requests. In Python implement between, search filter types, prototype search_schema RPC, improve filter API #2585

Merged: 12 commits, Apr 2, 2024

@wesm wesm commented Mar 30, 2024

This PR has a batch of improvements to try to catch up with the UI's current state so we can start hooking up filtering for real:

  • Modifies filter API and get_column_profiles APIs to group parameters in optional interfaces, e.g. BetweenFilterParams, SetMembershipFilterParams
  • Adds search_schema RPC to support searching large schemas from the column dropdown
  • Changes column profile requests to be batch-based, so we can request the null counts for a batch of columns all at once
  • Renames "set_column_filters" RPC to "set_row_filters" to normalize terminology since we will eventually have "column filters" (e.g. column names starting with "foo") that hide a subset of the columns from view
  • Adds summary_stats profile type to provide detailed column summary statistics when needed in the UI
  • Adds and implements "between" and "not_between" filter types in Python
  • Adds a basic implementation of case-sensitive and case-insensitive text searching (contains/startswith/endswith/regex_match) in Python
  • Hooks up basic null percentages into the current summary pane. This code can surely be improved
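
For readers who want a concrete picture of the "between"/"not_between" and text search filters listed above, here is a minimal sketch in plain Python. It is not the code from this PR: plain lists stand in for table columns, `None` stands in for null, and the function names `between_mask` and `search_mask`, the rule that nulls never match, and the use of `re.search` for `regex_match` are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Sequence


def between_mask(values: Sequence[Optional[float]], left: float, right: float,
                 negate: bool = False) -> List[bool]:
    """Boolean mask for a between / not_between row filter.

    Assumption: null (None) values never match, for either filter type.
    """
    mask = []
    for value in values:
        if value is None:
            mask.append(False)
            continue
        inside = left <= value <= right  # inclusive on both ends (assumed)
        mask.append(not inside if negate else inside)
    return mask


def search_mask(values: Sequence[Optional[str]], term: str,
                search_type: str = "contains",
                case_sensitive: bool = False) -> List[bool]:
    """Boolean mask for a text search row filter.

    search_type is one of contains / startswith / endswith / regex_match.
    """
    if search_type == "regex_match":
        flags = 0 if case_sensitive else re.IGNORECASE
        pattern = re.compile(term, flags)
        # Assumption: a match anywhere in the string counts.
        return [v is not None and pattern.search(v) is not None for v in values]

    needle = term if case_sensitive else term.lower()
    tests = {
        "contains": lambda s: needle in s,
        "startswith": lambda s: s.startswith(needle),
        "endswith": lambda s: s.endswith(needle),
    }
    if search_type not in tests:
        raise ValueError(f"unsupported search type: {search_type}")
    test = tests[search_type]

    def normalize(s: str) -> str:
        return s if case_sensitive else s.lower()

    return [v is not None and test(normalize(v)) for v in values]
```

Returning a mask rather than filtered rows keeps the sketch composable: masks from several row filters can be combined element-wise before any rows are selected.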

The focus of the next week will be hooking all this up with the recently-built filtering UI and making sure it works properly.
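
The batch-based column profile requests mentioned in the list above can be pictured roughly as follows. This is a hedged sketch, not the protocol as implemented: the request and result shapes (`column_index`, `type`, `null_count`), the `"null_count"` type string, and the list-of-columns table are hypothetical stand-ins; only the name `get_column_profiles` comes from the PR description.

```python
from typing import Any, Dict, List, Optional, Sequence


def get_column_profiles(
    columns: Sequence[Sequence[Optional[Any]]],
    requests: Sequence[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Answer a batch of profile requests in one call.

    Each request is a dict with hypothetical keys "column_index" and "type".
    Results come back in the same order as the requests.
    """
    results = []
    for request in requests:
        column = columns[request["column_index"]]
        if request["type"] == "null_count":
            # None stands in for a null value in this sketch.
            results.append({"null_count": sum(v is None for v in column)})
        else:
            raise ValueError(f"unsupported profile type: {request['type']}")
    return results


def null_percent(null_count: int, num_rows: int) -> float:
    """Null percentage for a summary display; 0.0 for an empty column."""
    return 100.0 * null_count / num_rows if num_rows else 0.0
```

Batching means one round trip can cover every column currently visible in the summary pane, instead of one request per column.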

@wesm wesm requested review from softwarenerd and jmcphers March 30, 2024 23:35

wesm commented Mar 30, 2024

@jmcphers next week could you have a close look through my changes in data_explorer-backend-openrpc.json? I think the approach I took for different filter types or different profile results is okay for now. I'm not sure implementing the oneOf union is urgent, since the current approach at least makes clear, for each group of properties, which ones must be provided together.
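
To illustrate the trade-off described in this comment, here is a rough Python sketch of a filter with optional parameter groups instead of a `oneOf` union. The class names `BetweenFilterParams` and `SetMembershipFilterParams` come from the PR description; every field name, the `RowFilter` class, the `"set_membership"` type string, and the `validate` helper are hypothetical and only show the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BetweenFilterParams:
    # Hypothetical fields: both bounds must be provided together.
    left_value: str
    right_value: str


@dataclass
class SetMembershipFilterParams:
    # Hypothetical fields for a set membership filter.
    values: List[str]
    inclusive: bool


@dataclass
class RowFilter:
    filter_type: str
    column_index: int
    # One optional group per family of filter types, rather than a union.
    between_params: Optional[BetweenFilterParams] = None
    set_membership_params: Optional[SetMembershipFilterParams] = None


# Which parameter group each filter type needs (assumed mapping).
REQUIRED_GROUP = {
    "between": "between_params",
    "not_between": "between_params",
    "set_membership": "set_membership_params",
}


def validate(row_filter: RowFilter) -> None:
    """Raise ValueError if the group required by filter_type is missing."""
    group = REQUIRED_GROUP.get(row_filter.filter_type)
    if group is not None and getattr(row_filter, group) is None:
        raise ValueError(f"{row_filter.filter_type} filter requires {group}")
```

Within a group, the required fields are enforced by the type itself; what a union would add is a schema-level guarantee that the right group accompanies each filter type, which this sketch checks at runtime instead.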

@wesm wesm force-pushed the feature/column-summary-support branch from fc7cac2 to c3e5404 Compare April 1, 2024 16:05

wesm commented Apr 1, 2024

@isabelizimm or @seeM can you have a look at https://github.com/posit-dev/positron/actions/runs/8510379975/job/23307867670?pr=2585? The failure doesn't seem related to these changes; it's a build with 1-year-old dependencies (that doesn't include IPython, does it?)

EDIT: triaged in main

@wesm wesm force-pushed the feature/column-summary-support branch from c3e5404 to 4eb1eb5 Compare April 1, 2024 23:40
@jmcphers jmcphers left a comment

Totally reasonable. So nice to see something other than 29% for the null count! 🎉

@wesm wesm merged commit 50b92a9 into main Apr 2, 2024
23 checks passed
@wesm wesm deleted the feature/column-summary-support branch April 2, 2024 00:12