Use faster solr faceting for dashboard stats #6865

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

CGillen
Contributor

@CGillen CGillen commented Jul 25, 2024

Summary

The main dashboard statistics load more slowly as the number of works, work types, and resource type values in Solr grows. In severe cases this can lead to timeouts.

Guidance for testing, such as acceptance criteria or new user interface behaviors:

  • With a large number of works (25k+) and different resource type values (50+), visit the dashboard at /dashboard
  • Record the page load time before and after the fix
  • Check that the displayed statistics show no discrepancies before and after

Type of change (for release notes)

  • notes-bugfix Bug Fixes

Detailed Description

The stats graphic on the dashboard was built using large Solr responses and several calls to looping methods (#each, #group_by?, #transform_values). These loops are slow over large arrays and hashes. Solr can do this sort of data massaging for us with its facet API, computing the counts on the server without returning the documents themselves.
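
For contrast, here is a rough sketch of the kind of client-side counting described above. It is not the code this PR removes: the sample documents and the `count_by_field` helper are hypothetical, and it assumes each Solr document arrives as a hash with multi-valued fields.

```ruby
# Hypothetical sample of Solr documents as they might arrive in a large response.
docs = [
  { 'id' => 'w1', 'human_readable_type_tesim' => ['Image'] },
  { 'id' => 'w2', 'human_readable_type_tesim' => ['Image'] },
  { 'id' => 'w3', 'human_readable_type_tesim' => ['Generic Work'] }
]

# Hypothetical helper: group every document by the first value of a field,
# then count the documents in each group. Every document must be fetched
# from Solr and held in memory before this can run.
def count_by_field(docs, field)
  docs.group_by { |doc| Array(doc[field]).first }
      .transform_values(&:count)
end

counts_by_type = count_by_field(docs, 'human_readable_type_tesim')
```

The work here grows with the number of documents returned, which is why the approach degrades on large repositories.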

This change uses Solr facets to calculate the number of works grouped by any Solr field, in particular human_readable_type_sim for work types and resource_type_sim for resource types. Note that this uses *_sim rather than *_tesim so the keys aren't mangled (string field vs. text field). This can be done with a 0-row Solr query, as opposed to the previous arbitrary 100k-row query, which breaks on repos with more than 100k works.
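
A minimal sketch of what the parameters for such a 0-row facet query could look like. The `facet_count_params` helper is hypothetical and is not the method added in this PR. The parameter names (`rows`, `facet`, `facet.field`, `facet.limit`) are standard Solr request parameters, but the exact set used by the change is an assumption.

```ruby
# Hypothetical helper: builds Solr parameters for counting works by one field.
def facet_count_params(field)
  {
    'q' => '*:*',           # match everything; a real query would restrict to works
    'rows' => 0,            # return no documents, only the facet counts
    'facet' => true,        # enable faceting
    'facet.field' => field, # the field to count by, e.g. resource_type_sim
    'facet.limit' => -1     # assumption: no cap on the number of facet values
  }
end

params = facet_count_params('resource_type_sim')
```

Because `rows` is 0, the response size depends on the number of distinct field values rather than on the number of works.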

The most interesting line, Hash[*response['facet_counts']['facet_fields'][query]], is a technique to turn the Solr facet response, a flat array of the form ['key', value, 'key', value, ...], into the corresponding hash {'key' => value, 'key' => value, ...}. It is also resilient to a nil or empty facet array, returning an empty hash if the facet response isn't right.
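
The conversion can be tried in isolation with a hand-built response. This is a sketch under stated assumptions: the `facet_counts_for` helper and the sample counts are made up for illustration, and the use of `dig` to guard against missing intermediate keys is an addition not taken from the PR.

```ruby
# Hypothetical helper: extract the counts for one facet field as a hash.
def facet_counts_for(response, field)
  # dig returns nil instead of raising if any intermediate key is missing
  # (an assumption added here; the PR indexes the response directly).
  flat = response.dig('facet_counts', 'facet_fields', field)
  # Splatting nil or [] passes no arguments, so Hash[] yields {}.
  Hash[*flat]
end

# Made-up response in the flat [key, count, key, count, ...] shape.
response = {
  'facet_counts' => {
    'facet_fields' => {
      'resource_type_sim' => ['Article', 12, 'Image', 7]
    }
  }
}

counts = facet_counts_for(response, 'resource_type_sim')
missing = facet_counts_for({}, 'resource_type_sim')
```

Note that `Hash[*flat]` raises an ArgumentError if the array has an odd number of elements. For very long facet lists, `Array(flat).each_slice(2).to_h` is an alternative that avoids splatting a large array into method arguments.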

Changes proposed in this pull request:

  • Replace large solr query and repetitive looping with faster Solr Facet data

@samvera/hyrax-code-reviewers

@orangewolf orangewolf added the notes-minor Release Notes: Non-breaking features label Aug 5, 2024
@orangewolf
Member

@CGillen I'm marking that as notes-minor since it changes some underpinnings and improves performance. Whether performance issues are bugs or features is pretty subjective, so feel free to change the label back if you disagree.


@orangewolf orangewolf left a comment

this is a great performance pick up. thank you for this code

@orangewolf orangewolf merged commit 519dc97 into samvera:main Aug 7, 2024
21 checks passed
@CGillen CGillen deleted the dashboard-stats-performance branch August 8, 2024 15:59
Labels
notes-minor Release Notes: Non-breaking features