Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best Practices for Storing and Querying Full History and Latest Versions #11618

Open
Selinix opened this issue Nov 21, 2024 · 0 comments
Open
Labels
question Further information is requested

Comments

@Selinix
Copy link

Selinix commented Nov 21, 2024

Query engine

Spark for loading, Trino for querying

Question

Hi,

I’m looking for guidance on the most efficient solution for maintaining full history and querying the latest versions of events without maintaining redundant copies of the data.

A use case is to be able to query either:

  1. All versions of an event (e.g., SELECT * FROM full_hist WHERE id = 'XXX')
  2. Only the latest version of an event (e.g., SELECT * FROM latest_slice WHERE id = 'XXX')

The latest version is determined by the maximum value in a version field for each id.

Questions

  1. Is it better to maintain:
    • A single table with full history and periodically deduplicate it into a separate latest_slice table?
    • Or a single full history table with a view that computes the latest versions dynamically?
  2. If the latter, does applying optimization techniques like partitioning, sorting, and ordering on the full history table significantly improve performance for querying the latest versions?
  3. Given the preference to store only one copy of the data, what is the most performant and practical solution for this scenario?

Thank you for your guidance!

@Selinix Selinix added the question Further information is requested label Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

1 participant