@jogrogan jogrogan commented Jan 12, 2026

This change addresses a few major performance issues:

  1. The JDBC connection internals stand up a connection to validate connectivity before any real connection is instantiated. The connect path should therefore be lightweight, with any expensive processing done lazily. This change removes the populate methods found in each driver and instead overrides tables() in each schema, so that a driver's tables are loaded only when they are actually needed.
  2. Lookup (used by tables()) has two relevant methods: get(String name), which retrieves a single table by name, and getNames(LikePattern pattern), which retrieves all table names that match a pattern. In practice, Calcite calls both of these many times:
    • Implemented LazyTableLookup to cache repeated calls.
    • Calcite often calls getNames with a pattern that matches a single literal name, to validate that a table exists. Added a micro-optimization for this case so that all tables are loaded only when a real wildcard pattern is used.
      • In theory this could be less optimal if getNames is called often after repeated get calls, but I don't see this access pattern in practice.
  3. The default Driver implementation within Calcite creates a new root schema at connection time. Calcite provides a boolean to enable or disable caching on this root schema: with caching enabled a CachingCalciteSchema is used, otherwise a SimpleCalciteSchema. CachingCalciteSchema attempts to cache table and function retrieval, but in practice this clashes with the caching described above at the driver schema level. Additionally, CachingCalciteSchema caches via a SnapshotLookup, which eagerly loads all tables and stores a snapshot of the whole schema and all of its tables at instantiation time. Oddly, it caches only get(String name) calls, not getNames(LikePattern pattern) calls, which leads to a lot of redundant work. This change implements CalciteDriver on top of Driver, which simply exposes the ability to enable or disable caching. Caching at the driver level is now disabled by default, relying instead on caching at the table level (implemented by LazyTableLookup).
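
For illustration, the lazy lookup idea in point 2 can be sketched roughly as below. This is not the actual LazyTableLookup code: the class name `LazyLookupSketch`, the two loader callbacks, and the use of a plain String with SQL LIKE wildcards (`%` and `_`, no escape handling) in place of Calcite's LikePattern are all assumptions made to keep the example self-contained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch of a lazy, caching lookup; names and signatures are illustrative. */
final class LazyLookupSketch<T> {
  private final Function<String, Optional<T>> loadOne; // cheap: fetch one table by name
  private final Supplier<Map<String, T>> loadAll;      // expensive: fetch every table
  private final Map<String, Optional<T>> cache = new ConcurrentHashMap<>();
  private volatile boolean allLoaded = false;

  LazyLookupSketch(Function<String, Optional<T>> loadOne, Supplier<Map<String, T>> loadAll) {
    this.loadOne = loadOne;
    this.loadAll = loadAll;
  }

  /** Returns a single entry, loading it at most once (misses are cached too). */
  Optional<T> get(String name) {
    if (allLoaded) {
      return cache.getOrDefault(name, Optional.empty());
    }
    return cache.computeIfAbsent(name, loadOne);
  }

  /** Returns the names matching a LIKE-style pattern. */
  Set<String> getNames(String likePattern) {
    if (!hasWildcard(likePattern)) {
      // A literal name is just an existence check, so avoid the full load.
      return get(likePattern).isPresent() ? Set.of(likePattern) : Set.of();
    }
    ensureAllLoaded();
    Pattern regex = toRegex(likePattern);
    Set<String> names = new TreeSet<>();
    for (Map.Entry<String, Optional<T>> entry : cache.entrySet()) {
      if (entry.getValue().isPresent() && regex.matcher(entry.getKey()).matches()) {
        names.add(entry.getKey());
      }
    }
    return names;
  }

  /** Runs the expensive full load at most once. */
  private synchronized void ensureAllLoaded() {
    if (!allLoaded) {
      loadAll.get().forEach((name, table) -> cache.put(name, Optional.of(table)));
      allLoaded = true;
    }
  }

  private static boolean hasWildcard(String pattern) {
    return pattern.indexOf('%') >= 0 || pattern.indexOf('_') >= 0;
  }

  /** Translates % and _ into regex equivalents; every other character is quoted. */
  private static Pattern toRegex(String likePattern) {
    StringBuilder regex = new StringBuilder();
    for (char c : likePattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }
}
```

In this sketch, a literal name containing an underscore is treated as a wildcard pattern and falls back to the full load, which is still correct but skips the shortcut.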

Added tests for LazyTableLookup and validated many existing code paths, including !tables, !describe, creating views, the CLI, etc.

Did some performance analysis internally: planning a pipeline in a test environment (low number of databases and tables) was taking upwards of 35s for a simple SELECT query. With these changes the same query takes ~5s. I expect a much bigger difference when more drivers, or drivers with many more tables, are loaded.

@jogrogan jogrogan force-pushed the jogrogan/performance branch from 487fc0a to 801d2b0 Compare January 12, 2026 19:16

@ryannedolan ryannedolan left a comment

This is awesome!

@jogrogan jogrogan merged commit f929fd6 into main Jan 13, 2026
1 check passed
@jogrogan jogrogan deleted the jogrogan/performance branch January 13, 2026 03:21