@jesspav
Collaborator

@jesspav jesspav commented Dec 12, 2025

Flamegraphs for PR #430 revealed that CRS serialization had room for improvement, which will be essential for raster/geo per-item CRS functions. This will be especially important for CRS comparisons in joins.

Added simple benchmarks for lnglat and auth code equality and re-ran them after each performance optimization. At the start it took 61ms for auth codes and 37ms for lnglat. By the end, auth codes had improved by over 20x and lnglat by over 10x, with both running in 2.7ms.

There are basically two changes: an LRU cache and avoiding unnecessary string allocations. At the end, I checked whether the cache was still needed, but removing it brought the time back up to 12ms.

Breakdown of performance changes:

Starting point:


     Running benches/crs_benchmarks.rs (target/release/deps/crs_benchmarks-c62a15af9667b709)
equality_lnglat_crs     time:   [37.442 ms 37.712 ms 37.960 ms]
                        change: [+2.5665% +4.0155% +5.4781%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) low severe
  9 (9.00%) low mild

Benchmarking equality_different_crs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, or reduce sample count to 70.
equality_different_crs  time:   [61.892 ms 62.119 ms 62.362 ms]
                        change: [+2.1679% +2.6956% +3.2470%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe


After cache:
equality_lnglat_crs     time:   [28.265 ms 28.361 ms 28.461 ms]
                        change: [−25.338% −24.795% −24.189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

equality_different_crs  time:   [27.473 ms 27.530 ms 27.586 ms]
                        change: [−55.878% −55.683% −55.494%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild



Time after cache + switch value to str + colon rather than split:

equality_lnglat_crs     time:   [11.324 ms 11.354 ms 11.385 ms]
                        change: [−60.148% −59.967% −59.798%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

equality_different_crs  time:   [11.655 ms 11.701 ms 11.746 ms]
                        change: [−57.674% −57.497% −57.308%] (p = 0.00 < 0.05)
                        Performance has improved.
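
For readers following the "colon rather than split" step, here is a minimal sketch of one way to read it: locate the first colon and borrow the two halves instead of collecting `split(':')` into a `Vec`. The helper name `split_auth_code` is hypothetical and is not taken from the PR; the actual change may differ.

```rust
/// Hypothetical helper (not from the PR): split "EPSG:4326" into
/// ("EPSG", "4326") by locating the single colon. Both halves are
/// borrowed from the input, so nothing is allocated.
fn split_auth_code(s: &str) -> Option<(&str, &str)> {
    // `split_once` stops at the first ':' and returns two borrowed slices.
    let (authority, code) = s.split_once(':')?;
    // Assumption: a valid auth code has exactly one colon and two non-empty parts.
    if authority.is_empty() || code.is_empty() || code.contains(':') {
        return None;
    }
    Some((authority, code))
}
```

Calling `split_auth_code("EPSG:4326")` yields the borrowed pair, while inputs without a colon (for example a bare `"4326"`) return `None` and can fall through to whatever other parsing path applies.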



Time after cache + switch value to str + colon rather than split + pre-alloc instead of format for eq:
Benchmarking equality_lnglat_crs: Collecting 100 samples in estimated 5.4946 s (800 iterat
equality_lnglat_crs     time:   [6.6983 ms 6.7258 ms 6.7557 ms]
                        change: [+0.7120% +1.2645% +1.8552%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking equality_different_crs: Collecting 100 samples in estimated 5.4518 s (800 ite
equality_different_crs  time:   [6.7116 ms 6.7429 ms 6.7758 ms]
                        change: [+0.7410% +1.5266% +2.2864%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
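
To illustrate the "pre-alloc instead of format" step, the sketch below builds the `authority:code` string with a single pre-sized allocation rather than `format!`. It also shows an allocation-free comparison as an alternative technique that the PR does not claim to use. Both function names are hypothetical, and the comparison is assumed to be case-sensitive.

```rust
/// Hypothetical helper: build "authority:code" with one allocation of the
/// exact final size, avoiding the formatting machinery behind `format!`.
fn auth_code_string(authority: &str, code: &str) -> String {
    let mut out = String::with_capacity(authority.len() + 1 + code.len());
    out.push_str(authority);
    out.push(':');
    out.push_str(code);
    out
}

/// Alternative sketch (an assumption, not the PR's method): test whether
/// `candidate` equals "authority:code" without building any string.
fn auth_code_matches(authority: &str, code: &str, candidate: &str) -> bool {
    candidate
        .strip_prefix(authority)
        .and_then(|rest| rest.strip_prefix(':'))
        .map_or(false, |rest| rest == code)
}
```

The second function only walks the existing bytes, so it suits hot equality paths; the first is the simpler drop-in when an owned `String` is still required.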


All of the above + not splitting up the auth code:
equality_lnglat_crs     time:   [3.3993 ms 3.4098 ms 3.4200 ms]
                        change: [−49.633% −49.425% −49.229%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

equality_different_crs  time:   [3.4080 ms 3.4170 ms 3.4258 ms]
                        change: [−49.183% −48.996% −48.819%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
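
As a rough illustration of "not splitting up the auth code", the sketch below stores the combined string once and derives the two parts by slicing. The type name `AuthorityCodeSketch` and the stored `colon` index are assumptions made for this example; the PR's `AuthorityCode` struct may be laid out differently.

```rust
/// Hypothetical stand-in for an authority/code pair that keeps a single
/// "AUTHORITY:CODE" string instead of two separately allocated fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct AuthorityCodeSketch {
    auth_code: String,
    /// Byte offset of the ':' separator (an assumption of this sketch).
    colon: usize,
}

impl AuthorityCodeSketch {
    /// Parse "EPSG:4326"; returns None if either side of the colon is empty.
    fn parse(s: &str) -> Option<Self> {
        let colon = s.find(':')?;
        if colon == 0 || colon + 1 == s.len() {
            return None;
        }
        // One allocation for the whole value.
        Some(Self { auth_code: s.to_string(), colon })
    }

    /// The authority part, borrowed from the stored string.
    fn authority(&self) -> &str {
        &self.auth_code[..self.colon]
    }

    /// The code part, borrowed from the stored string.
    fn code(&self) -> &str {
        &self.auth_code[self.colon + 1..]
    }

    /// The full "AUTHORITY:CODE" form, available without re-formatting.
    fn as_str(&self) -> &str {
        &self.auth_code
    }
}
```

With this layout, equality and serialization work directly on `auth_code`, and the separate parts are only sliced out when a caller asks for them.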

With thread-local cache:
     Running benches/crs_benchmarks.rs (target/release/deps/crs_benchmarks-c62a15af9667b709)
equality_lnglat_crs     time:   [2.6705 ms 2.6748 ms 2.6790 ms]
                        change: [−1.5862% −1.2966% −1.0092%] (p = 0.00 < 0.05)
                        Performance has improved.

equality_different_crs  time:   [2.6570 ms 2.6619 ms 2.6668 ms]
                        change: [−2.7208% −1.8683% −1.2686%] (p = 0.00 < 0.05)
                        Performance has improved.

@jesspav jesspav requested a review from Copilot December 12, 2025 20:22
Contributor

Copilot AI left a comment

Pull request overview

This PR significantly improves CRS (Coordinate Reference System) serialization performance through caching and string allocation optimizations. The changes achieve a ~20x speedup for authority code comparisons and ~10x speedup for lnglat comparisons, reducing execution times from 61ms to 3.4ms for auth codes and 37ms to 3.4ms for lnglat.

Key Changes:

  • Introduced LRU cache for CRS deserialization results
  • Refactored deserialize_crs to accept &str instead of &Value, eliminating unnecessary JSON parsing
  • Consolidated authority and code fields into single auth_code string in AuthorityCode struct to reduce allocations

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rust/sedona-schema/src/crs.rs | Core CRS implementation with LRU cache, string-based deserialization, and optimized AuthorityCode structure |
| rust/sedona-schema/src/datatypes.rs | Updated to handle both string and object CRS values during deserialization |
| rust/sedona-schema/benches/crs_benchmarks.rs | Added benchmark suite for CRS equality operations |
| rust/sedona-schema/Cargo.toml | Added lru dependency and benchmark configuration |
| rust/sedona-functions/src/st_srid.rs | Updated tests to use string-based CRS deserialization |
| rust/sedona-functions/src/st_setsrid.rs | Simplified CRS deserialization calls |
| c/sedona-proj/src/st_transform.rs | Simplified CRS deserialization calls |
| c/sedona-proj/src/sd_order_lnglat.rs | Removed unused import and simplified CRS deserialization |

Collaborator Author

@jesspav jesspav left a comment

Inline comment for @paleolimbot

@jesspav jesspav marked this pull request as ready for review December 12, 2025 21:51
Member

@paleolimbot paleolimbot left a comment

Thank you for investigating and working on this!

I think it's worth keeping a version of deserialize_crs() around that does operate on a Value specifically for deserializing a GeoArrow or GeoParquet crs (where the JSON has already been parsed and reserializing it just to parse it again is probably slow). Or alternatively, add some very long PROJJSON string benchmarks to make sure!

Comment on lines 28 to 36
/// LRU cache for CRS deserialization
static CRS_CACHE: OnceLock<Mutex<LruCache<String, Crs>>> = OnceLock::new();

fn get_crs_cache() -> &'static Mutex<LruCache<String, Crs>> {
    CRS_CACHE.get_or_init(|| {
        // Cache up to 256 CRS strings
        Mutex::new(LruCache::new(NonZeroUsize::new(256).unwrap()))
    })
}
Member

Can/should this be thread local to avoid contention when CRS computations are happening on many partitions at once?

Collaborator Author

Yes!!! That should have an impact on the perf of a single thread, too, since it avoids the lock.

Any opinion on the magic number for the size of the cache? This could get very large if all the keys are JSON!

Collaborator Author

I made it 50 for now. I think this should be configurable given the perf implications. I'll add an issue so we don't forget.
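
For anyone following the thread, here is a minimal std-only sketch of the thread-local idea discussed above, assuming a capacity of 50 and a small hand-rolled LRU in place of the `lru` crate. Every name here (`BoundedCache`, `CRS_CACHE_SKETCH`, `cached_normalize`) is hypothetical, and the cached value is a plain `String` standing in for a parsed `Crs`; this is not the code that was merged.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};

/// Assumed capacity, matching the "50 for now" mentioned in the thread.
const CACHE_CAPACITY: usize = 50;

/// Tiny LRU stand-in (the PR uses the `lru` crate instead).
/// Assumes `capacity` is greater than zero.
struct BoundedCache {
    map: HashMap<String, String>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<String>,
    capacity: usize,
}

impl BoundedCache {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: VecDeque::new(), capacity }
    }

    /// Return the cached value for `key`, computing it with `parse` on a miss.
    fn get_or_insert_with(&mut self, key: &str, parse: impl FnOnce(&str) -> String) -> String {
        if let Some(value) = self.map.get(key) {
            let value = value.clone();
            // Hit: move the key to the most-recently-used end (O(n), fine for small n).
            if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
                if let Some(k) = self.order.remove(pos) {
                    self.order.push_back(k);
                }
            }
            return value;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        let value = parse(key);
        self.map.insert(key.to_string(), value.clone());
        self.order.push_back(key.to_string());
        value
    }
}

thread_local! {
    // One cache per thread: no Mutex, so no contention across partitions.
    static CRS_CACHE_SKETCH: RefCell<BoundedCache> =
        RefCell::new(BoundedCache::new(CACHE_CAPACITY));
}

/// Hypothetical cached "parse": normalizes the string as a stand-in for
/// real CRS deserialization.
fn cached_normalize(crs: &str) -> String {
    CRS_CACHE_SKETCH.with(|cache| {
        cache
            .borrow_mut()
            .get_or_insert_with(crs, |s| s.trim().to_ascii_uppercase())
    })
}
```

The trade-off is that each thread warms and holds its own cache, so memory use scales with the number of threads times the capacity, which is one reason a configurable size seems worth the follow-up issue.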
