Improve perf of CRS #446

jesspav · 2025-12-12T20:15:30Z

Flamegraphs for PR #430 revealed that CRS serialization had room for improvement, which will be essential for raster/geo per-item CRS functions. This is especially important for CRS comparisons in joins.

Added a simple benchmark suite for lnglat and auth code equality and reran it after each performance optimization. At the start it took 61 ms for auth codes and 37 ms for lnglat. By the end, auth codes had improved by over 20x and lnglat by over 10x, with both running in 2.7 ms.

There are basically two changes: an LRU cache and avoiding unnecessary string allocations. At the end, I checked whether the cache was still needed, but removing it brought the time back up to 12 ms.

Breakdown of performance changes:

Starting point:


     Running benches/crs_benchmarks.rs (target/release/deps/crs_benchmarks-c62a15af9667b709)
equality_lnglat_crs     time:   [37.442 ms 37.712 ms 37.960 ms]
                        change: [+2.5665% +4.0155% +5.4781%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) low severe
  9 (9.00%) low mild

Benchmarking equality_different_crs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, or reduce sample count to 70.
equality_different_crs  time:   [61.892 ms 62.119 ms 62.362 ms]
                        change: [+2.1679% +2.6956% +3.2470%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe


After cache:
equality_lnglat_crs     time:   [28.265 ms 28.361 ms 28.461 ms]
                        change: [−25.338% −24.795% −24.189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

equality_different_crs  time:   [27.473 ms 27.530 ms 27.586 ms]
                        change: [−55.878% −55.683% −55.494%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild



Time after cache + switch value to str + colon rather than split:

equality_lnglat_crs     time:   [11.324 ms 11.354 ms 11.385 ms]
                        change: [−60.148% −59.967% −59.798%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

equality_different_crs  time:   [11.655 ms 11.701 ms 11.746 ms]
                        change: [−57.674% −57.497% −57.308%] (p = 0.00 < 0.05)
                        Performance has improved.
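
The "colon rather than split" step can be illustrated with a small sketch. This is not the PR's code: `parse_auth_code_split` and `parse_auth_code_colon` are hypothetical names, and the validation rules are assumptions. The first variant collects the pieces into a `Vec` and copies them into new `String`s; the second locates the single colon with `str::split_once` and returns slices borrowed from the input, so it allocates nothing.

```rust
/// Allocating variant (hypothetical baseline): collects the pieces into a
/// Vec and copies them into owned Strings.
fn parse_auth_code_split(s: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = s.split(':').collect();
    if parts.len() == 2 && !parts[0].is_empty() && !parts[1].is_empty() {
        Some((parts[0].to_string(), parts[1].to_string()))
    } else {
        None
    }
}

/// Borrowing variant: finds the first colon and returns slices of the input.
/// No Vec and no String is created.
fn parse_auth_code_colon(s: &str) -> Option<(&str, &str)> {
    let (authority, code) = s.split_once(':')?;
    // Assumed rule for this sketch: exactly one colon, both sides non-empty.
    if authority.is_empty() || code.is_empty() || code.contains(':') {
        return None;
    }
    Some((authority, code))
}
```

Both variants accept `"EPSG:4326"` and reject inputs with zero or more than one colon; only the borrowing variant avoids heap allocation on the hot path.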



Time after cache + switch value to str + colon rather than split + pre-alloc instead of format for eq:
Benchmarking equality_lnglat_crs: Collecting 100 samples in estimated 5.4946 s (800 iterat
equality_lnglat_crs     time:   [6.6983 ms 6.7258 ms 6.7557 ms]
                        change: [+0.7120% +1.2645% +1.8552%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking equality_different_crs: Collecting 100 samples in estimated 5.4518 s (800 ite
equality_different_crs  time:   [6.7116 ms 6.7429 ms 6.7758 ms]
                        change: [+0.7410% +1.5266% +2.2864%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
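
One possible reading of "pre-alloc instead of format for eq" is sketched below. The function names are hypothetical and the PR's actual equality code may look different. The idea is that `format!` goes through the general formatting machinery, while a `String` created with the exact capacity and filled with `push_str` performs a single allocation and plain byte copies.

```rust
/// Baseline (hypothetical): build "authority:code" with `format!`.
fn auth_code_format(authority: &str, code: &str) -> String {
    format!("{}:{}", authority, code)
}

/// Pre-allocated variant: reserve the exact length once, then append.
fn auth_code_prealloc(authority: &str, code: &str) -> String {
    // +1 accounts for the ':' separator.
    let mut out = String::with_capacity(authority.len() + 1 + code.len());
    out.push_str(authority);
    out.push(':');
    out.push_str(code);
    out
}
```

The two functions return identical strings, so the change is invisible to callers and only affects how the string is built.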


All of the above + not splitting up the auth code:
equality_lnglat_crs     time:   [3.3993 ms 3.4098 ms 3.4200 ms]
                        change: [−49.633% −49.425% −49.229%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

equality_different_crs  time:   [3.4080 ms 3.4170 ms 3.4258 ms]
                        change: [−49.183% −48.996% −48.819%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
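
A minimal sketch of "not splitting up the auth code", assuming the value is kept as one string such as `"EPSG:4326"`. `AuthCodeSketch` is a hypothetical stand-in for the PR's `AuthorityCode` struct, and its methods are illustrative only. Equality against a raw string becomes a single comparison with no allocation, and the authority and code parts are derived on demand as borrowed slices.

```rust
/// Hypothetical stand-in for the PR's `AuthorityCode`: one owned string
/// instead of separate authority and code fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AuthCodeSketch {
    auth_code: String,
}

impl AuthCodeSketch {
    /// Validates once at construction that the input looks like "authority:code".
    fn new(auth_code: &str) -> Option<Self> {
        let (authority, code) = auth_code.split_once(':')?;
        if authority.is_empty() || code.is_empty() {
            return None;
        }
        Some(Self { auth_code: auth_code.to_string() })
    }

    /// Comparing against a raw string needs no split and no allocation.
    fn matches(&self, other: &str) -> bool {
        self.auth_code == other
    }

    /// The authority part, borrowed from the stored string.
    fn authority(&self) -> &str {
        self.auth_code.split_once(':').map(|(a, _)| a).unwrap_or("")
    }

    /// The code part, borrowed from the stored string.
    fn code(&self) -> &str {
        self.auth_code.split_once(':').map(|(_, c)| c).unwrap_or("")
    }
}
```

The trade-off is that accessing the parts now costs a scan for the colon, which is acceptable when equality checks dominate, as they do in the benchmarks above.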

With thread-local cache:
     Running benches/crs_benchmarks.rs (target/release/deps/crs_benchmarks-c62a15af9667b709)
equality_lnglat_crs     time:   [2.6705 ms 2.6748 ms 2.6790 ms]
                        change: [−1.5862% −1.2966% −1.0092%] (p = 0.00 < 0.05)
                        Performance has improved.

equality_different_crs  time:   [2.6570 ms 2.6619 ms 2.6668 ms]
                        change: [−2.7208% −1.8683% −1.2686%] (p = 0.00 < 0.05)
                        Performance has improved.

Copilot

Pull request overview

This PR significantly improves CRS (Coordinate Reference System) serialization performance through caching and string allocation optimizations. The changes achieve a ~20x speedup for authority code comparisons and ~10x speedup for lnglat comparisons, reducing execution times from 61ms to 3.4ms for auth codes and 37ms to 3.4ms for lnglat.

Key Changes:

Introduced LRU cache for CRS deserialization results
Refactored deserialize_crs to accept &str instead of &Value, eliminating unnecessary JSON parsing
Consolidated authority and code fields into single auth_code string in AuthorityCode struct to reduce allocations

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

File	Description
rust/sedona-schema/src/crs.rs	Core CRS implementation with LRU cache, string-based deserialization, and optimized AuthorityCode structure
rust/sedona-schema/src/datatypes.rs	Updated to handle both string and object CRS values during deserialization
rust/sedona-schema/benches/crs_benchmarks.rs	Added benchmark suite for CRS equality operations
rust/sedona-schema/Cargo.toml	Added lru dependency and benchmark configuration
rust/sedona-functions/src/st_srid.rs	Updated tests to use string-based CRS deserialization
rust/sedona-functions/src/st_setsrid.rs	Simplified CRS deserialization calls
c/sedona-proj/src/st_transform.rs	Simplified CRS deserialization calls
c/sedona-proj/src/sd_order_lnglat.rs	Removed unused import and simplified CRS deserialization

jesspav

Comment in line for @paleolimbot

paleolimbot

Thank you for investigating and working on this!

I think it's worth keeping a version of deserialize_crs() around that does operate on a Value specifically for deserializing a GeoArrow or GeoParquet crs (where the JSON has already been parsed and reserializing it just to parse it again is probably slow). Or alternatively, add some very long PROJJSON string benchmarks to make sure!

paleolimbot · 2025-12-12T22:44:10Z

rust/sedona-schema/src/crs.rs

+/// LRU cache for CRS deserialization
+static CRS_CACHE: OnceLock<Mutex<LruCache<String, Crs>>> = OnceLock::new();
+
+fn get_crs_cache() -> &'static Mutex<LruCache<String, Crs>> {
+    CRS_CACHE.get_or_init(|| {
+        // Cache up to 256 CRS strings
+        Mutex::new(LruCache::new(NonZeroUsize::new(256).unwrap()))
+    })
+}


Can/should this be thread local to avoid contention when CRS computations are happening on many partitions at once?

Yes!!! That should have an impact on the perf of a single thread, too, since it avoids the lock.

Any opinion on the magic number for the size of the cache? This could get very large if all the keys are JSON!

I made it 50 for now. I think this should be configurable given the perf implications. I can add an issue so we don't forget.
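
The thread-local idea discussed in this thread can be sketched with the standard library alone. This is an illustration, not the PR's implementation: the PR uses the `lru` crate, whereas `TinyLru`, `ParsedCrs`, `cached_parse`, and the uppercase "parser" below are hypothetical stand-ins. The capacity of 50 is taken from the comment above. Because each thread owns its cache, a lookup needs a `RefCell` borrow rather than a `Mutex` lock.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};
use std::rc::Rc;

/// Hypothetical parsed value; stands in for the PR's `Crs` type.
#[derive(Debug, PartialEq)]
struct ParsedCrs(String);

/// Capacity mentioned in the review thread ("I made it 50 for now").
const CACHE_CAPACITY: usize = 50;

/// Minimal std-only LRU: `order` holds keys from least to most recently used.
struct TinyLru {
    map: HashMap<String, Rc<ParsedCrs>>,
    order: VecDeque<String>,
    capacity: usize,
}

impl TinyLru {
    fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: VecDeque::new(), capacity }
    }

    fn get_or_insert_with(
        &mut self,
        key: &str,
        parse: impl FnOnce(&str) -> ParsedCrs,
    ) -> Rc<ParsedCrs> {
        if let Some(hit) = self.map.get(key).cloned() {
            // Move the key to the most-recently-used end (O(n), fine for a small cache).
            if let Some(pos) = self.order.iter().position(|k| k == key) {
                let k = self.order.remove(pos).unwrap();
                self.order.push_back(k);
            }
            return hit;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        let value = Rc::new(parse(key));
        self.map.insert(key.to_string(), Rc::clone(&value));
        self.order.push_back(key.to_string());
        value
    }
}

thread_local! {
    /// One cache per thread, so lookups never contend on a lock.
    static CRS_CACHE_SKETCH: RefCell<TinyLru> = RefCell::new(TinyLru::new(CACHE_CAPACITY));
}

/// Looks up `input` in this thread's cache, parsing only on a miss.
/// The "parser" here just uppercases the input; it is a placeholder.
fn cached_parse(input: &str) -> Rc<ParsedCrs> {
    CRS_CACHE_SKETCH.with(|cache| {
        cache
            .borrow_mut()
            .get_or_insert_with(input, |s| ParsedCrs(s.to_uppercase()))
    })
}
```

Calling `cached_parse` twice with the same string on one thread returns the same `Rc`, so the second call skips parsing. The cost of this design is that each thread warms its own cache and memory use scales with the number of threads, which is one reason a configurable capacity matters when keys can be long JSON strings.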

jesspav added 5 commits December 12, 2025 08:46

add bench

ac56e00

with lru cache

bb93526

lots of string optimizations

3bf2ba1

better re-use

3b46f3a

switch from value to str in other files

2e9ae84

jesspav requested a review from Copilot December 12, 2025 20:22

Copilot AI reviewed Dec 12, 2025

Apply suggestions from code review

869ca09

Co-authored-by: Copilot <[email protected]>

jesspav commented Dec 12, 2025

jesspav commented Dec 12, 2025

jesspav marked this pull request as ready for review December 12, 2025 21:51

paleolimbot reviewed Dec 12, 2025

jesspav added 2 commits December 14, 2025 07:21

switch to thread local cache and bring back value deserialization

e15cda6

adding comment about cache size

e51dabe
