Add sse4.1 fast path for u64 #66

Open
dtolnay wants to merge 1 commit into master from fast

dtolnay (Owner) commented Dec 27, 2025

performance

dtolnay mentioned this pull request Dec 28, 2025
xtqqczze (Contributor) commented Feb 2, 2026

Unsurprisingly, the fast path is an overall improvement on aarch64 as well.

<details>
<summary>Benchmark results (aarch64)</summary>
Benchmarking u64[0]/itoa: Collecting 100 samples in estimated 5.0000 s (4.1B ite
u64[0]/itoa             time:   [1.2231 ns 1.2238 ns 1.2246 ns]
                        change: [−28.457% −28.330% −28.184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
Benchmarking u64[0]/std::fmt: Collecting 100 samples in estimated 5.0000 s (501M
u64[0]/std::fmt         time:   [9.9792 ns 9.9796 ns 9.9800 ns]
                        change: [−0.0531% +0.0025% +0.0578%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

Benchmarking u64[half]/itoa: Collecting 100 samples in estimated 5.0000 s (1.9B 
u64[half]/itoa          time:   [2.7003 ns 2.7004 ns 2.7005 ns]
                        change: [−5.7593% −5.7075% −5.6534%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe
Benchmarking u64[half]/std::fmt: Collecting 100 samples in estimated 5.0001 s (3
u64[half]/std::fmt      time:   [14.088 ns 14.089 ns 14.090 ns]
                        change: [−0.0510% +0.0054% +0.0652%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

Benchmarking u64[max]/itoa: Collecting 100 samples in estimated 5.0000 s (1.4B i
u64[max]/itoa           time:   [3.5462 ns 3.5472 ns 3.5490 ns]
                        change: [−24.522% −24.472% −24.422%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
Benchmarking u64[max]/std::fmt: Collecting 100 samples in estimated 5.0001 s (26
u64[max]/std::fmt       time:   [18.990 ns 19.074 ns 19.176 ns]
                        change: [+1.8795% +2.2892% +2.7503%] (p = 0.00 < 0.05)
                        Performance has regressed.
</details>
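
To put the `change` lines in absolute terms, the previous timings can be estimated from the new point estimates. This is a minimal sketch, assuming Criterion defines the relative change as `new / old - 1` (so `old = new / (1 + change)`); the helper name `implied_baseline_ns` is hypothetical, and the numbers are the middle (point-estimate) values from the itoa rows above.

```rust
/// Estimate the baseline time implied by a Criterion `change` figure.
/// Assumption: change = new / old - 1, hence old = new / (1 + change).
fn implied_baseline_ns(new_ns: f64, change_pct: f64) -> f64 {
    new_ns / (1.0 + change_pct / 100.0)
}

fn main() {
    // (benchmark id, new point estimate in ns, point-estimate change in %)
    let rows = [
        ("u64[0]/itoa", 1.2238, -28.330),
        ("u64[half]/itoa", 2.7004, -5.7075),
        ("u64[max]/itoa", 3.5472, -24.472),
    ];
    for (id, new_ns, change_pct) in rows {
        let old_ns = implied_baseline_ns(new_ns, change_pct);
        println!("{id}: {old_ns:.3} ns -> {new_ns:.3} ns");
    }
}
```

Under that assumption, the `u64[0]` case went from roughly 1.7 ns to 1.22 ns. The `std::fmt` rows serve as a control, since the change does not touch them; the reported regression on `u64[max]/std::fmt` is therefore more likely measurement noise than an effect of this PR.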
