Benchmark Results on Raptor Lake (E-Core and P-Core!) #955

Mooshua · 2024-01-25T21:25:44Z

I used cpusets on an LXC container to run the Primes benchmarks separately on the efficiency cores (Gracemont) and the performance cores (Raptor Cove) of an i5-13400.

Performance: 2x Raptor Cove P-Cores (4 threads) @ 4.6GHz (48KB L1D)
Efficiency: 1x Gracemont E-Cluster (4 threads) @ 3.3GHz (32KB L1D)
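
For anyone who wants to try similar pinning without an LXC container, here is a minimal sketch. It parses a cpuset-style list and applies it to the current process with Linux's `os.sched_setaffinity`. The names `parse_cpuset` and `pin_current_process` are made up for illustration, and this is not how the runs above were configured (those used the container's cpuset). Which logical CPU ids belong to P-cores or E-cores is machine-specific and has to be checked first, for example with `lscpu --extended`.

```python
import os


def parse_cpuset(spec: str) -> set[int]:
    """Parse a cpuset-style list such as "0-3,8" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "0-3" -> 0, 1, 2, 3
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec: str) -> set[int]:
    """Restrict the calling process to the given CPUs (Linux only)."""
    cpus = parse_cpuset(spec)
    os.sched_setaffinity(0, cpus)  # pid 0 means "this process"
    return os.sched_getaffinity(0)
```

Calling `pin_current_process("0-3")` before launching a benchmark would confine the process, and any children it starts afterwards, to logical CPUs 0 through 3. The ids here are placeholders, not the ones used for these results.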

The following machine was used to run the benchmarks:

Host OS: Proxmox VE CE
Guest OS: Alma Linux 9 (LXC edition)
Memory: 2x8GB DDR4-3200MT/s
Processor: i5-13400
Chipset: H670

Results

LuaJIT

I ran my LuaJIT prime solution on both core types. The same underlying benchmark program can be configured to run under a wide variety of workloads and configurations, which makes it a handy probe for differences between the cores. I've included a few of those workloads in the table below.

Interestingly, the Gracemont core handles the interpreter's hashtables poorly (vm_hash, about 26% of the P-core score), despite handling the slower (and more cumbersome) interpreted FFI handlers just fine (vm_ffi, about 58%). Both cores had slightly higher scores with an unroll factor of 8 compared to an unroll factor of 16.

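The baseline JIT workloads (jit_slow_ffi, jit_slow_hash) are handled pretty well by both the Gracemont and Raptor cores, with the Gracemont core at roughly 51% to 58% of the Raptor core's score. However, the plain fast-JIT workload (jit) has the 4th-largest performance gap in the suite, at 41.0%.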

The cache-optimized fast-JIT workloads (jit_16_c...k) are where we see each core start to shine. The Gracemont core performs best when running blocks of 32kb (jit_16_c16k), and comes closest to the Raptor core, at 71% of its score, at the smaller block size of 16kb (jit_16_c8k). However, it starts to slip at 48kb (jit_16_c24k), and performance then falls steadily as more and more L1 evictions occur, ending at 11224 for jit_16_c64k, which is below its jit_16_c8k score.

The Raptor cores stay strong throughout all cache-optimized runs, continuing to net wins as the execution blocks grow (from 17620 at jit_16_c8k up to 34198 at jit_16_c64k). This may indicate better L1 eviction performance or an optimization in linear-access prefetching. (Raptor Cove is known to have a significant improvement in prefetch heuristics compared to previous generations.)
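
To make "block size" concrete, here is a rough sketch of a cache-blocked (segmented) sieve. This is not the LuaJIT solution benchmarked here, just a generic illustration of the technique in Python. The function name `count_primes_segmented` and the `block_size` parameter are made up for the example, and it assumes one byte per number, which may not match how the real workloads count their block sizes. Python's interpreter overhead will hide most of the cache effect, so treat it as a picture of the structure rather than something to time.

```python
from math import isqrt


def count_primes_segmented(limit: int, block_size: int = 16 * 1024) -> int:
    """Count primes <= limit, sieving `block_size` numbers at a time."""
    if limit < 2:
        return 0
    root = isqrt(limit)

    # Plain sieve for the base primes up to sqrt(limit).
    base = bytearray([1]) * (root + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, isqrt(root) + 1):
        if base[i]:
            base[i * i :: i] = bytes(len(range(i * i, root + 1, i)))
    base_primes = [i for i in range(2, root + 1) if base[i]]

    count = len(base_primes)
    # Sieve (root, limit] block by block so the working set stays small
    # (here: block_size bytes, ideally no larger than the L1 data cache).
    for low in range(root + 1, limit + 1, block_size):
        high = min(low + block_size - 1, limit)
        block = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside this block (never below p*p).
            start = max(p * p, ((low + p - 1) // p) * p)
            block[start - low :: p] = bytes(len(range(start, high + 1, p)))
        count += sum(block)
    return count
```

The idea is that every base prime is applied to one small block before moving on to the next, so the block can stay resident in L1 while it is being marked. Once `block_size` outgrows the cache, each pass over the block starts evicting lines it will need again, which is the kind of drop-off described above for the Gracemont core.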

Workload	E-Core	P-Core	Ratio (E/P)	Notes
jit_16_c64k	11224	34198	32.8%	Peak P-Core
jit_16_c48k	12804	33908	37.8%
jit_16_c32k	14059	31380	44.8%
jit_16_c24k	16608	29649	56.0%	E-Core cache performance suffers at workloads above 32kb
jit_16_c16k	16728	25791	64.9%	Peak E-Core
jit_16_c8k	12528	17620	71.1%	Less performance disparity on pure-cache workload
jit	6272	15293	41.0%	Large performance disparity when executing plain sieve workload
jit_slow_ffi	1607	3169	50.7%
jit_slow_hash	962	1656	58.1%	E-Core catches up to P-Core hashtable performance in the unoptimized JIT code
vm_hash	223	868	25.7%	P-Core has significantly faster hashtable access in interpreter
vm_ffi	193	330	58.5%
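
The ratio column is simply the E-Core score divided by the P-Core score. If you want to re-derive it or rank the workloads by gap, a small sketch like the following works. The scores are copied from the table above; the names `scores`, `e_to_p_ratio` and `largest_gaps` are just for this example.

```python
# (E-Core score, P-Core score) copied from the table above.
scores = {
    "jit_16_c64k": (11224, 34198),
    "jit_16_c48k": (12804, 33908),
    "jit_16_c32k": (14059, 31380),
    "jit_16_c24k": (16608, 29649),
    "jit_16_c16k": (16728, 25791),
    "jit_16_c8k": (12528, 17620),
    "jit": (6272, 15293),
    "jit_slow_ffi": (1607, 3169),
    "jit_slow_hash": (962, 1656),
    "vm_hash": (223, 868),
    "vm_ffi": (193, 330),
}


def e_to_p_ratio(e_score: float, p_score: float) -> float:
    """E-Core score as a percentage of the P-Core score."""
    return 100.0 * e_score / p_score


ratios = {name: e_to_p_ratio(e, p) for name, (e, p) in scores.items()}

# Smallest ratio first, i.e. the largest E-Core vs. P-Core gap first.
largest_gaps = sorted(ratios, key=ratios.get)

for name in largest_gaps:
    print(f"{name:14s} {ratios[name]:5.1f}%")
```

Sorting this way puts vm_hash at the top of the list and the plain jit workload in fourth place, which is where the "4th-largest gap" remark comes from.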

Top 8

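Below are tables for just the top 8 single-threaded benchmarks for the Raptor cores and Gracemont cores. We can clearly see from the results that the Raptor cores handle the prime workload significantly better: the fastest entry on the Raptor cores reaches about 1.9x the passes/second of the fastest entry on the Gracemont cores, and the solutions that appear in both tables run about 2.0x to 2.7x faster on the Raptor cores.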

Top 8 - Gracemont (Efficiency)

Index	Implementation	Solution	Label	Passes	Duration	Passes/Second
5	c	5	rogiervandam_extend-u64v4b31	55661	5.00005	11132.09758
6	c	5	rogiervandam_extend-u32v8b31	55487	5.00006	11097.26017
7	c	5	rogiervandam_extend-u64v8b31	42208	5.00008	8441.46156
8	go	2	ssovest-go-other-u32-seg-16k	29323	5.00087	5863.57750
9	c	2	danielspaangberg_5760of30030_owrb	27936	5.00016	5587.01675
10	nim	3	GordonBGood_extreme-hybrid	25584	5.00016	5116.63322
11	crystal	2	GordonBGood_extreme-hybrid	24359	5.00005	4871.75421
12	c	2	danielspaangberg_480of2310_owrb	23711	5.00019	4742.01601

Top 8 - Raptor (Performance)

Index	Implementation	Solution	Label	Passes	Duration	Passes/Second
5	c	5	rogiervandam_extend-u32v8b32	104809	5.00004	20961.63650
6	c	5	rogiervandam_extend-u64v4b32	104264	5.00005	20852.61233
7	zig	3	77-ManDeJan&ityonemo&SpexGuy-zig-single-inverted-bitSieve-unrolled-run-u64v8h-deLUT-spLUT-find-u8-advanced-5760of30030v	74316	5.00003	14863.11082
8	zig	3	51-ManDeJan&ityonemo&SpexGuy-zig-single-bitSieve-unrolled-run-u64v8h-deLUT-spLUT-find-u32	70091	5.00004	14018.08786
9	nim	3	GordonBGood_extreme-hybrid	69371	5.00006	13874.02563
10	haskell	2	GordonBGood_extreme-hybrid	63750	5.00097	12747.53410
11	crystal	2	GordonBGood_extreme-hybrid	56560	5.00022	11311.50682
12	c	2	danielspaangberg_5760of30030_owrb	55000	5.00003	10999.92740
