I used cpusets on an LXC container to run the Primes benchmarks separately on the efficiency cores (Gracemont) and the performance cores (Raptor Cove) of the i5-13400.
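The exact cpuset configuration of the container is not shown in this post. For anyone who wants a rough per-process equivalent on Linux, the sketch below restricts the current process to one core type with `os.sched_setaffinity`. The logical-CPU numbering (0-11 for the P-core threads, 12-15 for the E-cores) is an assumption about how Linux commonly enumerates an i5-13400 and should be checked with `lscpu` first; the helper names `pick_cpus` and `pin_current_process` are made up for illustration.

```python
import os

# ASSUMPTION: typical Linux enumeration on an i5-13400 (6 P-cores with
# hyper-threading, 4 E-cores). Verify on the real machine before relying on it.
P_CORE_CPUS = frozenset(range(0, 12))
E_CORE_CPUS = frozenset(range(12, 16))


def pick_cpus(wanted, available):
    """Return the subset of `wanted` logical CPUs that is actually available."""
    usable = set(wanted) & set(available)
    if not usable:
        raise ValueError("none of the requested CPUs are available")
    return usable


def pin_current_process(wanted):
    """Restrict the calling process to the `wanted` logical CPUs (Linux only)."""
    available = os.sched_getaffinity(0)   # CPUs this process may currently use
    usable = pick_cpus(wanted, available)
    os.sched_setaffinity(0, usable)       # pid 0 means "the calling process"
    return usable
```

Calling `pin_current_process(E_CORE_CPUS)` before launching the benchmark would keep it on the E-cores, since child processes inherit the affinity mask. This is not the same mechanism as an LXC cpuset, only a similar effect at process level.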
I ran my LuaJIT prime workload on all 3 machines, since the same underlying benchmark program can be configured to run under a wide variety of workloads and configurations. A few of them are included in the table below.
Interestingly, the Gracemont core doesn't handle the interpreter's hashtables very well (`vm_hash`), despite handling the slower (and more cumbersome) interpreted FFI handlers just fine (`vm_ffi`). Both cores scored slightly higher with an unroll factor of 8 than with an unroll factor of 16.
The baseline JIT workloads (`jit_slow`) are handled pretty well by both the Gracemont and Raptor cores. However, the plain fast-JIT workload (`jit`) has the 4th-largest performance gap in the suite.
The cache-optimized fast-JIT workloads (`jit_16_c...k`) are where each core starts to shine. The Gracemont core performs best when running blocks of 32 KB (`jit_16_c16k`), and reaches 71% of the Raptor core's performance at a smaller block size of 16 KB (`jit_16_c8k`). However, it drops off at 48 KB (`jit_16_c24k`), and performance begins to plummet as more and more L1 evictions occur.
The Raptor cores stay strong throughout all cache-optimized runs, continuing to net wins as the execution blocks grow. This may indicate better L1 eviction performance or an optimization in linear-access prefetching. (Raptor Cove is known to have significantly improved prefetch heuristics compared to previous generations.)
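The benchmark's LuaJIT source is not reproduced in this post, so the following is only a minimal sketch of the general idea behind a cache-blocked sieve: sieve a fixed-size block at a time so the working set stays small. It is written in Python for readability, the names `simple_sieve` and `segmented_count` are hypothetical, and `block_size` counts numbers per block rather than bytes, so it does not map directly onto the `c8k`...`c64k` settings.

```python
import math


def simple_sieve(limit):
    """Return all primes <= limit using a plain (unblocked) sieve."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i*i.
            flags[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def segmented_count(limit, block_size):
    """Count primes <= limit, sieving `block_size` numbers at a time."""
    if limit < 2:
        return 0
    base_primes = simple_sieve(math.isqrt(limit))
    count = 0
    low = 2
    while low <= limit:
        high = min(low + block_size - 1, limit)
        block = bytearray([1]) * (high - low + 1)  # only this block is touched
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the block, never below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                block[multiple - low] = 0
        count += sum(block)
        low = high + 1
    return count
```

The result does not depend on `block_size`; only the memory access pattern does, which is the property the cache-optimized workloads vary.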
| Workload | E-Core | P-Core | Ratio | Notes |
| --- | --- | --- | --- | --- |
| `jit_16_c64k` | 11224 | 34198 | 32.8% | Peak P-Core |
| `jit_16_c48k` | 12804 | 33908 | 37.8% | |
| `jit_16_c32k` | 14059 | 31380 | 44.8% | |
| `jit_16_c24k` | 16608 | 29649 | 56.0% | E-Core cache performance suffers at workloads above 32 KB |
| `jit_16_c16k` | 16728 | 25791 | 64.9% | Peak E-Core |
| `jit_16_c8k` | 12528 | 17620 | 71.1% | Less performance disparity on pure-cache workload |
| `jit` | 6272 | 15293 | 41.0% | Large performance disparity when executing plain sieve workload |
| `jit_slow_ffi` | 1607 | 3169 | 50.7% | |
| `jit_slow_hash` | 962 | 1656 | 58.1% | E-Core catches up to P-Core hashtable performance in the unoptimized JIT code |
| `vm_hash` | 223 | 868 | 25.6% | P-Core has significantly faster hashtable access in interpreter |
| `vm_ffi` | 193 | 330 | 58.4% | |
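The Ratio column appears to be the E-Core score divided by the P-Core score. As a quick check, the sketch below recomputes it from the scores listed above and finds the peak workload for each core type. The function names `ratio_percent` and `peak_workload` are illustrative and not part of the benchmark suite.

```python
# Scores copied from the table above: workload -> (E-Core, P-Core).
SCORES = {
    "jit_16_c64k": (11224, 34198),
    "jit_16_c48k": (12804, 33908),
    "jit_16_c32k": (14059, 31380),
    "jit_16_c24k": (16608, 29649),
    "jit_16_c16k": (16728, 25791),
    "jit_16_c8k": (12528, 17620),
    "jit": (6272, 15293),
    "jit_slow_ffi": (1607, 3169),
    "jit_slow_hash": (962, 1656),
    "vm_hash": (223, 868),
    "vm_ffi": (193, 330),
}


def ratio_percent(e_core, p_core):
    """E-Core score as a percentage of the P-Core score."""
    return 100.0 * e_core / p_core


def peak_workload(scores, core_index):
    """Name of the workload with the highest score (0 = E-Core, 1 = P-Core)."""
    return max(scores, key=lambda name: scores[name][core_index])


if __name__ == "__main__":
    for name, (e_core, p_core) in SCORES.items():
        print(f"{name:<14} {ratio_percent(e_core, p_core):5.1f}%")
    print("Peak E-Core:", peak_workload(SCORES, 0))
    print("Peak P-Core:", peak_workload(SCORES, 1))
```

Recomputed from the integer scores shown, `vm_hash` and `vm_ffi` come out 0.1 percentage points higher than the table (25.7% and 58.5%); the other rows match.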
Top 8
Below are tables for just the top 8 single-threaded benchmarks for the Raptor cores and Gracemont cores. The results show that the Raptor cores handle the prime workload significantly better, at about 2x the operations/second of the Gracemont cores.
The following machine was used to run the benchmarks
Results
LuaJIT
Top 8 - Gracemont (Efficiency)
Top 8 - Raptor (Performance)