Problem
It is hard to pinpoint performance regressions across versions.
Proposal
Add optional profiling outputs (per-stage timings, per-queue CPU usage, and allocator stats) behind a flag.
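To make the shape of the request concrete, here is a minimal sketch of flag-gated profiling output. It is only an illustration, not a proposed implementation: the project's language and internals are not assumed, and `StageProfiler`, `run`, the `profile` flag, the stage names, and the report layout are all hypothetical. It uses Python's standard `time` and `tracemalloc` modules as stand-ins for stage timing and allocator stats. CPU time is measured per process here; true per-queue CPU usage would need per-thread or per-worker accounting.

```python
import time
import tracemalloc
from contextlib import contextmanager


class StageProfiler:
    """Hypothetical collector; records nothing unless enabled."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.stages = {}  # stage name -> {"calls", "wall_s", "cpu_s"}

    @contextmanager
    def stage(self, name):
        if not self.enabled:
            # Flag off: run the stage body without taking any measurements.
            yield
            return
        wall0 = time.perf_counter()
        cpu0 = time.process_time()
        try:
            yield
        finally:
            entry = self.stages.setdefault(
                name, {"calls": 0, "wall_s": 0.0, "cpu_s": 0.0}
            )
            entry["calls"] += 1
            entry["wall_s"] += time.perf_counter() - wall0
            entry["cpu_s"] += time.process_time() - cpu0

    def report(self):
        out = {"stages": self.stages}
        if self.enabled and tracemalloc.is_tracing():
            current, peak = tracemalloc.get_traced_memory()
            out["allocator"] = {"current_bytes": current, "peak_bytes": peak}
        return out


def run(profile=False):
    """Hypothetical two-stage pipeline; `profile` stands in for the flag."""
    prof = StageProfiler(enabled=profile)
    if profile:
        tracemalloc.start()
    with prof.stage("parse"):
        data = [str(i) for i in range(1000)]
    with prof.stage("transform"):
        total = sum(len(s) for s in data)
    report = prof.report()
    if profile:
        tracemalloc.stop()
    return total, report
```

With the flag off, `run()` returns an empty `stages` map and no allocator section. With it on, the report could be serialized (for example as JSON) and compared between versions.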
Alternatives considered
Relying on external profilers only.
Additional context
Useful for performance tuning and regression tracking.