feat(tui): add tokens per second to response footer #12721
JohnC0de wants to merge 3 commits into anomalyco:dev
Conversation
The following comment was made by an LLM; it may be inaccurate:

Potential Duplicate Found: PR #5497 - "feat: display tokens per second for assistant messages"

Why it's related: This PR appears to address the exact same feature: displaying tokens per second for assistant messages. It likely covers similar functionality for tracking and displaying TPS metrics in the UI.
Force-pushed from 787aee0 to c54f23a
Adds TPS calculation and display to message footers. Tracks a firstToken timestamp during streaming and calculates throughput for completed text responses. Filters out tool calls and fast responses to avoid noise.

Key features:
- Shows TPS next to duration: "3.4s · 45 tok/s"
- Includes both output and reasoning tokens
- 250ms minimum threshold to filter noise
- Comprehensive test coverage (34 tests)

Tested with Kimi K2.5 showing ~131 tok/s.

Fixes anomalyco#5374, Closes anomalyco#6096
Force-pushed from c54f23a to 571c49b
@adamdotdevin @rekram1-node: the bot flagged this as a duplicate of #5497, so I wanted to give some context. I reviewed #5497 before starting; it has merge conflicts against dev. Quick review guide, if it helps:

Happy to adjust anything.
Any update on this? |
@rekram1-node I investigated the 3 failing checks on this PR. Root cause:
Proposed minimal fix (single-file change):
I can paste the exact patch here if useful.
@rekram1-node I opened a follow-up PR that includes all changes from this PR plus a minimal ripgrep path fix for the failing checks:

Cross-reference:
Fixes #5374
Closes #6096
Adds a tok/s (TPS) counter to assistant message footers. Shows up right after duration, like:
18.3s · 131 tok/s
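For illustration, here is a minimal sketch of how a footer string like that could be assembled; the function and parameter names are hypothetical, not the PR's actual code:

```ts
// Hypothetical helper, not the PR's actual code: joins duration and
// throughput with the same " · " separator shown above.
function formatFooter(durationSec: number, tps?: number): string {
  const parts = [`${durationSec.toFixed(1)}s`]
  // tps is undefined for filtered responses (tool calls, <250ms, errors)
  if (tps !== undefined) parts.push(`${Math.round(tps)} tok/s`)
  return parts.join(" · ")
}

formatFooter(18.3, 131.2) // "18.3s · 131 tok/s"
formatFooter(0.2)         // "0.2s"
```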
Why

I've been switching between providers a lot lately and wanted a quick way to see which models are actually fast versus which just feel fast. Kimi K2.5 clocks ~130 tok/s. Having the number right there makes the difference obvious without needing external tooling.
Screenshot
Kimi K2.5 Free hitting 198 tok/s on a real response
Prior art
#5497 by @edlsh tackled this back in December. It's been sitting for 2+ months now with merge conflicts and CI failures, and a few people in the comments are asking for it to land. Rather than try to rebase that PR, I reimplemented it cleanly on current `dev` with a different structure: the TPS logic lives in `core/tokens/` instead of `tui/util/`, so the SDK and other consumers can use it later without pulling in TUI code.

How it works
`processor.ts` records a `firstToken` timestamp when the first `output-delta` arrives during streaming. TPS is then calculated as `generatedTokens / ((completed - firstToken) / 1000)`, where `generatedTokens` includes both output and reasoning tokens. Responses shorter than 250ms, tool calls, and errored responses are filtered out.
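A minimal sketch of that calculation; the type and field names here are illustrative, and the actual implementation in `core/tokens/` may differ:

```ts
// Illustrative sketch; real field names in core/tokens/ may differ.
interface Timing {
  firstToken: number // epoch ms when the first output-delta arrived
  completed: number  // epoch ms when the response finished
}

const MIN_DURATION_MS = 250 // threshold from the PR: filter out noisy fast responses

function tokensPerSecond(
  timing: Timing,
  outputTokens: number,
  reasoningTokens: number,
): number | undefined {
  const elapsedMs = timing.completed - timing.firstToken
  if (elapsedMs < MIN_DURATION_MS) return undefined // too fast to be meaningful
  return (outputTokens + reasoningTokens) / (elapsedMs / 1000)
}
```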
What I left out

Average/aggregate TPS across a session. Both issues mention it, but it felt like scope creep for a first pass. The per-message timestamps are all persisted, so adding a session-level summary later is straightforward.
Testing
34 unit tests cover calculation, edge cases, and filtering. All CI checks pass: typecheck, unit, e2e (linux), pr-standards.
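As a sketch of what those tests look like (assuming Bun's test runner and the hypothetical `tokensPerSecond` helper above, not the PR's actual test file):

```ts
import { describe, expect, test } from "bun:test"

describe("tokensPerSecond", () => {
  test("computes throughput from first token to completion", () => {
    // (200 output + 62 reasoning) tokens over 2 seconds = 131 tok/s
    expect(tokensPerSecond({ firstToken: 0, completed: 2000 }, 200, 62)).toBe(131)
  })

  test("filters out responses under the 250ms threshold", () => {
    expect(tokensPerSecond({ firstToken: 0, completed: 100 }, 50, 0)).toBeUndefined()
  })
})
```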