Simple but significant speed improvement in textprinter.cpp #1613
base: dev
Pull request overview
This PR introduces a performance optimization to the textprinter plugin by adding an early PC (program counter) check before the expensive get_prog_point call. The optimization uses an unordered_set to quickly filter callbacks where the PC is not in the tap points, resulting in significant speedup (2.5-3x improvement based on the benchmark data).
Key changes:
- Added fast-path PC lookup using unordered_set<target_ulong> for O(1) filtering
- Modified mem_callback to check PC membership before the expensive get_prog_point operation
- Populated the PC set alongside existing tap_points in init_plugin
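The fast path described in the key changes can be sketched as follows. This is a minimal, standalone illustration and not the code from the PR: `TapPoint`, `add_tap_point`, `mem_callback_sketch`, and the `expensive_lookups` counter are hypothetical names, `target_ulong` is aliased to a 64-bit integer as a stand-in for PANDA's type, and the expensive `get_prog_point` step is only represented by a counter.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Stand-in for PANDA's target_ulong (assumption: the real width depends on the guest).
using target_ulong = std::uint64_t;

// Hypothetical, simplified tap point; only the pc field matters for the filter.
struct TapPoint {
    target_ulong caller;
    target_ulong pc;
};

static std::vector<TapPoint> tap_points;          // full tap point list
static std::unordered_set<target_ulong> tap_pcs;  // fast-path set of tap point PCs
static unsigned long expensive_lookups = 0;       // counts simulated get_prog_point calls

// Mirrors the idea of filling the PC set alongside tap_points at init time.
void add_tap_point(const TapPoint &tp) {
    tap_points.push_back(tp);
    tap_pcs.insert(tp.pc);
}

// Returns true only if the callback had to take the expensive path.
bool mem_callback_sketch(target_ulong pc) {
    // Average O(1) membership test: skip all work when no tap point uses this PC.
    if (tap_pcs.find(pc) == tap_pcs.end()) {
        return false;
    }
    ++expensive_lookups;  // placeholder for get_prog_point plus the full comparison
    return true;
}
```

The set only acts as a prefilter: a PC hit still needs the full program point comparison, so results should be unchanged while most callbacks return early.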
    mem_counter++;
    return;
}
Copilot AI, Nov 24, 2025
The mem_counter is incremented twice for tap point matches: once in the early return path (line 64 for non-matches) and once at line 87 (after processing matches). This creates an inconsistency where non-matching PCs increment the counter at the start of the function, while matching PCs increment it at the end. The increment on line 64 should be removed to maintain the original behavior where mem_counter is always incremented at line 87, ensuring consistent counting regardless of whether the PC matches.
Suggested change:
-     mem_counter++;
-     return;
- }
+     return;
+ }
I think this is a reasonable optimization. Ultimately, the comparison in panda/panda/plugins/callstack_instr/prog_point.h (lines 52 to 61 in 840acd9) will still be applied, but given that the PC must match one of the tap points (and this is a significant differentiator), this should help. The Copilot suggestion about the count is worthwhile, but otherwise this looks good.
As we only increase the counter before a return from the function there is no way the counter would increase twice for a single call. But a cleaner solution might be to move the counter increase to the top of the function instead of having it in two locations. Something like this:
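The snippet that followed this comment is not preserved here. A minimal sketch of the idea, assuming a single increment at the top of the callback, might look like the following. The names `watched_pcs`, `printed`, and `mem_callback_refactored` are hypothetical, and the real callback in textprinter.cpp takes more parameters and does more work on a match.

```cpp
#include <cstdint>
#include <unordered_set>

// Stand-in for PANDA's target_ulong (assumption).
using target_ulong = std::uint64_t;

static std::unordered_set<target_ulong> watched_pcs;  // hypothetical fast-path PC set
static std::uint64_t mem_counter = 0;                  // counts every memory callback
static std::uint64_t printed = 0;                      // counts callbacks that reach the slow path

void mem_callback_refactored(target_ulong pc) {
    // Single increment, executed exactly once per call on every path.
    mem_counter++;

    // Early return for PCs that no tap point refers to.
    if (watched_pcs.count(pc) == 0) {
        return;
    }

    // Placeholder for get_prog_point and the printing logic.
    printed++;
}
```

One caveat with this layout: if the matching path uses the counter value while printing, moving the increment ahead of that code shifts the printed value by one compared with incrementing at the end.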
Is there any action required from my side on this or will this be merged in due time?
You had suggested a cleaner approach in the comment above. I wasn't sure if you were going to implement that in the PR.
Refactor memory callback to avoid duplicate code and improve readability.
I understand, I should have made it a question. I was unsure if a refactoring change combined with a new-functionality change was welcome. Pull request updated.
Currently the textprinter plugin does a full get_prog_point for all callbacks. However, a simple speedup is to verify that at least one tap point has the current PC before continuing.
For my test case this gave a very big improvement, as you can see below.
Without optimization:
With opt: