Skip to content

Conversation

@fredfrance-oss
Copy link

Currently the textprinter plugin does a full get_prog_point for all callbacks.
However a simple speedup is to verify that any tap point has the current PC before continuing.

For my test case this gave a very big improvement as you can see below.

Without optimization:

        test.rec:  2535950507 (  1.00%) instrs. 1422.76 sec.  2.06 GB ram.
        test.rec:  5071900973 (  2.00%) instrs. 2593.45 sec.  2.59 GB ram.
        test.rec:  7607851447 (  3.00%) instrs. 4124.41 sec.  2.80 GB ram.

With opt:

        test.rec:  2535950507 (  1.00%) instrs.  528.39 sec.  1.96 GB ram.
        test.rec:  5071900973 (  2.00%) instrs.  922.36 sec.  2.14 GB ram.
        test.rec:  7607851447 (  3.00%) instrs. 1660.56 sec.  2.39 GB ram.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a performance optimization to the textprinter plugin by adding an early PC (program counter) check before the expensive get_prog_point call. The optimization uses an unordered_set to quickly filter callbacks where the PC is not in the tap points, resulting in significant speedup (2.5-3x improvement based on the benchmark data).

Key changes:

  • Added fast-path PC lookup using unordered_set<target_ulong> for O(1) filtering
  • Modified mem_callback to check PC membership before expensive get_prog_point operation
  • Populated the PC set alongside existing tap_points in init_plugin

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment on lines 64 to 66
mem_counter++;
return;
}
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mem_counter is incremented twice for tap point matches: once in the early return path (line 64 for non-matches) and once at line 87 (after processing matches). This creates an inconsistency where non-matching PCs increment the counter at the start of the function, while matching PCs increment it at the end. The increment on line 64 should be removed to maintain the original behavior where mem_counter is always incremented at line 87, ensuring consistent counting regardless of whether the PC matches.

Suggested change
mem_counter++;
return;
}
return;
}

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we only increase the counter before a return from the function there is no way the counter would increase twice for a single call. But a cleaner solution might be to move the counter increase to the top of the function instead of having it in two locations.

@lacraig2
Copy link
Member

I think this is a reasonable optimization. Ultimately, the comparison:

bool operator ==(const prog_point &p) const {
bool sids_match = true;
if ((this->sidFirst != p.sidFirst) ||
(this->sidSecond != p.sidSecond) ||
(this->isKernelMode != p.isKernelMode) ||
(this->stackKind != p.stackKind)) {
sids_match = false;
}
return ((this->pc == p.pc) && (this->caller == p.caller) && sids_match);
}

will be applied, but given that PC must match one of them (and this is a significant differentiator) this should help.

The copilot suggestion about the count is worthwhile, but otherwise this looks good.

@fredfrance-oss
Copy link
Author

fredfrance-oss commented Nov 25, 2025

As we only increase the counter before a return from the function there is no way the counter would increase twice for a single call. But a cleaner solution might be to move the counter increase to the top of the function instead of having it in two locations.

Something like this:

void mem_callback(CPUState *env, target_ulong pc, target_ulong addr,
                  size_t size, uint8_t *buf, gzFile f) {
    mem_counter++;
    if (tap_pc.find(pc) == tap_pc.end()) {
        return;
    }
    prog_point p = {};
    get_prog_point(env, &p);
    if (tap_points.find(p) == tap_points.end()) {
        return;
    }
    
    target_ulong callers[16] = {0};
    int nret = get_callers(callers, 16, env);
    unsigned char *buf_uc = static_cast<unsigned char *>(buf);
    for (unsigned int i = 0; i < size; i++) {
        for (int j = nret-1; j > 0; j--) {
            gzprintf(f, TARGET_FMT_lx " ", callers[j]);
        }
        gzprintf(f,
                 TARGET_FMT_lx " " TARGET_FMT_lx " %d " TARGET_FMT_lx
                               " " TARGET_FMT_lx " %s " TARGET_FMT_lx
                               " %ld %02x\n",
                 p.caller, p.pc, p.stackKind, p.sidFirst, p.sidSecond,
                 p.isKernelMode ? "kernel" : "user", addr + i, mem_counter,
                 buf_uc[i]);
    }
    
    return;
}

@fredfrance-oss
Copy link
Author

Is there any action required from my side on this or will this be merged in due time?
(Same question for the other PR)

@lacraig2
Copy link
Member

lacraig2 commented Dec 4, 2025

You had suggested a cleaner approach in the comment above. I wasn't sure if you were you going to implement that in the PR.

Refactor memory callback to avoid duplicate code and improve readability.
@fredfrance-oss
Copy link
Author

fredfrance-oss commented Dec 8, 2025

I understand i should have made it a question. I was unsure if a refactoring change combined with a new-functionality change was welcome.

Pull request updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants