Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor Optimization for the Generic Dominator Calculation Algorithm #1424

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

omerk2511
Copy link

@omerk2511 omerk2511 commented Mar 27, 2022

The generic dominator calculation algorithm sets the initial todo set to be just all the nodes. This leads to redundant calculations which can be saved.

Explanation:

The algorithm's main loop tries to see if the dominators of the current node have changed by calculating the intersection of every previous node's dominators. Since the initial dominator set of all nodes is always the entire node set, except for the head node (whose dominator set is set([head])), this intersection calculation will always initially return the entire node set for all nodes except the next nodes of the head node. Therefore, we can prevent this waste of cycles by setting the initial todo set to contain only them.

Note - I used the terms previous and next nodes instead of predecessors and successors, since this function supports both dominators and postdominators. I initially didn't notice the support for postdominators as well, and implemented an additional optimization which was correct only for dominator calculation. Luckily, the regression tests caught this.

Benchmarks:

When adding a log of the amount of such "misses" and running the graph.py test suite, these are the results before this PR:

# of misses: 1
# of misses: 5
# of misses: 4
# of misses: 2
# of misses: 3
# of misses: 1
# of misses: 0
# of misses: 5
# of misses: 1
# of misses: 0
# of misses: 5
# of misses: 3

And after:

# of misses: 1
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 2

Not so dramatic - but I guess it might be beneficial for analysis of huge CFGs... Let's test it for large CFGs as well. I used this tester program which creates a pretty linear CFG of 20K nodes:

from miasm.core.graph import *
from random import randint

for i in range(5):
    x = {}
    for a in range(1, 20001):
        x[a] = str(randint(1000, 10000))

    g = DiGraph()
    g.add_edge(x[1], x[2])
    g.add_edge(x[2], x[3])
    g.add_edge(x[2], x[4])
    g.add_edge(x[3], x[5])
    g.add_edge(x[4], x[5])
    g.add_edge(x[5], x[2])
    g.add_edge(x[2], x[6])
    for a in range(1, 20000):
        g.add_edge(x[a], x[a+1])

    dominators = g.compute_dominators(x[1])

Here are the results before this PR:

# of misses: 13471
# of misses: 13558
# of misses: 13658
# of misses: 13541
# of misses: 13652
python3 ./graph.py  95.40s user 4.41s system 99% cpu 1:39.82 total

And after:

# of misses: 5687
# of misses: 5633
# of misses: 5624
# of misses: 5602
# of misses: 5771
python3 ./graph.py  30.75s user 1.32s system 99% cpu 32.075 total

About 3 times faster, nice...

@omerk2511 omerk2511 changed the title Minor Generic Dominator Calculation Algorithm Optimization Minor Optimization for the Generic Dominator Calculation Algorithm Mar 27, 2022
… any node in the graph is correct, but this is a *generic* function, so it should handle postdominators as well, whose head can be a predecessor of some node in the graph. Thus, the node == head check is still required.
@serpilliere
Copy link
Contributor

Hey @omerk2511
I will analyze this in a near future: I have to think about it :)
Thank you for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants