faster erdos_renyi #212
Codecov Report
@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
+ Coverage   97.28%   97.43%   +0.14%
==========================================
  Files         115      113       -2
  Lines        6789     6543     -246
==========================================
- Hits         6605     6375     -230
+ Misses        184      168      -16
|
It's a miracle that tests have passed because some of them depend on the specific random instance (that surely changed)... |
Is this what you proposed in #150 (comment) for n,p graphs, @simonschoelly ? |
Can I do something to make this more easily reviewable? The code essentially
and then constructs the corresponding graph. The sampling of the binary vector is done by EDIT: and the speedup is somewhere between 2x and 18x in these tests... |
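For readers following along: one common way to sample the binary edge-indicator vector without flipping a coin per vertex pair is geometric skipping. The sketch below illustrates that idea under stated assumptions; it is not necessarily what this PR does, and `sample_gnp_edges` is a hypothetical name. It walks the pairs `(u, v)` with `u < v` in lexicographic order and jumps directly from one selected pair to the next.

```julia
using Random

# Sample the edges of an undirected G(n, p) graph by jumping between selected
# pairs with geometric gaps, so the work is proportional to the number of
# edges (plus n) rather than to the number of vertex pairs.
function sample_gnp_edges(rng::AbstractRNG, n::Int, p::Float64)
    edges = Tuple{Int,Int}[]
    (n < 2 || p <= 0) && return edges
    u, v = 1, 1                  # position just before the first pair (1, 2)
    maxgap = Float64(n)^2        # any gap this large already runs past the last pair
    while true
        # Number of pairs to advance (>= 1); geometric with success probability p.
        gap = p >= 1 ? 1 : 1 + floor(Int, min(log(1 - rand(rng)) / log1p(-p), maxgap))
        v += gap
        # Carry the overflow of v into the following rows of the upper triangle.
        while v > n
            v -= n
            u += 1
            u >= n && return edges
            v += u
        end
        push!(edges, (u, v))
    end
end

edges = sample_gnp_edges(MersenneTwister(1), 1000, 0.01)
```

Because the pairs are visited in increasing lexicographic order, the returned edge list is already sorted and free of duplicates.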
Sorry for the latency, and thanks for the contribution. @etiennedeg should we keep this one or #150? @abraunst what kind of tests can guarantee correctness despite the randomness? |
More likely this one. I can try to review it this week (probably not before Wednesday, but I should be able to find some time) |
Maybe something like checking for average degree with a reasonable tolerance? |
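A minimal sketch of such a tolerance check, assuming the generator's degree sequence is available as a vector. The names `naive_gnp_degrees` and `mean_degree_ok` are hypothetical; the naive sampler only stands in for whichever generator is under test. The tolerance comes from the edge count being Binomial(n(n-1)/2, p), so the mean degree 2m/n has a known standard deviation.

```julia
using Random

# Reference sampler (hypothetical helper): one coin flip per vertex pair,
# returning only the degree sequence.
function naive_gnp_degrees(rng::AbstractRNG, n::Int, p::Float64)
    deg = zeros(Int, n)
    for u in 1:n-1, v in u+1:n
        if rand(rng) < p
            deg[u] += 1
            deg[v] += 1
        end
    end
    return deg
end

# Accept if the mean degree lies within `nsigma` standard deviations of (n - 1) * p.
function mean_degree_ok(deg::Vector{Int}, p::Float64; nsigma=5)
    n = length(deg)
    npairs = n * (n - 1) / 2
    expected = (n - 1) * p
    sigma = 2 * sqrt(npairs * p * (1 - p)) / n   # std of 2m/n with m ~ Binomial(npairs, p)
    return abs(sum(deg) / n - expected) <= nsigma * sigma
end

deg = naive_gnp_degrees(MersenneTwister(1), 400, 0.05)
mean_degree_ok(deg, 0.05)
```

With `nsigma = 5` a correct generator should fail this check only very rarely, which keeps the test stable across random seeds.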
When I benchmark against the implementation proposed by @simonschoelly in #150, I get this (only for undirected graphs, because Simon's code seems broken for directed graphs):
julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_ss($N, $p; is_directed=$is_directed)
       end
  164.086 μs (504 allocations: 77.45 KiB)
  2.410 ms (504 allocations: 1.02 MiB)
  2.538 ms (504 allocations: 1.95 MiB)
  1.927 ms (5007 allocations: 792.03 KiB)
  351.821 ms (10007 allocations: 95.79 MiB)
  435.251 ms (10007 allocations: 190.78 MiB)
julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_pr($N, $p; is_directed=$is_directed)
       end
  450.744 μs (1296 allocations: 191.47 KiB)
  10.534 ms (3017 allocations: 4.76 MiB)
  17.142 ms (3017 allocations: 5.69 MiB)
  5.269 ms (13225 allocations: 2.03 MiB)
  1.107 s (42520 allocations: 361.60 MiB)
  2.319 s (47520 allocations: 788.97 MiB)
I appreciate the conciseness of your code, but Simon's seems about 4 times faster, with about 4 times fewer allocations.
That is equivalent to checking the number of edges, so it seems rather weak. Maybe something like the degree distribution? |
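One possible shape for a degree-distribution test, sketched under the assumption that each vertex degree in G(n, p) is marginally Binomial(n - 1, p): compare the empirical degree histogram with that pmf using the total variation distance. The function names and any threshold are hypothetical, and a threshold would need tuning for the chosen `n` and `p`.

```julia
# Binomial(m, p) probabilities for k = 0:m, for 0 < p < 1.
# The recurrence runs in log space so large m does not underflow.
function binomial_pmf(m::Int, p::Float64)
    logq = log1p(-p)
    logpmf = Vector{Float64}(undef, m + 1)
    logpmf[1] = m * logq
    for k in 0:m-1
        logpmf[k + 2] = logpmf[k + 1] + log((m - k) / (k + 1)) + log(p) - logq
    end
    return exp.(logpmf)
end

# Total variation distance between the empirical degree histogram
# and the Binomial(n - 1, p) distribution.
function degree_tv_distance(deg::Vector{Int}, p::Float64)
    n = length(deg)
    freq = zeros(Float64, n)      # freq[k + 1] = fraction of vertices with degree k
    for d in deg
        freq[d + 1] += 1 / n
    end
    return sum(abs, freq .- binomial_pmf(n - 1, p)) / 2
end
```

With Graphs.jl, a test could then assert something like `degree_tv_distance(degree(g), p) < tol`. Unlike the mean degree, this is sensitive to the shape of the distribution and not only to the edge count.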
I've rebased |
That looks rather unexpected to me. I'll have a better look at that. EDIT: the problem most likely is that |
Could we |
seems difficult because these are per-vertex lists, whereas the iterator in question is on the full list of edges... |
When removing the
|
Are you sure? This seems too strange. The number of allocations should be about N. I cannot reproduce this locally. |
Weird, I ran it again and got:
I don't know how I got the previous results |
The biggest advantage of Simon's is that the neighbors are already sorted, so it is much less costly to build the adjacency lists. No matter what, with this approach, we are going to pay a cost to retrieve the adjacency lists. |
The order is the same (Simon's code actually copies
against Simon's
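```julia
# Build a SimpleGraph from an iterator of edges over vertices 1:nv, in two passes.
function SimpleGraphFromIterator(edges, nv)
    # First pass: count degrees and edges so each adjacency list is allocated once.
    deg = zeros(Int, nv)
    ne = 0
    @inbounds for e in edges
        u, v = e.src, e.dst
        @assert (u, v) ⊆ eachindex(deg)  # both endpoints must be valid vertex indices
        deg[u] += 1
        deg[v] += 1
        ne += 1
    end
    # Preallocate one neighbor list per vertex, then reuse `deg` as a fill cursor.
    fadjlist = zeros.(Int, deg)
    deg .= 0
    # Second pass: write each neighbor into its slot, in edge-iteration order.
    @inbounds for e in edges
        u, v = e.src, e.dst
        fadjlist[u][deg[u] += 1] = v
        fadjlist[v][deg[v] += 1] = u
    end
    return SimpleGraph(ne, fadjlist)
end
```
|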
Simon's, but replacing the call to
|
amazing... these are almost the same code! EDIT: the cause is probably slowness in |
There is a gotcha in your implementation: the adjacency lists are not sorted... (This is a requirement for EDIT: If we change trianglemap to not use the modulo offset by 1, then the edges are returned in an order similar to Simon's. |
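To make the gotcha concrete, here is a small sketch in plain Julia (the helper names are hypothetical and no Graphs.jl types are involved). Appending neighbors in edge-arrival order yields sorted lists only when the edges arrive in a suitable order.

```julia
# Hypothetical helper: build undirected adjacency lists by appending
# neighbors in the order the edges arrive (no sorting).
function adjacency_lists(edges, n::Int)
    adj = [Int[] for _ in 1:n]
    for (u, v) in edges
        push!(adj[u], v)
        push!(adj[v], u)
    end
    return adj
end

# True when every neighbor list is sorted.
lists_sorted(adj) = all(issorted, adj)

lexicographic = [(1, 2), (1, 3), (2, 3), (2, 4)]
shuffled      = [(1, 3), (1, 2), (2, 4), (2, 3)]

lists_sorted(adjacency_lists(lexicographic, 4))  # true
lists_sorted(adjacency_lists(shuffled, 4))       # false: adj[1] == [3, 2]

# A generic repair is to sort every list afterwards, at extra cost.
fixed = sort!.(adjacency_lists(shuffled, 4))
```

Sorting afterwards always works, but emitting the edges in the right order avoids that cost entirely.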
Oh, I see. Indeed, this can be easily fixed. Thanks for all the feedback! I will try to come up with something cleaner in the next few days, but otherwise we can use some version of Simon's code. 😄 |
Well, in fact it was not only the mod1 that caused the vertices to be unsorted. Using your old definition of trianglemap (which follows Simon's ordering), I was able to cook something up, but it is not quite as fast. |
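For illustration, one closed-form index map that keeps the edges in an adjacency-friendly order is sketched below. It is a hypothetical `pair_from_index`, not the PR's `trianglemap`, and the ordering (by larger endpoint first, then by smaller endpoint) is an assumption chosen for simplicity. Visiting indices in increasing order under this map appends neighbors to every vertex in increasing order.

```julia
# Hypothetical index map: send k = 1, 2, 3, ... to the k-th pair (u, v) with
# u < v, ordered by v first and then by u:
# (1,2), (1,3), (2,3), (1,4), (2,4), (3,4), ...
function pair_from_index(k::Int)
    # t = v - 1 is the unique integer with t(t-1)/2 < k <= t(t+1)/2;
    # isqrt keeps the computation exact in integer arithmetic.
    t = (1 + isqrt(8k - 7)) ÷ 2
    u = k - t * (t - 1) ÷ 2
    return u, t + 1
end

[pair_from_index(k) for k in 1:6]
# returns [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
```

A convenient property of this ordering is that it does not depend on the number of vertices, so the same map serves any `n` as long as `k` stays within `1:n(n-1)÷2`.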
See #150 (comment):
To compare I've used
That gives
which seems uniformly better than master: