Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster erdos_renyi #212

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

abraunst
Copy link

@abraunst abraunst commented Jan 7, 2023

See #150 (comment):

To compare I've used

for N in (500, 5000), is_directed in (true, false), p in (10/N, 0.5, 1-10/N)
    @btime erdos_renyi($N, $p; is_directed=$is_directed)
end

That gives

634.683 μs (2601 allocations: 387.77 KiB)
  15.538 ms (5023 allocations: 8.35 MiB)
  20.433 ms (5023 allocations: 9.27 MiB)
  407.949 μs (1285 allocations: 187.91 KiB)
  9.689 ms (2517 allocations: 4.18 MiB)
  14.804 ms (2517 allocations: 4.64 MiB)
  7.411 ms (26507 allocations: 4.08 MiB)
  1.679 s (70029 allocations: 622.30 MiB)
  2.491 s (80029 allocations: 1.35 GiB)
  4.633 ms (13244 allocations: 2.04 MiB)
  1.066 s (35020 allocations: 311.18 MiB)
  1.666 s (40020 allocations: 691.04 MiB)

which seems uniformly better than master:

1.230 ms (2637 allocations: 356.81 KiB)
  68.167 ms (5003 allocations: 7.38 MiB)
  47.454 ms (7675 allocations: 7.74 MiB)
  583.637 μs (1307 allocations: 174.48 KiB)
  30.522 ms (2502 allocations: 3.69 MiB)
  22.270 ms (3823 allocations: 3.86 MiB)
  15.997 ms (26913 allocations: 3.66 MiB)
  42.191 s (70005 allocations: 526.66 MiB)
  10.789 s (107180 allocations: 1.17 GiB)
  7.240 ms (13430 allocations: 1.82 MiB)
  18.297 s (35003 allocations: 263.33 MiB)
  4.420 s (53510 allocations: 597.51 MiB)

@codecov
Copy link

codecov bot commented Jan 7, 2023

Codecov Report

Merging #212 (1cace9f) into master (f6f8db6) will increase coverage by 0.14%.
The diff coverage is 100.00%.

❗ Current head 1cace9f differs from pull request most recent head 5e38794. Consider uploading reports for the commit 5e38794 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
+ Coverage   97.28%   97.43%   +0.14%     
==========================================
  Files         115      113       -2     
  Lines        6789     6543     -246     
==========================================
- Hits         6605     6375     -230     
+ Misses        184      168      -16     

@abraunst
Copy link
Author

abraunst commented Jan 7, 2023

It's a miracle that tests have passed because some of them depend on the specific random instance (that surely changed)...

@abraunst
Copy link
Author

Is this what you proposed in #150 (comment) for n,p graphs, @simonschoelly ?

@abraunst
Copy link
Author

abraunst commented May 10, 2023

Can I do something to make this more easily reviewable? The code essentially

  • (undirected case) samples a $p$-sparse binary vector of size $\binom{N}{2}$ and then maps it into the upper triangle of a $N\times N$ matrix
  • (directed case) samples a $p$-sparse binary vector of size $N(N-1)$ and then maps it into the non-diagonal elements of a $N\times N$ matrix

and then constructs the corresponding graph. The sampling of the binary vector is done by randsubseq which is by the way the function used for the same purpose by the SparseArrays standard library.

EDIT: and the speedup is somewhere between 2x and 18x in these tests...

@gdalle gdalle added the enhancement New feature or request label Jun 16, 2023
@gdalle
Copy link
Member

gdalle commented Jan 29, 2024

Sorry for the latency, and thanks for the contribution. @etiennedeg should we keep this one or #150?

@abraunst what kind of tests can guarantee correctness despite the randomness?

@etiennedeg
Copy link
Member

More likely this one. I can try to review it this week (probably not before Wednesday, but I should be able to find some time)

@abraunst
Copy link
Author

Sorry for the latency, and thanks for the contribution. @etiennedeg should we keep this one or #150?

@abraunst what kind of tests can guarantee correctness despite the randomness?

Maybe something like checking for average degree with a reasonable tolerance?

@etiennedeg
Copy link
Member

When I benchmark against the implementation proposed by @simonschoelly in #150, I get this (only for undirected graph because simon's code seems broken for directed graphs):

julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_ss($N, $p; is_directed=$is_directed)
       end
  164.086 μs (504 allocations: 77.45 KiB)
  2.410 ms (504 allocations: 1.02 MiB)
  2.538 ms (504 allocations: 1.95 MiB)
  1.927 ms (5007 allocations: 792.03 KiB)
  351.821 ms (10007 allocations: 95.79 MiB)
  435.251 ms (10007 allocations: 190.78 MiB)

julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_pr($N, $p; is_directed=$is_directed)
       end
  450.744 μs (1296 allocations: 191.47 KiB)
  10.534 ms (3017 allocations: 4.76 MiB)
  17.142 ms (3017 allocations: 5.69 MiB)
  5.269 ms (13225 allocations: 2.03 MiB)
  1.107 s (42520 allocations: 361.60 MiB)
  2.319 s (47520 allocations: 788.97 MiB)

I appreciate the conciseness of your code but simon's seems 4 times faster with 4 times less allocations.

Maybe something like checking for average degree with a reasonable tolerance?

That equivalent to check the number of edges, so it seems rather weak. Maybe something like degree distribution ?

@abraunst
Copy link
Author

I've rebased

@abraunst
Copy link
Author

abraunst commented Jan 31, 2024

When I benchmark against the implementation proposed by @simonschoelly in #150, I get this (only for undirected graph because simon's code seems broken for directed graphs):

julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_ss($N, $p; is_directed=$is_directed)
       end
  164.086 μs (504 allocations: 77.45 KiB)
  2.410 ms (504 allocations: 1.02 MiB)
  2.538 ms (504 allocations: 1.95 MiB)
  1.927 ms (5007 allocations: 792.03 KiB)
  351.821 ms (10007 allocations: 95.79 MiB)
  435.251 ms (10007 allocations: 190.78 MiB)

julia> for N in (500, 5000), is_directed in (false, ), p in (10/N, 0.5, 1-10/N)
           @btime erdos_renyi_pr($N, $p; is_directed=$is_directed)
       end
  450.744 μs (1296 allocations: 191.47 KiB)
  10.534 ms (3017 allocations: 4.76 MiB)
  17.142 ms (3017 allocations: 5.69 MiB)
  5.269 ms (13225 allocations: 2.03 MiB)
  1.107 s (42520 allocations: 361.60 MiB)
  2.319 s (47520 allocations: 788.97 MiB)

I appreciate the conciseness of your code but simon's seems 4 times faster with 4 times less allocations.

Maybe something like checking for average degree with a reasonable tolerance?

That equivalent to check the number of edges, so it seems rather weak. Maybe something like degree distribution ?

That looks rather unexpected to me. I'll have a better look at that.

EDIT: the problem most likely is that SimpleGraphFromIterator is slow because it cannot preallocate memory for the adjacency lists and just push!es, requiring three or four allocations per vertex instead of one.

@gdalle
Copy link
Member

gdalle commented Jan 31, 2024

EDIT: the problem most likely is that SimpleGraphFromIterator is slow because it cannot preallocate memory for the adjacency lists and just push!es, requiring three or four allocations per vertex instead of one.

Could we sizehint! it if the iterator has a length?

@abraunst
Copy link
Author

EDIT: the problem most likely is that SimpleGraphFromIterator is slow because it cannot preallocate memory for the adjacency lists and just push!es, requiring three or four allocations per vertex instead of one.

Could we sizehint! it if the iterator has a length?

seems difficult because these are per-vertex lists, whereas the iterator in question is on the full list of edges...

@etiennedeg
Copy link
Member

When removing the SimpleGraphFromIterator call (and replacing it by an empty graph), I get

  1.550 ms (4314 allocations: 174.41 KiB)
  25.122 ms (64000 allocations: 2.94 MiB)
  46.153 ms (124133 allocations: 5.70 MiB)
  16.399 ms (73717 allocations: 2.20 MiB)
  3.212 s (38215347 allocations: 774.12 MiB)

@abraunst
Copy link
Author

abraunst commented Jan 31, 2024

When removing the SimpleGraphFromIterator call (and replacing it by an empty graph), I get

  1.550 ms (4314 allocations: 174.41 KiB)
  25.122 ms (64000 allocations: 2.94 MiB)
  46.153 ms (124133 allocations: 5.70 MiB)
  16.399 ms (73717 allocations: 2.20 MiB)
  3.212 s (38215347 allocations: 774.12 MiB)

Are you sure? This seems too strange, The number of allocations should be about N. Cannot reproduce locally.

@etiennedeg
Copy link
Member

Weird, done it again, I got:

  107.043 μs (505 allocations: 57.03 KiB)
  2.767 ms (505 allocations: 532.59 KiB)
  2.753 ms (505 allocations: 1004.34 KiB)
  1.091 ms (5006 allocations: 553.33 KiB)
  300.378 ms (5006 allocations: 48.11 MiB)
  321.689 ms (5006 allocations: 95.64 MiB)

I don't know how I got the previous results

@etiennedeg
Copy link
Member

The biggest advantage of Simon's is that the neighbors are already sorted, so it is much less costly to build the adjacency lists. No matter what, with this approach, we are going to pay a cost to retrieve the adjacency lists.

@abraunst
Copy link
Author

The biggest advantage of Simon's is that the neighbors are already sorted, so it is much less costly to build the adjacency lists. No matter what, with this approach, we are going to pay a cost to retrieve the adjacency lists.

The order is the same (Simon's code actually copies randsubseq which gives a sorted result). A modified SimpleGraphFromIterator already improves things significanly:

63.205 μs (505 allocations: 95.14 KiB)
  1.880 ms (505 allocations: 1.50 MiB)
  2.563 ms (505 allocations: 2.89 MiB)
  668.423 μs (5007 allocations: 955.06 KiB)
  236.858 ms (10007 allocations: 143.50 MiB)
  307.496 ms (10007 allocations: 286.03 MiB)

against Simon's

39.659 μs (504 allocations: 77.61 KiB)
  849.891 μs (504 allocations: 1.02 MiB)
  721.155 μs (504 allocations: 1.95 MiB)
  432.917 μs (5007 allocations: 791.56 KiB)
  120.317 ms (10007 allocations: 95.76 MiB)
  113.731 ms (10007 allocations: 190.78 MiB)
function SimpleGraphFromIterator(edges, nv)
    deg = zeros(Int, nv)
    ne = 0
    @inbounds for e in edges
        u, v = e.src, e.dst
        @assert (u,v)  eachindex(deg)
        deg[u] += 1
        deg[v] += 1
        ne += 1
    end

    fadjlist = zeros.(Int, deg)
    deg .= 0

    @inbounds for e in edges
        u, v = e.src, e.dst
        fadjlist[u][deg[u]+=1] = v
        fadjlist[v][deg[v]+=1] = u
    end
    return SimpleGraph(ne, fadjlist)
end
```

@etiennedeg
Copy link
Member

etiennedeg commented Jan 31, 2024

Simon's, but replacing the call to _sample_with_replacement! by randsubseq!, so does not improve.

  217.053 μs (504 allocations: 77.73 KiB)
  3.721 ms (504 allocations: 1.02 MiB)
  4.865 ms (535 allocations: 2.00 MiB)
  2.517 ms (5023 allocations: 833.01 KiB)
  493.546 ms (10052 allocations: 95.83 MiB)
  642.595 ms (10055 allocations: 190.89 MiB)

@abraunst
Copy link
Author

abraunst commented Jan 31, 2024

Simon's, but replacing the call to _sample_with_replacement! by randsubseq!, so does not improve.

  217.053 μs (504 allocations: 77.73 KiB)
  3.721 ms (504 allocations: 1.02 MiB)
  4.865 ms (535 allocations: 2.00 MiB)
  2.517 ms (5023 allocations: 833.01 KiB)
  493.546 ms (10052 allocations: 95.83 MiB)
  642.595 ms (10055 allocations: 190.89 MiB)

amazing... these are almost the same code!

EDIT: the cause is probably slowness in push! which supposedly will become fast in 1.11

@etiennedeg
Copy link
Member

etiennedeg commented Jan 31, 2024

There is a gotcha in your implementation, the adjacency lists are not sorted... (This is a requirement for SimpleGraph). So you indeed have to pay more to sort the adjacency lists.

EDIT: If we change trianglemap to not use modulo offset by 1, then the edges are returned in an order similar to simon's.

@abraunst
Copy link
Author

There is a gotcha in your implementation, the adjacency lists are not sorted... (This is a requirement for SimpleGraph). So you indeed have to pay more to sort the adjacency lists.

EDIT: If we change trianglemap to not use modulo offset by 1, then the edges are returned in an order similar to simon's.

oh I see. Indeed, this can be easily fixed. Thanks for all the feedback! I will try to come up with something cleaner in the next few days, but otherwise we can use some versione of Simon's code. 😄

@etiennedeg
Copy link
Member

Well, in fact it was not only the mod1 that caused vertices to not be sorted. Using your old definition of trianglemap (which follows simon's ordering), I was able to cook something, but it is not quite as fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants