Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Reduced time & memory footprint for Tarjans algorithm, fixed a bug where it was O(E^2) on star graphs. #1559

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

saolof
Copy link

@saolof saolof commented Apr 11, 2021

I worked on improving the performance of the strongly_connected_components function, and on fixing #1560 .

This PR eliminates several of the arrays that were used for auxiliary data using various tricks that were directly inspired by David J. Pearce's preprint over at https://homepages.ecs.vuw.ac.nz/~djp/files/IPL15-preprint.pdf , which reduces memory usage & allocations and improves cache efficiency. In particular, this lead to a significant speedup for dense random graphs in benchmarking. I would also subjectively claim that this version of the algorithm is more easily readable than Tarjan's original version.

In addition to this, it adds a new stack to keep track of the iteration state for nodes with a large number of outneighbours, specifically to fix #1560 . Interestingly, benchmarking revealed that the size threshold above which saving the iteration state is worth doing is surprisingly large (on the order of a thousand outneighbours), though of course that still applies to many real-world graphs.

I also put up a gist that anyone can use for some quick benchmarks:
https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da

Furthermore, since it also computes a component table at the same time, adding a flag to return the component table would allow saving some work in other functions that compute one for the SCCs such as https://github.com/JuliaGraphs/LightGraphs.jl/blob/2a644c2b15b444e7f32f73021ec276aa9fc8ba30/src/connectivity.jl#L541 and places where that is called such as when computing transitive closures.

saolof added 2 commits April 11, 2021 19:45
I worked on improving the strongly_connected_components function in this graph. 

This PR eliminates several of the arrays that were used for auxiliary data using various tricks inspired by https://homepages.ecs.vuw.ac.nz/~djp/files/IPL15-preprint.pdf , which reduces memory usage & allocations and improves cache efficiency. This seems to lead to a speedup in every category of random graphs in benchmarking. I would also subjectively claim that this version of the algorithm is more easily readable than Tarjan's original version.

Also put up a gist of a few alternatives I tried out:
https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da
spelled function name correctly
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Forgot to change the name of the method back to strongly_connected_components after working on it on my own machine with lightgraphs loaded.

@codecov
Copy link

codecov bot commented Apr 12, 2021

Codecov Report

Merging #1559 (09683df) into master (5d2b80a) will decrease coverage by 0.12%.
The diff coverage is 88.33%.

❗ Current head 09683df differs from pull request most recent head ebb392b. Consider uploading reports for the commit ebb392b to get more accurate results

@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
- Coverage   99.44%   99.31%   -0.13%     
==========================================
  Files         106      106              
  Lines        5551     5568      +17     
==========================================
+ Hits         5520     5530      +10     
- Misses         31       38       +7     

Counting downwards instead of upwards has the advantage that rindex becomes a lookup table for components, if we ever decide to return both. Also makes the algorithm invariant crystal clear.
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it so that the visitation count is downwards and component_count is upwards. This has the advantage that v is in components[rindex[v]] (so rindex is a lookup table for components) if we decide to add a flag to return rindex. Also makes the algorithm invariant clearer.

@simonschoelly
Copy link
Contributor

Hi, thanks for your contribution - It's been a few years since I last looked at Tarjans algorithm, so I will first have to figure out again how it works to do some complete code review. In the meantime, could you add the code you used for benchmarking and the results that you got as a comment to this PR?

is_unvisited(data::AbstractVector,v::Integer) = iszero(data[v])
is_unvisited(data::Dict,v) = !haskey(data,v)

@traitfn function strongly_connected_components(g::AG::IsDirected) where {T, AG <: AbstractGraph{T}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't hurt that you removed the condition that T is a subtype of Integer but currently the AbstractGraph interface does not allow any other eltype than some Integer (and they are consecutive from 1 to nv(g)), so you probably do not need to use a Dict.

Copy link
Author

@saolof saolof Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The dict is only ever used as a fallback if T is not an Integer (for example, if someone abused the interface by inheriting from AbstractGraph with T = Symbol), otherwise the code still uses vectors by default and does assume that the nodes are consecutive. The extra lines with the fallbacks are optional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I that case, I would add the {T <: Integer} back - T don't think we should handle the case that the interface is incorrectly implemented as it is not clear, where something could go wrong.

component_count = 1 # Index of the current component being discovered.
# Invariant 1: count is always smaller than component_count.
# Invariant 2: if rindex[v] < component_count, then v is in components[rindex[v]].
# This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.
# This trivially lets us tell if a vertex belongs to a previously discovered scc without any extra bits, just inequalities.

For consistency we should always use the term vertex, although they are synonyms in the literatore

is_unvisited(data::AbstractVector,v::Integer) = iszero(data[v])
is_unvisited(data::Dict,v) = !haskey(data,v)

@traitfn function strongly_connected_components(g::AG::IsDirected) where {T, AG <: AbstractGraph{T}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention some paper where you code your algorithm from. Lets add a reference to that paper to the docstring of that function.

src/connectivity.jl Outdated Show resolved Hide resolved
# Invariant 2: if rindex[v] < component_count, then v is in components[rindex[v]].
# This trivially lets us tell if a node belongs to a previously discovered scc without any extra bits, just inequalities.

component_root = empty_graph_data(Bool,g)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
component_root = empty_graph_data(Bool,g)
is_component_root = empty_graph_data(Bool,g)

a Dict of type Dict{T, Bool} can usually be replaced by a Set{T}. In this case, you could also think about using a Vector{Bool} or a BitVector} although with these you will waste some space in case we only have a few strongly connected components.

Copy link
Author

@saolof saolof Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's using a Vector{Bool} for now whenever T is an Integer. I'm planning to do some benchmarking with bitvector once I have a decent solution to make the loop nonquadratic.

The main issue with using anything that doesn't give you a pointer to a bool is that the boolean gets manipulated in a tight loop which is fast because of branch prediction/speculative execution. But of course adding a variable outside the loop and then storing its result is an option

@@ -252,66 +253,63 @@ function strongly_connected_components end
v = dfs_stack[end] #end is the most recently added item
u = zero_t
@inbounds for v_neighbor in outneighbors(g, v)
if index[v_neighbor] == zero_t
# unvisited neighbor found
if is_unvisited(rindex,v_neighbor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if is_unvisited(rindex,v_neighbor)
if is_unvisited(rindex, v_neighbor)

in general, would suggest to add a space after a comma in arguments lists for better readability and consistency with the other code.

@saolof
Copy link
Author

saolof commented Apr 12, 2021

Okay, here's some benchmark results. The first one is the previous library function. The second is what I currently have in this PR. Third one saves the iteration state when vertices become large enough, at the cost of a few extra operations for small vertices. Should I commit the third version since it is still quite competitive but solves the Star graph problem?


testing function strongly_connected_components:
path_digraph(100)
Trial(11.700 μs)
random_regular_digraph(500,5)
Trial(35.200 μs)
 random_tournament_digraph(200)
Trial(88.000 μs)
 random_orientation_dag(complete_graph(100))
Trial(18.500 μs)
 star_digraph(10000)
Trial(31.387 ms)

testing function strongly_connected_components_downward:
path_digraph(100)
Trial(6.360 μs)
random_regular_digraph(500,5)
Trial(32.200 μs)
 random_tournament_digraph(200)
Trial(40.100 μs)
 random_orientation_dag(complete_graph(100))
Trial(14.500 μs)
 star_digraph(10000)
Trial(42.072 ms)

testing function strongly_connected_components_2:
path_digraph(100)
Trial(7.075 μs)
random_regular_digraph(500,5)
Trial(34.600 μs)
 random_tournament_digraph(200)
Trial(38.400 μs)
 random_orientation_dag(complete_graph(100))
Trial(15.600 μs)
 star_digraph(10000)
Trial(778.800 μs)


@saolof
Copy link
Author

saolof commented Apr 12, 2021

Ah, I hadn't reenabled @inbounds in the third version. Here's some benchmarks with slightly bigger inputs and @inbounds enabled:

testing function strongly_connected_components:
path_digraph(1000)
Trial(110.500 μs)
random_regular_digraph(5000,8)
Trial(537.400 μs)
 random_tournament_digraph(1000)
Trial(2.216 ms)
 random_orientation_dag(complete_graph(1000))
Trial(918.600 μs)
 star_digraph(10000)
Trial(30.834 ms)

testing function strongly_connected_components_downward:
path_digraph(1000)
Trial(61.500 μs)
random_regular_digraph(5000,8)
Trial(517.800 μs)
 random_tournament_digraph(1000)
Trial(815.900 μs)
 random_orientation_dag(complete_graph(1000))
Trial(894.300 μs)
 star_digraph(10000)
Trial(33.027 ms)

testing function strongly_connected_components_2:
path_digraph(1000)
Trial(65.500 μs)
random_regular_digraph(5000,8)
Trial(532.900 μs)
 random_tournament_digraph(1000)
Trial(834.200 μs)
 random_orientation_dag(complete_graph(1000))
Trial(909.300 μs)
 star_digraph(10000)
Trial(756.500 μs)

Fixed O(|E|^2) performance bug that used to be an issue for star graphs. 

Minimal change in performance for large random graphs, but significant speedup for graphs that have both lots of SCCs and high node orders.
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committed the version that fixes star graph performance. The algorithm should now be provably O(|V| + |E|) for all graphs. Also included the changes raised during the previous review, and added a general description of the algorithm at the top.

saolof and others added 2 commits April 12, 2021 17:09
Co-authored-by: Simon Schoelly <[email protected]>
Previous commit removed the if in front of iszero somehow
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous commit replacing the u== zero_t with iszero(u) removed the if somehow

Slightly simplified logic and removed the need for zero_t.
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly simplified the logic and removed the need for zero_t.

Trying to figure out what broke. Can elements of outneighbours be equal to nothing?
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debugging latest commit.

Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing

Set correct name on method
@saolof
Copy link
Author

saolof commented Apr 13, 2021

Okay, seems fixed. The only non-passing test in the last run is the diffusion simulation at https://github.com/JuliaGraphs/LightGraphs.jl/blob/master/test/traversals/diffusion.jl#L155 which seems to give false negatives occasionally according to comment.

Added comments.

# Required to prevent quadratic time in star graphs without causing type instability. Returns the type of the state object returned by iterate that may be saved to a stack.
neighbor_iter_statetype(::Type{AG}) where {AG <: AbstractGraph} = Any # Analogous to eltype, but for the state of the iterator rather than the elements.
neighbor_iter_statetype(::Type{AG}) where {AG <: LightGraphs.SimpleGraphs.AbstractSimpleGraph} = Int # Since outneighbours is an array.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be done more cleanly by making this a @traitfn and adding a trait like say RandomAccessNeighbours to denote that the outnodes can be accessed randomly and that state when iterate()-ing over them is an int.

We could also add a layer of dispatch to strongly connected components so we can feed the graph to the function that infers the state type, and let it attempt to iterate on the first node to get the correct type.

saolof added 2 commits April 13, 2021 13:12
Added a dispatch to infer the types.
Copy link
Author

@saolof saolof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the fallback more generic by using Base.Iterators.approx_iter_type().

@saolof saolof changed the title Reduced time & memory footprint for Tarjans algorithm Reduced time & memory footprint for Tarjans algorithm, fixed a bug where it was O(E^2) on star graphs. Apr 13, 2021
@saolof
Copy link
Author

saolof commented Apr 14, 2021

https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da
Updated Gist with benchmarks. Here's some quick results for the latest version in the current commit vs the once currently in master, after tuning the large node cutoff parameter to ~1024 after a first round of benchmarks:

testing function strongly_connected_components:
path_digraph(10000)
  4.615 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  33.385 ms (5983 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  38.742 ms (99 allocations: 4.20 MiB)
random_regular_digraph(50000,200)
  92.663 ms (60 allocations: 4.19 MiB)
random_regular_digraph(50000,2000)
  515.990 ms (60 allocations: 4.19 MiB)
 random_tournament_digraph(10000)
  232.991 ms (51 allocations: 1014.59 KiB)
 random_orientation_dag(complete_graph(10000))
  69.689 ms (20031 allocations: 1.71 MiB)
 star_digraph(100000)
  3.515 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.926 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  36.592 ms (3022 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  41.791 ms (77 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  78.061 ms (56 allocations: 3.43 MiB)
random_regular_digraph(50000,2000)
  207.933 ms (71 allocations: 5.43 MiB)
 random_tournament_digraph(10000)
  75.131 ms (61 allocations: 1.34 MiB)
 random_orientation_dag(complete_graph(10000))
  75.588 ms (10028 allocations: 1.25 MiB)
 star_digraph(100000)
  13.889 ms (100026 allocations: 12.01 MiB)

In general, the speedup seems to be particularly big whenever the weak connectivity of the digraph is large (i.e. depending on width vs height of the DFS forest).

@saolof
Copy link
Author

saolof commented Apr 14, 2021

Some more benchmarks with more test cases (updated gist with them):

testing function strongly_connected_components:
path_digraph(10000)
  3.778 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  29.751 ms (6063 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  34.748 ms (84 allocations: 4.20 MiB)
random_regular_digraph(50000,200)     
  88.052 ms (60 allocations: 4.19 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  46.974 ms (100034 allocations: 8.30 MiB)
random_regular_digraph(50000,2000)
  516.017 ms (60 allocations: 4.19 MiB)
random_tournament_digraph(10000)
  239.401 ms (51 allocations: 1014.59 KiB)
random_orientation_dag(complete_graph(10000))
  69.948 ms (20031 allocations: 1.71 MiB)
star_digraph(100000)
  3.585 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.274 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  32.453 ms (3062 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  37.415 ms (69 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  74.465 ms (56 allocations: 3.43 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  41.914 ms (50027 allocations: 6.01 MiB)
random_regular_digraph(50000,2000)
  208.503 ms (71 allocations: 5.43 MiB)
random_tournament_digraph(10000)
  71.482 ms (61 allocations: 1.34 MiB)
random_orientation_dag(complete_graph(10000))
  71.993 ms (10028 allocations: 1.25 MiB)
star_digraph(100000)
  11.344 ms (100026 allocations: 12.01 MiB)

saolof added 2 commits April 14, 2021 12:01
Changed everything to be T-valued. Partly to save space for small graphs represented using smaller types, and partly for correctness on machines where Int = Int32 and where the graph is large enough to require Int64s
@sbromberger
Copy link
Owner

I am very ok with this as long as it passes @simonschoelly 's muster. Thank you.

@simonschoelly
Copy link
Contributor

I stopped reviewing this PR, as you where pushing quite a lot of commits, so I wanted to wait until you get that done. I will try to continue the review then this week.

@sbromberger
Copy link
Owner

Hi all,

Thanks for the PR and for the comprehensive review! I'm tracking - let me know when it's ready for a final review and merge.

@sbromberger
Copy link
Owner

Hi all,

Just wanted to ping on this. Is it ready for final review?

@saolof
Copy link
Author

saolof commented Jul 6, 2021

Yes.

Any further additions I may want to suggest (possibly including making it stable by doing the output in the exterior loop instead of the interior one, or adding some flag to return the rootindex array) would be in a separate PR. This is just a performance improvement that strictly avoids making any user-visible changes other than improving performance in the expensive cases, while enabling later PR's by ensuring that the data structures used internally are also useful to return.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strongly_connected_components and strongly_connected_components_kosaraju run in quadratic time.
3 participants