Internal Design of Strongly Connected Component¶
Overview¶
In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex. The strongly connected components of an arbitrary directed graph form a partition into subgraphs that are themselves strongly connected. (from wikipedia) The API will compute the strongly connected component (SCC) of each vertex and return a graph with the vertex value containing the lowest vertex id in the SCC containing that vertex.
Algorithm¶
The colorbased algorithm of strongly connected components is given as following:
procedure StronglyConnectedComponent(graph_CSR)
graph_CSC := csr2csc(graph_CSR)
for each vertex v in graph
result(v) := 1
color(v) := 1
while all vertexs have been labeled
FWcoloring(graph_CSR, FWQueue, color)
BWlabeling(graph_CSR, graph_CSC, BWQueue, color, result)
end while
return result
sub procedure FWcoloring(graph_CSR, FWQueue, color)
rootNode := findNewRoot(FWQueue, color)
push rootNode into FWQueue
while FWQueue is not empty
u := pop FWQueue
for each edge(u, v) in graph_CSR
if color(v) == 1 then
color(v) = u
push v into FWQueue
end if
end for
end while
sub procedure BWlabeling(graph_CSR, graph_CSC, FWQueue, BWQueue, color, result)
newRootNode := rootNode
result(rootNode) := rootNode
for v in this color region
if indegree(v) == 0 or outdegree(v) == 0 then
result(v) := v
end if
end for
while all vertexs in this color region have been labeled
push newRootNode into BWQueue
while BWQueue is not empty
u := pop BWQueue
for each edge(u, v) in graph_CSC
if color(v) == newRootNode and result(v) == 1 then
result(v) = newRootNode
push v into BWQueue
end if
end for
end while
if there is one more vertex which hasn't been labeled
Recolor those vertexs which aren't labeled in last BWBFS into rootNode
newRootNode := FWBFS(graph_CSR, BWQueue, color)
FWBFS(graph_CSR, BWQueue, color)
end if
end while
Here, for colorbased alogrithm, each backward label process must be started from the vertex whose color is equal to its vertex ID. In other words, the starting vertex must own the lowest vertex ID in the following SCC. As a result, we use one single FWBFS process to find the starting vertex before each BWlabel. And another FWBFS is required to recolor using the true starting vertex if the first FWBFS is started from one vertex with greater vertex ID.
Interface¶
The input should be a directed graph in compressed sparse row (CSR) format. The result will return a vertex list with each vertex value containing the lowest vertex id in the SCC.
Implemention¶
The detail algorithm implemention is shown as the figure below:
In the SCC kernel design, FWcolor and BWLabel can be dataflow by sharing some common buffers with multiple independent AXI masters. As a result, these two processes will be run at the same time to obtain a better performance.
Profiling and Benchmarks¶
The kernel is built by Vivado tools and benchmard in U250 FPGA card at 275MHz. The hardware resource utilization and benchmark performance are listed in the table below.
Name  LUT  BRAM  URAM  DSP 
Platform  104112  165  0  4 
scc_kernel  164311  523.5  110  6 
Total  268423 (16%)  688.5 (26%)  110 (9%)  10 (0%) 
Datasets  Vertex  Edges 

Iteration number in Spark  FPGA Time  Spark (4 threads)  Spark (8 threads)  Spark (16 threads)  Spark (32 threads)  
Spark Time  Speed up  Spark Time  Speed up  Spark Time  Speed up  Spark Time  Speed up  
citPatents  3774768  16518948  3774768  6  20711  52137  2.52  60517  2.92  51390  2.48  39939  1.93 
hollywood  1139905  57515616  1139905  6  9780  75681  7.74  45935  4.70  39595  4.05  29665  3.03 
socLiveJournal1  4847571  68993773  971232  6  39952  424444  10.62  304755  7.63  244916  6.13  231465  5.79 
ljournal2008  5363260  79023142  1119171  16  34840  540199  15.51  458633  13.16  378304  10.86  402120  11.54 
GEOMEAN  23043  173431  7.53X  140397  6.09X  117178  5.09X  102476  4.45X 
Note
1. Spark running on platform with Intel(R) Xeon(R) CPU E52690 v4 @2.600GHz, 56 Threads (2 Sockets, 14 Core(s) per socket, 2 Thread(s) per core)2. Time unit: ms.