NetworkX average shortest path length and diameter are taking forever
I have a graph (A) built from unweighted edges, and I would like to compute the average shortest path length for its biggest connected component (giantC). However, the script has been running for more than 3 hours so far (tried on Colab and locally), with no output from either diameter or average_shortest_path_length.
I am using networkx==2.5 and python==3.6.9, and here is my script:
import json
import logging
import networkx as nx
from networkx.algorithms.distance_measures import diameter
from networkx.algorithms.shortest_paths.generic import average_shortest_path_length
# graph is built from a json file as follows
with open('graph.json') as f:
    graph_dict = json.load(f)
_indices = graph_dict['indices']
s_lst, rs_lst = _indices[0], _indices[1]
graph_ = nx.Graph()
for i in range(len(s_lst)):
    graph_.add_edge(s_lst[i], rs_lst[i])
# fetch the hugest graph of all graphs
connected_subgraphs = [graph_.subgraph(cc) for cc in
                       nx.connected_components(graph_)]
logging.info('connected subgraphs fetched.')
Gcc = max(nx.connected_components(graph_), key=len)
giantC = graph_.subgraph(Gcc)
logging.info('Fetched Giant Subgraph')
n_nodes = giantC.number_of_nodes()
print(f'Number of nodes: {n_nodes}')  # output is 106088
avg_shortest_path = average_shortest_path_length(giantC)
print(f'Avg Shortest path len: {avg_shortest_path}')
dia = diameter(giantC)
print(f'Diameter: {dia}')
Is there any way to make it faster, or an alternative to computing both the diameter and the average shortest path length for the giantC graph?
Solution 1:[1]
For future readers, if you want to fetch the largest connected subgraph from your NetworkX Graph:
import networkx as nx
import logging

def fetch_hugest_subgraph(graph_):
    Gcc = max(nx.connected_components(graph_), key=len)
    giantC = graph_.subgraph(Gcc)
    logging.info('Fetched Giant Subgraph')
    return giantC
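As a quick sanity check, the helper above can be exercised on a toy graph with two components (the node labels below are arbitrary):

```python
import logging
import networkx as nx

def fetch_hugest_subgraph(graph_):
    # Largest connected component by node count
    Gcc = max(nx.connected_components(graph_), key=len)
    giantC = graph_.subgraph(Gcc)
    logging.info('Fetched Giant Subgraph')
    return giantC

# One 3-node component and one 2-node component
g = nx.Graph([(1, 2), (2, 3), (10, 11)])
giant = fetch_hugest_subgraph(g)
print(giant.number_of_nodes())  # -> 3
```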
If you want to compute the average shortest path length for your graph, we can do that by sampling:
from statistics import mean
import networkx as nx
import random

def write_nodes_number_and_shortest_paths(graph_, n_samples=10_000,
                                          output_path='graph_info_output.txt'):
    with open(output_path, encoding='utf-8', mode='w+') as f:
        for component in nx.connected_components(graph_):
            component_ = graph_.subgraph(component)
            nodes = component_.nodes()
            lengths = []
            for _ in range(n_samples):
                n1, n2 = random.choices(list(nodes), k=2)
                length = nx.shortest_path_length(component_, source=n1, target=n2)
                lengths.append(length)
            f.write(f'Nodes num: {len(nodes)}, shortest path mean: {mean(lengths)}\n')
Computing average_shortest_path_length, as Joris Kinable pointed out in the comments, has complexity O(V^3), where V is the number of nodes. The same applies to computing the diameter of your graph.
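A cheaper sampling variant exploits the fact that, on an unweighted graph, a single BFS from one source yields distances to every reachable node at once, so each call contributes many path lengths instead of one. This is a sketch (the function name and parameters are my own, not from the answer above), assuming the graph is connected:

```python
import random
from statistics import mean
import networkx as nx

def approx_avg_shortest_path(graph_, n_sources=100, seed=0):
    # One BFS per sampled source (single_source_shortest_path_length)
    # returns distances to *all* nodes, so we average many path
    # lengths per call instead of one per call.
    rng = random.Random(seed)
    nodes = list(graph_.nodes())
    sources = rng.choices(nodes, k=n_sources)
    lengths = []
    for s in sources:
        dist = nx.single_source_shortest_path_length(graph_, s)
        # Exclude the zero-length path from a node to itself
        lengths.extend(d for t, d in dist.items() if t != s)
    return mean(lengths)

g = nx.path_graph(5)  # 0-1-2-3-4, exact average shortest path length is 2.0
print(approx_avg_shortest_path(g, n_sources=50))
```

Each sampled source costs one O(V + E) BFS, so the total work is O(n_sources * (V + E)) rather than O(V^3).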
Solution 2:[2]
For future readers: in NetworkX >= 2.6, an approximated diameter metric is available for both directed and undirected graphs.
Example:
>>> import timeit
>>> timeit.timeit("print(nx.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
224.9891120430002
>>> timeit.timeit("print(nx.approximation.diameter(g))",setup="import networkx as nx; g = nx.fast_gnp_random_graph(5000, 0.03, 100)", number=1)
3
0.09284040399961668
Note that the approximated metric computes a lower bound with respect to the exact value.
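The lower-bound property can be checked directly on a small random graph (the graph parameters here are arbitrary):

```python
import networkx as nx

# Build a small random graph and keep its largest connected component,
# since nx.diameter requires a connected graph.
g = nx.fast_gnp_random_graph(200, 0.05, seed=42)
g = g.subgraph(max(nx.connected_components(g), key=len))

exact = nx.diameter(g)
approx = nx.approximation.diameter(g)
print(exact, approx)

# The approximation is a lower bound on the exact diameter.
assert approx <= exact
```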
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | abc |