Parsing py2neo paths into Pandas
We are returning paths from a Cypher query using py2neo, and we would like to parse the result into a Pandas DataFrame. The Cypher query is similar to the following:
query = '''MATCH p = allShortestPaths((p1:Type1)-[r*..3]-(p2:Type1))
WHERE p1.ID = 123456
RETURN DISTINCT p'''
result = graph.run(query)
The result is a walkable object that can be traversed. Note that the nodes and relationships do not have the same properties.
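Because nodes and relationships carry different property keys, one way to get a consistent column set is to take the union of keys across all elements. A minimal sketch, assuming each element's properties are available as a plain dict; the sample_nodes data and the property_union helper are hypothetical:

```python
# Hypothetical property dicts standing in for py2neo nodes with differing keys.
sample_nodes = [
    {"ID": 123456, "name": "a"},
    {"ID": 654321, "weight": 2.5},
]


def property_union(items):
    """Return the sorted union of property keys over all items."""
    return sorted({key for item in items for key in item.keys()})


columns = property_union(sample_nodes)
print(columns)  # ['ID', 'name', 'weight']
```

Sorting the union gives a column order that is the same on every run, which a bare set does not guarantee.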
What would be the most Pythonic way to iterate over the object? Is it necessary to process each entire path, or, since the object is a dictionary, is it possible to use the pandas.DataFrame.from_dict method? A further complication is that the paths are not always the same length.
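On the from_dict question: pandas.DataFrame.from_dict with orient="index" accepts rows with different sets of keys and fills the missing cells with NaN, so unequal path lengths are not a blocker once each path has been flattened into a dict. A minimal sketch with hypothetical, already-flattened path dicts:

```python
import pandas as pd

# Hypothetical flattened paths: path 1 is one hop longer than path 0.
paths_dict = {
    0: {"Node0": "Type1: ID: 1|", "Rel0": "KNOWS: |", "Node2": "Type1: ID: 2|"},
    1: {"Node0": "Type1: ID: 1|", "Rel0": "KNOWS: |", "Node2": "Type1: ID: 3|",
        "Rel2": "KNOWS: |", "Node4": "Type1: ID: 4|"},
}

# orient="index" makes each outer key a row; keys absent from a row become NaN.
df = pd.DataFrame.from_dict(paths_dict, orient="index")
print(df.shape)  # (2, 5)
```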
Currently we enumerate each path: items at even indices are treated as nodes, and items at odd indices as relationships.
for index, item in enumerate(paths):
    if index % 2 == 0:
        # process as Node
        pass
    else:
        # process as Relationship
        pass
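Since a walk always alternates node, relationship, node, ..., the index-parity test can be replaced by slicing, which separates the two kinds in one step. A minimal sketch; split_walk is a hypothetical helper and plain strings stand in for the walked elements:

```python
def split_walk(walk_items):
    """Split an alternating node/relationship sequence into two lists."""
    items = list(walk_items)  # a walk may be a generator, so materialise it once
    nodes = items[0::2]       # even positions: nodes
    rels = items[1::2]        # odd positions: relationships
    return nodes, rels


nodes, rels = split_walk(["n0", "r0", "n1", "r1", "n2"])
print(nodes)  # ['n0', 'n1', 'n2']
print(rels)   # ['r0', 'r1']
```

This relies only on the alternation, so it works for paths of any length.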
We could instead use the built-in isinstance function, e.g.
if isinstance(item, py2neo.types.Node):
    # process as Node
    pass
But that still requires processing every element separately.
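If the type check is preferred, a single pass can still route each element into its own list of flat records, which can then be handed to pd.DataFrame. A minimal sketch; FakeNode and FakeRel are hypothetical stand-ins for py2neo.types.Node and Relationship, not the library's classes, and records_by_kind is a hypothetical helper:

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:  # hypothetical stand-in for py2neo.types.Node
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class FakeRel:  # hypothetical stand-in for py2neo.types.Relationship
    rel_type: str
    props: dict = field(default_factory=dict)


def records_by_kind(path_id, walk_items):
    """Return (node_records, rel_records) as lists of flat dicts."""
    node_records, rel_records = [], []
    for position, item in enumerate(walk_items):
        if isinstance(item, FakeNode):
            node_records.append({"path": path_id, "pos": position,
                                 "label": item.label, **item.props})
        elif isinstance(item, FakeRel):
            rel_records.append({"path": path_id, "pos": position,
                                "type": item.rel_type, **item.props})
    return node_records, rel_records


steps = [FakeNode("Type1", {"ID": 1}), FakeRel("KNOWS", {"since": 2010}),
         FakeNode("Type1", {"ID": 2})]
nodes, rels = records_by_kind(0, steps)
print(len(nodes), len(rels))  # 2 1
```

Each list of dicts can be passed straight to pd.DataFrame; differing property keys simply become columns with NaN where a record lacks them.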
Solution 1:[1]
I solved the problem as follows. I wrote a function that receives a list of walked paths together with the node and relationship property names:
from collections import OrderedDict
from py2neo.types import Node, Relationship

def neo4j_graph_to_dict(paths, node_properties, rels_properties):
    """Flatten each walked path into a dict of 'NodeN' / 'RelN' -> formatted string."""
    paths_dict = OrderedDict()
    for path_id, path in enumerate(paths):
        paths_dict[path_id] = {}
        for i, node_rel in enumerate(path):
            if isinstance(node_rel, Node):
                n_properties = [node_rel[p] for p in node_properties]
                node_format = [p + ': {}|' for p in node_properties]
                # "<first label>: prop1: value| prop2: value| ..."
                paths_dict[path_id]['Node' + str(i)] = ('{}: ' + ' '.join(node_format)).format(
                    list(node_rel.labels())[0], *n_properties)
            elif isinstance(node_rel, Relationship):
                r_properties = [node_rel[p] for p in rels_properties]
                rel_format = [p + ': {}|' for p in rels_properties]
                # "<relationship type>: prop1: value| ..."
                paths_dict[path_id]['Rel' + str(i - 1)] = ('{}: ' + ' '.join(rel_format)).format(
                    node_rel.type(), *r_properties)
    return paths_dict
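To see what neo4j_graph_to_dict stores for each element, here is the same format-string construction applied to plain values. The format_element helper, the label, the property names and the values are hypothetical; a list is used instead of a set so the order of the pairs is fixed:

```python
def format_element(kind, property_names, values):
    """Mirror the '{}: ' + ' '.join(p + ': {}|' ...) pattern from neo4j_graph_to_dict."""
    template = '{}: ' + ' '.join(p + ': {}|' for p in property_names)
    return template.format(kind, *values)


print(format_element('Type1', ['ID', 'name'], [123456, 'a']))
# Type1: ID: 123456| name: a|
```

Because node_properties and rels_properties are sets in the answer, the order of the "key: value|" pairs follows set iteration order, which is stable within one run but can differ between runs for string keys.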
Assuming the query returns the paths together with their nodes and relationships, we can run the following code:
import pandas as pd
from py2neo.types import walk

query = '''MATCH paths = allShortestPaths(
    (pr1:Type1 {ID: '123456'})-[r*1..9]-(pr2:Type2 {ID: '654321'}))
RETURN paths, nodes(paths) AS nodes, rels(paths) AS rels'''
df_qf = pd.DataFrame(graph.data(query))

# Unique property keys across all nodes and across all relationships
node_properties = {k for series in df_qf.nodes for node in series for k in node.keys()}
rels_properties = {k for series in df_qf.rels for rel in series for k in rel.keys()}

# Walk each path into its alternating node/relationship sequence
wg = [walk(path) for path in df_qf.paths]
paths_dict = neo4j_graph_to_dict(wg, node_properties, rels_properties)

# One row per path; all shortest paths between the same two endpoints have equal length,
# so the first path's keys give the column order. Repeated paths are dropped.
df = pd.DataFrame(paths_dict).transpose()
df = pd.DataFrame(df, columns=list(paths_dict[0].keys())).drop_duplicates()
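An alternative that sidesteps unequal path lengths entirely is a long ("tidy") layout: one row per path element instead of one row per path, with properties as columns. This is a different technique from the answer above, shown as a minimal sketch that assumes each walk has already been converted into (kind, label_or_type, properties) tuples, a hypothetical intermediate form:

```python
import pandas as pd

# Hypothetical walks in (kind, label_or_type, properties) form; lengths differ.
walks = [
    [("node", "Type1", {"ID": 1}), ("rel", "KNOWS", {}), ("node", "Type2", {"ID": 2})],
    [("node", "Type1", {"ID": 1}), ("rel", "KNOWS", {"w": 3}), ("node", "Type1", {"ID": 5}),
     ("rel", "LIKES", {}), ("node", "Type2", {"ID": 2})],
]

# One flat record per element, tagged with its path and position in that path.
rows = [
    {"path": path_id, "pos": pos, "kind": kind, "name": name, **props}
    for path_id, steps in enumerate(walks)
    for pos, (kind, name, props) in enumerate(steps)
]

# Keys missing from a record (e.g. "w" on nodes) become NaN in the DataFrame.
long_df = pd.DataFrame(rows)
print(long_df.shape)  # (8, 6)
```

Filtering on long_df["kind"] then gives all nodes or all relationships, and a pivot on path and pos can rebuild a wide, one-row-per-path table if that is still needed.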
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | skibee |