'Neo4j query really slow when using collect(distinct) and order by
I'm a novice in Neo4j right now
database schema:
Currently i'm trying to find all (developer|company|publisher) nodes with it's related platform's Games Nodes in my data:
- find a Game node where the game belong to a Platform node
- Game node must have a relationship with a Genre node contains a certain string, Game.deleted property must not be Null
- we have relationships of Platform -- Game, (Developer|Company|Publisher)-[r:DEVELOPED_BY|PUBLISHED_BY] - Game --> Genre
- query will return (Developer|Company|Publisher) with it's related Game node
So i have the following query:
MATCH (p:Platform)--(g:Game),(d)-[r:DEVELOPED_BY|PUBLISHED_BY]-(g)-->(ge:Genre)
WHERE ge.id CONTAINS 'Filtered' AND p.id = 'PS5' AND g.deleted is null
RETURN g, collect(DISTINCT d) as d ORDER BY g.title DESC SKIP 0 LIMIT 100
the return row contain 2 columns, games and it's d(developer|publisher|company) for examples:
PROBLEM: the query run really slow (15~30sec depend on platform) even if i indexed all the properties (Game.id), (Platform.id)
Started streaming 100 records after 312 ms and completed after 15743 ms.
Profile:
My data has hundred thousands of Game node. How do i improve the query? i see that these 2 have million of DB hits
UPDATES:
After removing the contains query and changed to using regex
=~ "Filtered.*"
The query speed improved significantly, down to only 2000ms(2 secs) down from 15743ms(15 secs). I still don't understand why CONTAINS slow down the query by so much, even though i already indexed the field with text index type
the above query slow down by alot if i query on around 200000 game nodes, which still doesn't solve my problem
changing the query to this with the large game node dataset speed up the query to just under 1 second
MATCH (p:Platform)--(g:Game),(d)-[r:DEVELOPED_BY|PUBLISHED_BY]-(g)-->(ge:Genre)
WHERE ge.id CONTAINS 'Filtered' AND p.id = 'PS5' AND g.deleted is null
RETURN g, d LIMIT 100
why is collect(distinct d) and order by took so much time? how can i solve this?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|