Unexpected Count & Filter Behaviour in AWS Neptune
I'm getting an unexpected StopIteration error with some Gremlin queries that contain a count step within nested filter steps.
This error can be recreated with the following code (using Gremlin-Python, 3.5.0 in my case):
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

# g is assumed to be a GraphTraversalSource already connected to the graph
filter_header = g.addV().id().next()
count_headers = [g.addV().id().next() for _ in range(10)]

for i, c in enumerate(count_headers):
    # Add 10 nodes
    sub_nodes = [g.addV().id().next() for _ in range(10)]
    # Connect them all to the header
    for s in sub_nodes:
        g.V(c).addE('edge').to(__.V(s)).iterate()
    # Connect i of them to the filter header
    for s in sub_nodes[:i]:
        g.V(filter_header).addE('edge').to(__.V(s)).iterate()

# This raises StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).count().next()
(Equivalently, if I use toList instead of next, I get an empty list.)
However, this error doesn't happen if you unfold after the count:
# No StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().unfold().is_(P.gt(1))
).count().next()
Neither does it happen if you use map instead of filter:
# No StopIteration
g.V(count_headers).as_('c').map(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).select('c').count().next()
I've tested and this error doesn't happen when using TinkerGraph, so I suspect this is specific to AWS Neptune.
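As a sanity check on what the first query should return, the data set can be modelled in plain Python with no graph database involved. This is only a sketch: the string ids and the helper names (add_edge, matches) are made up for illustration and are not part of Gremlin-Python.

```python
# Plain-Python model of the repro data set (no graph database involved).
# String ids stand in for the vertex ids that g.addV().id().next() returns.
from collections import defaultdict

FILTER_HEADER = "filter_header"

out_edges = defaultdict(set)  # vertex -> vertices reached via out('edge')
in_edges = defaultdict(set)   # vertex -> vertices reached via in_('edge')


def add_edge(src, dst):
    """Record a directed 'edge' from src to dst (hypothetical helper)."""
    out_edges[src].add(dst)
    in_edges[dst].add(src)


count_headers = [f"header_{i}" for i in range(10)]
for i, c in enumerate(count_headers):
    sub_nodes = [f"sub_{i}_{j}" for j in range(10)]
    # Connect all 10 sub-nodes to the count header
    for s in sub_nodes:
        add_edge(c, s)
    # Connect i of them to the filter header
    for s in sub_nodes[:i]:
        add_edge(FILTER_HEADER, s)


def matches(header):
    """Mirror out('edge').filter(in_('edge').hasId(filter_header)).count().is_(P.gt(1))."""
    linked = [s for s in out_edges[header] if FILTER_HEADER in in_edges[s]]
    return len(linked) > 1


expected = sum(1 for c in count_headers if matches(c))
print(expected)  # 8: headers 2 through 9 each have at least two linked sub-nodes
```

Under this model, the outer count should be 8 rather than an empty result.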
I'd really appreciate any guidance as to why this happens, whether I'm doing anything wrong, or what differences cause this to happen only in Neptune. Alternatively, if the consensus is that this is a bug, I'd appreciate it if anyone could let me know where to raise it.
Solution 1:[1]
For anyone that finds themselves here: this was a bug that was fixed in Neptune Engine release 1.1.1.0.
"Fixed a rare Gremlin bug where no results were returned when using nested filter() and count() steps in combination"
(Thanks to the Neptune team for fixing!)
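If you want to check whether a cluster is already on a release that includes the fix, a small version comparison may help. This is a minimal sketch, not an official API: the function names are hypothetical, and it assumes the engine version is available as a dotted string, possibly with a non-numeric suffix (the sample strings below are illustrative). How you obtain that string, for example from the cluster details in the AWS console, is left out.

```python
import re

# Engine release named in the answer above as containing the fix.
FIXED_IN = (1, 1, 1, 0)


def parse_engine_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised engine version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def includes_fix(version):
    """True if the given engine version is at or above FIXED_IN."""
    parsed = parse_engine_version(version)
    # Pad the shorter tuple with zeros so that "1.1" compares as "1.1.0.0".
    width = max(len(parsed), len(FIXED_IN))
    padded = parsed + (0,) * (width - len(parsed))
    fixed = FIXED_IN + (0,) * (width - len(FIXED_IN))
    return padded >= fixed


# Illustrative version strings (assumed format, not taken from a real cluster)
print(includes_fix("1.0.5.1"))     # False
print(includes_fix("1.1.1.0.R4"))  # True
```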
Solution 2:[2]
When using a Gremlin client, such as Gremlin Python, if a query has no result, the next step will raise an error. I prefer to always use toList, as that way you are guaranteed to at least get an empty list back. If you use TinkerGraph locally with the Gremlin Console you will not see the same behavior. If getting no result is also unexpected, that is a second-level item to explore.
As an example of the Python next behavior, here is a simple experiment using the Python console. If you run the same tests with a Gremlin Server backed by TinkerGraph you will see the same results.
>>> g.V().hasId('I do not exist').next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 89, in next
    return self.__next__()
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 50, in __next__
    self.last_traverser = next(self.traversers)
StopIteration
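If you would rather keep calling next, one option is to wrap it so that an empty result becomes a default value instead of an exception. The sketch below is an assumption-laden illustration: next_or_default is a hypothetical helper, and FakeTraversal is a stand-in that only imitates the next and toList behavior described above so the example runs without a server.

```python
def next_or_default(traversal, default=None):
    """Return traversal.next(), or default if the traversal has no results."""
    try:
        return traversal.next()
    except StopIteration:
        return default


class FakeTraversal:
    """Hypothetical stand-in for a Gremlin-Python traversal (demo only)."""

    def __init__(self, results):
        self._results = iter(results)

    def next(self):
        # Like the real client, raises StopIteration when nothing is left.
        return next(self._results)

    def toList(self):
        # Always returns a list, which is empty when there are no results.
        return list(self._results)


print(next_or_default(FakeTraversal([]), default=0))  # 0
print(next_or_default(FakeTraversal([8])))            # 8
print(FakeTraversal([]).toList())                     # []
```

With a real traversal the call would look like next_or_default(g.V().hasId('I do not exist')), which should return None rather than raise.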
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Hugh Blayney |
Solution 2 |