Sort vertices by presence of 2 properties
UPDATE 1
I've added the descLength and imageLength properties to allow for easier sorting. The idea is that constant(0) can be used to fill in the values for users who lack either property, and any length greater than 0 can be used to identify a user who actually has the property. The furthest this gets me is being able to order().by() only one property at a time, using a query such as:
g.V().
  order().
    by(coalesce(values('descLength'), constant(0)))
But this isn't the full solution to match what I need.
Original Post
In Amazon Neptune I want to sort vertices based on the presence of 2 properties, desc and image. The order of ranking should be:
- vertices that have both properties
- vertices that have desc but not image
- vertices that have image but not desc
- vertices that have neither property
Consider this graph of users and their properties:
g.addV('user').property('type','person').as('u1').
addV('user').property('type','person').property('desc', 'second person').property('descLength', 13).as('u2').
addV('user').property('type','person').property('desc', 'third person').property('descLength', 12).property('image', 'https://www.example.com/image-3.jpeg').property('imageLength', 36).as('u3').
addV('user').property('type','person').property('image', 'https://www.example.com/image-4.jpeg').property('imageLength', 36).as('u4')
Using the ranking order I outlined, the results should be:
- u3, because it has both desc and image
- u2, because it has desc but not image
- u4, because it has image but not desc
- u1, because it has neither desc nor image
The order().by() samples I've seen work with data like numbers and dates that can be ranked by increasing/decreasing values, but of course ranking strings like URLs and text that way doesn't tell me whether the property exists. What's the correct way to achieve this?
Solution 1:[1]
This first query is not exactly what you are looking for, as it gives 'image' and 'desc' the same weight, but with this foundation it should be possible to build variations of the query that better meet your needs.
Given:
g.V().hasLabel('user').
  project('id','data').
    by(id).
    by(values('desc','image').fold()).
  order().
    by(select('data').count(local),desc)
we get
{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg']}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person']}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg']}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': []}
Building on this, we can go one step further and calculate a score based on which of the properties exist in each case. The query below gives desc a higher score than image, so in the cases where they do not both exist, desc will sort higher.
g.V().hasLabel('user').
  project('id','data','score').
    by(id).
    by(values('desc','image').fold()).
    by(union(
         has('desc').constant(2),
         has('image').constant(1),
         constant(0)).
       sum()).
  order().
    by(select('score'),desc)
which yields
{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg'], 'score': 3}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person'], 'score': 2}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg'], 'score': 1}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': [], 'score': 0}
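To see why weights of 2 and 1 are enough to separate all four cases, here is a minimal sketch of the same scoring idea in plain Python. It is only a model of the arithmetic, not Gremlin, and it has not been run against Neptune. The names WEIGHTS, presence_score and the users dictionary are hypothetical stand-ins for the sample vertices u1 to u4.

```python
# Plain-Python model of the scoring idea (not Gremlin, not run against Neptune).
# The weights mirror the constants in the query above: desc = 2, image = 1.
WEIGHTS = {"desc": 2, "image": 1}


def presence_score(props):
    """Sum the weight of every ranked property that is present on a vertex."""
    return sum(weight for key, weight in WEIGHTS.items() if key in props)


# Hypothetical stand-ins for the four sample vertices from the question.
users = {
    "u1": {"type": "person"},
    "u2": {"type": "person", "desc": "second person"},
    "u3": {"type": "person", "desc": "third person",
           "image": "https://www.example.com/image-3.jpeg"},
    "u4": {"type": "person", "image": "https://www.example.com/image-4.jpeg"},
}

# Highest score first, like order().by(select('score'), desc).
ranked = sorted(users, key=lambda name: presence_score(users[name]), reverse=True)
print(ranked)
```

Because the two weights are distinct powers of two, every combination of present and missing properties gets its own score (3, 2, 1 or 0), so no two of the four cases can tie.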
UPDATED 2022-05-06 To show how to get just the ID
Taking the query above, getting just the ID from the results is as simple as adding a select('id') at the end of the query.
g.V().hasLabel('user').
  project('id','data','score').
    by(id).
    by(values('desc','image').fold()).
    by(union(
         has('desc').constant(2),
         has('image').constant(1),
         constant(0)).
       sum()).
  order().
    by(select('score'),desc).
  select('id')
However, we can also drop the work the query does to fetch the 'data' values, which I included mainly for demonstration purposes. That reduces the query to:
g.V().hasLabel('user').
  project('id','score').
    by(id).
    by(union(
         has('desc').constant(2),
         has('image').constant(1),
         constant(0)).
       sum()).
  order().
    by(select('score'),desc).
  select('id')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |