Sort vertices by presence of 2 properties

UPDATE 1

I've added the descLength and imageLength properties to allow for easier sorting. The idea is that constant(0) can be used to fill in the values for users who lack either property, and any length greater than 0 can be used to identify a user who actually has the property. The furthest this gets me is being able to order().by() only one property at a time, using a query such as:

g.V().
  order().
    by(coalesce(values('descLength'), constant(0)))

But this isn't the full solution to match what I need.


Original Post

In Amazon Neptune, I want to sort vertices based on the presence of two properties, desc and image. The ranking order should be:

  • vertices that have both properties
  • vertices that have desc but not image
  • vertices that have image but not desc
  • vertices that have neither property

Consider this graph of users and their properties:

g.addV('user').property('type','person').as('u1').
  addV('user').property('type','person').property('desc', 'second person').property('descLength', 13).as('u2').
  addV('user').property('type','person').property('desc', 'third person').property('descLength', 12).property('image', 'https://www.example.com/image-3.jpeg').property('imageLength', 36).as('u3').
  addV('user').property('type','person').property('image', 'https://www.example.com/image-4.jpeg').property('imageLength', 36).as('u4')

Using the ranking order I outlined, the results should be:

  • u3 because it has both desc and image
  • u2 because it has desc but not image
  • u4 because it has image but not desc
  • u1 because it has neither desc nor image
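
To make the target ordering concrete, here is a minimal plain-Python model of it. It is not Gremlin and does not run against Neptune. The `users` dictionary and the `presence_key` helper are illustrative names invented for this sketch, and the dictionaries only mirror the four sample vertices above. The idea is that presence can be ranked even when the values themselves cannot: each vertex is reduced to a pair of booleans, with desc compared first and image breaking ties.

```python
# Plain-Python model of the desired ranking (illustrative only, not Gremlin).
# The dictionaries mirror the sample vertices u1..u4 from the graph above.
users = {
    "u1": {"type": "person"},
    "u2": {"type": "person", "desc": "second person"},
    "u3": {"type": "person", "desc": "third person",
           "image": "https://www.example.com/image-3.jpeg"},
    "u4": {"type": "person", "image": "https://www.example.com/image-4.jpeg"},
}


def presence_key(props):
    """Return (has desc, has image); desc is compared first, image breaks ties."""
    return ("desc" in props, "image" in props)


# reverse=True puts True before False, so "has both" sorts first.
ranked = sorted(users, key=lambda name: presence_key(users[name]), reverse=True)
print(ranked)  # ['u3', 'u2', 'u4', 'u1']
```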

The order().by() samples I've seen work with data like numbers and dates that can be ranked by increasing or decreasing value, but strings like URLs and free text can't be ranked that way. What's the correct way to achieve this?



Solution 1:[1]

This first query is not exactly what you are looking for, as it gives 'image' and 'desc' the same weight, but with this foundation it should be possible to build variations of the query that better meet your needs.

Given:

g.V().hasLabel('user').
      project('id','data').
        by(id).
        by(values('desc','image').fold()).
  order().
    by(select('data').count(local),desc)

we get

{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg']}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person']}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg']}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': []}
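
The equal weighting is easy to see in a small plain-Python model of the count (illustrative only, not Gremlin). The `users` dictionary and `count_present` helper are hypothetical names for this sketch. A vertex with only desc and a vertex with only image both count as 1, so the count alone does not decide which of the two sorts first.

```python
# Plain-Python model of ordering by "how many of the two properties exist".
users = {
    "u1": {"type": "person"},
    "u2": {"type": "person", "desc": "second person"},
    "u3": {"type": "person", "desc": "third person",
           "image": "https://www.example.com/image-3.jpeg"},
    "u4": {"type": "person", "image": "https://www.example.com/image-4.jpeg"},
}


def count_present(props):
    """Count how many of 'desc' and 'image' are present on one vertex."""
    return sum(1 for key in ("desc", "image") if key in props)


counts = {name: count_present(props) for name, props in users.items()}
print(counts)  # {'u1': 0, 'u2': 1, 'u3': 2, 'u4': 1}

# u2 (desc only) and u4 (image only) tie, so the count cannot rank them.
print(counts["u2"] == counts["u4"])  # True
```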

Building on this, we can go one step further and calculate a score based on how many of the properties exist in each case. The query below gives desc a higher score than image so in the cases where they do not both exist, desc will sort higher.

g.V().hasLabel('user').
      project('id','data','score').
        by(id).
        by(values('desc','image').fold()).
        by(union(
             has('desc').constant(2),
             has('image').constant(1),
             constant(0)).
            sum()).
  order().
    by(select('score'),desc)

which yields

{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg'], 'score': 3}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person'], 'score': 2}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg'], 'score': 1}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': [], 'score': 0}
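
The arithmetic behind the score can be checked with a short plain-Python model (a sketch, not Gremlin). `WEIGHTS`, `users`, and `score` are names made up for this illustration. The weights 2 and 1 are taken from the query above. Because the desc weight is larger than the image weight, the four possible combinations map to four distinct scores, which is what breaks the tie between "desc only" and "image only".

```python
# Plain-Python model of the weighted score computed by the union()/sum() query.
WEIGHTS = {"desc": 2, "image": 1}  # same weights as the constants in the query

users = {
    "u1": {"type": "person"},
    "u2": {"type": "person", "desc": "second person"},
    "u3": {"type": "person", "desc": "third person",
           "image": "https://www.example.com/image-3.jpeg"},
    "u4": {"type": "person", "image": "https://www.example.com/image-4.jpeg"},
}


def score(props):
    """Sum the weights of the properties that are present (0 if none are)."""
    return sum(weight for key, weight in WEIGHTS.items() if key in props)


scores = {name: score(props) for name, props in users.items()}
print(scores)  # {'u1': 0, 'u2': 2, 'u3': 3, 'u4': 1}

# Highest score first, matching the order of the results above.
ranked = sorted(users, key=lambda name: scores[name], reverse=True)
print(ranked)  # ['u3', 'u2', 'u4', 'u1']
```

In Python, `sum` of an empty sequence is already 0. In the Gremlin query, the `constant(0)` branch plays the corresponding role by making sure `sum()` always has at least one value to add.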

UPDATED 2022-05-06 To show how to get just the ID

Taking the query above, getting just the ID from the results is as simple as adding a select('id') at the end of the query.

g.V().hasLabel('user').
      project('id','data','score').
        by(id).
        by(values('desc','image').fold()).
        by(union(
             has('desc').constant(2),
             has('image').constant(1),
             constant(0)).
            sum()).
  order().
    by(select('score'),desc).
  select('id')

However, we can also remove some of the other work the query does to fetch the results. I mainly included the 'data' projection for demonstration purposes, so we can reduce the query to:

g.V().hasLabel('user').
      project('id','score').
        by(id).
        by(union(
             has('desc').constant(2),
             has('image').constant(1),
             constant(0)).
            sum()).
  order().
    by(select('score'),desc).
  select('id')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1