Classifying the order of classes rather than the classes themselves

I'm currently tackling a classification problem where I need to predict the order in which objects appear in a sequence, but not the class of each object.

I've spent quite some time Googling this but can't find anything related, or don't know how to properly phrase the problem (maybe there is a name for this type of prediction). Any help would be appreciated!

Problem:

Below are some examples, where each element represents say a pixel belonging to a dog (D), cat (C), sheep (S) or background (-):

Example 1: typical case
input:  [D, D, D, -, -, -, C, C, C, -, -, -, S, S, S, -, -, -]
target: [1, 1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 3, 0, 0, 0]
Example 2: sequence with discontinuous object
input:  [C, C, C, -, -, C, C, C, C, C, C, D, D, D, D, S, S, -]
target: [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 0]
Example 3: same as above but more extreme
input:  [S, S, S, S, S, -, -, -, -, -, -, -, -, -, S, S, S, S]
target: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
Example 4: discontinuous objects and multiple instances
input:  [C, C, C, C, D, D, D, D, C, C, C, C, D, D, D, D, D, D]
target: [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3]

As examples 1 and 2 show, objects of different classes can map to the same target id (the dog in example 1 and the cat in example 2 both get id 1), because the id only says that the object appears first in the sequence, relative to everything else.
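To make the target encoding concrete, here is a minimal sketch of how such targets could be built. It assumes that each element carries some instance label (the names "c1", "d1", "d2" and the function order_targets are hypothetical, not part of my actual code) and that background is marked with None:

```python
def order_targets(instance_ids, background=None):
    """Relabel per-element instance ids by order of first appearance (1-based).

    Background elements map to 0. The instance ids themselves are arbitrary
    labels; only the order in which they first occur matters.
    """
    rank = {}      # instance id -> order id, assigned on first appearance
    targets = []
    for inst in instance_ids:
        if inst == background:
            targets.append(0)
            continue
        if inst not in rank:
            rank[inst] = len(rank) + 1
        targets.append(rank[inst])
    return targets


# Example 4: one cat, and two dog instances (hypothetical instance labels)
example4 = ["c1"] * 4 + ["d1"] * 4 + ["c1"] * 4 + ["d1"] * 2 + ["d2"] * 4
print(order_targets(example4))
# [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
```

Note that the class letter (C, D, S) never enters the mapping, which is exactly the class-invariance I am after.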

I know that I could instead predict the start and end of each object, but as examples 2-4 show, an object sometimes consists of several discontinuous subsequences, which might complicate matters even more.
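To illustrate the complication, here is a small sketch (the helper to_segments is just for illustration) that collapses a target sequence into (object id, start, end) runs, with end exclusive. A discontinuous object yields more than one run, so a single start/end pair per object is not enough:

```python
from itertools import groupby


def to_segments(targets):
    """Collapse per-element order ids into (object_id, start, end) runs.

    `end` is exclusive. Background runs (id 0) are skipped.
    """
    segments = []
    pos = 0
    for obj_id, run in groupby(targets):
        length = len(list(run))
        if obj_id != 0:
            segments.append((obj_id, pos, pos + length))
        pos += length
    return segments


# Example 3: one sheep split into two parts by background
example3 = [1] * 5 + [0] * 9 + [1] * 4
print(to_segments(example3))
# [(1, 0, 5), (1, 14, 18)]
```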

Currently, a linear layer produces the final predictions as a raw logit tensor of size (b, n, n_classes), where b is the batch size, n is the sequence length, and n_classes is the maximum number of objects in my dataset. My concern is that the linear layer will struggle to do anything useful, because the same object class at a different position in the sequence would need to be mapped to a different output.
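For reference, the output head looks roughly like the sketch below. It assumes PyTorch; the feature size d_model, the random features, and the random targets are placeholders rather than my real model or data, and n_classes = 4 assumes background plus up to three objects:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only (assumptions, not my real configuration)
b, n, d_model, n_classes = 2, 18, 32, 4

features = torch.randn(b, n, d_model)  # stand-in for the upstream per-element features
head = nn.Linear(d_model, n_classes)   # applied independently at every position
logits = head(features)                # raw logits, shape (b, n, n_classes)

targets = torch.randint(0, n_classes, (b, n))  # placeholder order ids, 0 = background

# Flatten batch and sequence dimensions so each element is one classification sample
loss = F.cross_entropy(logits.reshape(-1, n_classes), targets.reshape(-1))
print(logits.shape, loss.item())
```

Because the head sees each position in isolation, any information about which object came first has to already be present in the features it receives.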

My question is essentially this: is there a better way to predict the order of objects in a sequence in a class-invariant, generic manner?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
