Group elements of the data set if they are next to each other with LINQ

I have a data set (e.g. 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7) and I want to group items of the same value, but only if they appear next to each other at least 3 times.

Is there a way to do this? I've tried combining Count, GroupBy and Select in every way I know, but I can't find the right one.

If it can't be done with LINQ, is there some other way?



Solution 1:[1]

I don't think I'd strive for a 100% LINQ solution for this:

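// Body of a method returning IEnumerable<List<int>> (needs System.Collections.Generic and System.Linq).
// source is the input sequence, e.g. an int[]; First() would throw on an empty one.
if (!source.Any()) return Enumerable.Empty<List<int>>();

var r = new List<List<int>>() { new () { source.First() } };

foreach(var e in source.Skip(1)){
  if(e == r.Last().Last()) r.Last().Add(e); // same value as before: extend the current run
  else r.Add(new(){ e });                    // value changed: start a new run
}

return r.Where(l => l.Count > 2);            // keep only runs of 3 or more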

The .Last() calls can be replaced with [^1] (the C# 8 index-from-end operator) if you like.

This works as follows:

  • Have an output that is a list of lists
  • Put the first input item into the output, as a list of its own
  • For the second input item onward, if the item is the same as the last int in the output, add it to the last list in the output
  • Otherwise, make a new list containing the input int and add it onto the end of the output lists
  • Keep only those output lists longer than 2

The output is:

[
  [2,2,2],
  [6,6,6]
]
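Spelled out, the [^1] variant mentioned above might look like the sketch below, assuming C# 8 or later (for the index-from-end operator) and a top-level program. GroupRuns and minRun are hypothetical names chosen for this example; minRun replaces the hard-coded "longer than 2" check. It starts from an empty list of runs, so an empty input simply yields no runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: same loop idea, using [^1] instead of .Last().
static List<List<int>> GroupRuns(int[] source, int minRun)
{
    var runs = new List<List<int>>();
    foreach (var e in source)
    {
        // Start a new run if there is none yet or the value changed.
        if (runs.Count == 0 || runs[^1][^1] != e)
            runs.Add(new List<int> { e });
        else
            runs[^1].Add(e); // same value as the previous element: extend the run
    }
    // Keep only runs with at least minRun elements.
    return runs.Where(run => run.Count >= minRun).ToList();
}

var data = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
foreach (var longRun in GroupRuns(data, 3))
    Console.WriteLine(string.Join(", ", longRun)); // prints "2, 2, 2" then "6, 6, 6"
```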

Aggregate can be pushed into doing the same thing, since the loop is simply an accumulator (r), an iteration (foreach), and an operation on the result (Where):

var result = source.Skip(1).Aggregate(
    new List<List<int>>() { new List<int> { source.First() } }, 
    (r,e) => {
      if(e == r.Last().Last()) r.Last().Add(e);
      else r.Add(new List<int>(){ e });
      return r;
    },
    r => r.Where(l => l.Count > 2)
);

...but would you want to be the one to explain it to the new dev?


Another LINQy way would be to maintain a counter that is incremented by one each time a value in the source array differs from the previous one, then group by this counter and return only the groups with 3 or more elements. I don't like this one as much, though, because it's a bit "WTF":

var source = new[]{1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7};
int ctr = 0;
var result = source.Select(
  (e,i) => new[]{ i==0 || e != source[i-1] ? ++ctr : ctr, e}
)
.GroupBy(
  arr => arr[0], 
  arr => arr[1]
)
.Where(g => g.Count() > 2);

Solution 2:[2]

If you're nostalgic and like things such as the Obfuscated C Code Contest, you could solve it like this.
(No best-practice claims included.)

        int[] n = {1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7};
        var t = new int [n.Length][];
        for (var i = 0; i < n.Length; i++)
            t[i] = new []{n[i], i == 0 ? 0 : n[i] == n[i - 1] ? t[i - 1][1] : t[i - 1][1] + 1};

        var r = t.GroupBy(x => x[1], x => x[0])
                 .Where(g => g.Count() > 2)
                 .SelectMany(g => g);

        Console.WriteLine(string.Join(", ", r));

In the end, Linq is probably not the best solution here. A simple for loop with one to three additional loop variables that track the "group index" and the last value likely makes more sense, even if it takes a couple more lines of code.
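A minimal sketch of such a plain for loop, assuming a C# 9 top-level program. KeepLongRuns and minRun are hypothetical names. Only one extra loop variable is used here: runStart marks where the current group began, and the last value is read back as input[i - 1]. Like the Linq code above, it returns the flattened elements of the long runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; minRun is the minimum run length (3 in the question).
static List<int> KeepLongRuns(int[] input, int minRun)
{
    var output = new List<int>();
    var runStart = 0; // index where the current group of equal values began

    // i goes one past the last index so the final group is also closed.
    for (var i = 1; i <= input.Length; i++)
    {
        // A group ends at the end of the array or when the value changes.
        if (i == input.Length || input[i] != input[i - 1])
        {
            if (i - runStart >= minRun)
            {
                for (var j = runStart; j < i; j++)
                    output.Add(input[j]);
            }
            runStart = i; // the next group starts here
        }
    }
    return output;
}

int[] n = { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
Console.WriteLine(string.Join(", ", KeepLongRuns(n, 3))); // 2, 2, 2, 6, 6, 6
```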

Solution 3:[3]

I wouldn't use Linq just to use Linq.

I'd rather suggest using a simple for loop to loop over your input array and populate the output list. To keep track of which number is currently being repeated (if any), I'd use a variable (repeatedNumber) that's initially set to null.

In this approach, a number can only be assigned to repeatedNumber if it fulfills the minimum requirement of repeated items. Hence, for your example input, repeatedNumber would start at null, then eventually be set to 2, then be set to 6, and then be reset to null.

One arguably good use of Linq here is to check whether the minimum requirement of repeated items is fulfilled for a given item in input, by checking the necessary consecutive items in input:

input
    .Skip(items up to and including current item)
    .Take(minimum requirement of repeated items - 1)
    .All(equal to current item)

I'll name this minimum requirement of repeated items repetitionRequirement. (In your question post, repetitionRequirement is 3.)
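A concrete version of that check might look like the sketch below (IsStartOfRun is a hypothetical helper name, assuming a top-level program). Note the bounds check: near the end of the array Take returns fewer items than requested, and All returns true for an empty sequence, so without the check the last element would be reported as the start of a run.

```csharp
using System;
using System.Linq;

// Hypothetical helper implementing the Skip/Take/All check described above.
static bool IsStartOfRun(int[] input, int i, int repetitionRequirement)
{
    var consecutiveCount = repetitionRequirement - 1;

    // Bounds check first: Take() silently returns fewer items near the end,
    // and All() is true for an empty sequence.
    return i + consecutiveCount < input.Length &&
           input.Skip(i + 1)                  // skip items up to and including the current one
                .Take(consecutiveCount)       // the repetitionRequirement - 1 following items
                .All(num => num == input[i]); // all equal to the current item
}

var data = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
Console.WriteLine(IsStartOfRun(data, 7, 3));  // True: first 2 of "2, 2, 2"
Console.WriteLine(IsStartOfRun(data, 13, 3)); // False: the final 7 has nothing after it
```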

The logic in the for loop goes as follows:

  • number = input[i]
  • If number is equal to repeatedNumber, it means that the previously repeated item continues being repeated
    • Add number to output
  • Otherwise, if the minimum requirement of repeated items is fulfilled for number (i.e. if the repetitionRequirement - 1 items directly following number in input are all equal to number), it means that number is the first instance of a new repeated item
    • Set repeatedNumber equal to number
    • Add number to output
  • Otherwise, if repeatedNumber has a value, it means that the previously repeated item just ended its repetition
    • Set repeatedNumber to null

Here is a suggested implementation:
(I'd suggest finding a more descriptive method name)

//using System.Collections.Generic;
//using System.Linq;

public static List<int> GetOutput(int[] input, int repetitionRequirement)
{
    var consecutiveCount = repetitionRequirement - 1;
    
    var output = new List<int>();
    
    int? repeatedNumber = null;
            
    for (var i = 0; i < input.Length; i++)
    {
        var number = input[i];
        
        if (number == repeatedNumber)
        {
            output.Add(number);
        }
        else if (i + consecutiveCount < input.Length &&
            input.Skip(i + 1).Take(consecutiveCount).All(num => num == number))
        {
            repeatedNumber = number;
            output.Add(number);
        }
        else if (repeatedNumber.HasValue)
        {
            repeatedNumber = null;
        }
    }
    
    return output;
}

By calling it with your example input:

var dataSet = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };

var output = GetOutput(dataSet, 3);

you get the following output:

{ 2, 2, 2, 6, 6, 6 }


Solution 4:[4]

You could consider using the GroupAdjacent or the RunLengthEncode operators from the MoreLinq package. The former groups adjacent elements of the sequence that have the same key, where the key is retrieved by invoking a keySelector lambda parameter. The latter compares adjacent elements and emits a single KeyValuePair<T, int> for each series of equal elements; the int value of the KeyValuePair<T, int> is the number of consecutive equal elements. Example:

var source = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };

IEnumerable<IGrouping<int, int>> grouped = MoreLinq.MoreEnumerable
    .GroupAdjacent(source, x => x);
foreach (var group in grouped)
{
    Console.WriteLine($"Key: {group.Key}, Elements: {String.Join(", ", group)}");
}
Console.WriteLine();

IEnumerable<KeyValuePair<int, int>> pairs = MoreLinq.MoreEnumerable
    .RunLengthEncode(source);
foreach (var pair in pairs)
{
    Console.WriteLine($"Key: {pair.Key}, Value: {pair.Value}");
}

Output:

Key: 1, Elements: 1, 1
Key: 4, Elements: 4
Key: 6, Elements: 6
Key: 3, Elements: 3, 3
Key: 1, Elements: 1
Key: 2, Elements: 2, 2, 2
Key: 6, Elements: 6, 6, 6
Key: 7, Elements: 7

Key: 1, Value: 2
Key: 4, Value: 1
Key: 6, Value: 1
Key: 3, Value: 2
Key: 1, Value: 1
Key: 2, Value: 3
Key: 6, Value: 3
Key: 7, Value: 1


In the above example I've used the operators as normal methods, because I am not a fan of adding using MoreLinq; and "polluting" Visual Studio's IntelliSense with all the specialized operators of the MoreLinq package. An alternative is to enable each operator selectively, like this:

using static MoreLinq.Extensions.GroupAdjacentExtension;
using static MoreLinq.Extensions.RunLengthEncodeExtension;

If you don't like the idea of adding a dependency on a third-party package, you could grab the source code of these two operators and embed it directly into your project.
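For the embed-it-yourself route, here is a small dependency-free sketch in the same spirit, assuming a top-level program. It is not MoreLinq's actual source code; Runs is a hypothetical name, and it yields a (Value, Count) tuple per run rather than a KeyValuePair<T, int>. Filtering on Count and expanding with Enumerable.Repeat then answers the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, lazily evaluated run-length encoder (not MoreLinq's implementation).
// Yields one (Value, Count) pair per run of equal adjacent elements.
static IEnumerable<(T Value, int Count)> Runs<T>(IEnumerable<T> source)
{
    var comparer = EqualityComparer<T>.Default;
    using var e = source.GetEnumerator();
    if (!e.MoveNext()) yield break; // empty input: no runs

    var current = e.Current;
    var count = 1;
    while (e.MoveNext())
    {
        if (comparer.Equals(e.Current, current))
        {
            count++; // same value: extend the run
        }
        else
        {
            yield return (current, count); // value changed: emit the finished run
            current = e.Current;
            count = 1;
        }
    }
    yield return (current, count); // emit the last run
}

var data = new[] { 1, 1, 4, 6, 3, 3, 1, 2, 2, 2, 6, 6, 6, 7 };
var result = Runs(data)
    .Where(run => run.Count >= 3)
    .SelectMany(run => Enumerable.Repeat(run.Value, run.Count));
Console.WriteLine(string.Join(", ", result)); // 2, 2, 2, 6, 6, 6
```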

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2: lidqy
[3] Solution 3: Astrid E.
[4] Solution 4: Theodor Zoulias