'Finding Arithmetic Mean of Subarrays Efficiently in an Array

I am trying to find number of ways o calculate arithmetic mean of a subarray of an array. It boils down to this; given an array X and an integer S, how many contiguous fragments of X has arithmetic mean equal to S?

For instance, given X=[5,3,6,2] and S=4 result is 3. [5,3] , [6,2] and [5,3,6,2] has the mean 4.

X might have up to 100.000 elements. Each value of X is an integer in range of {-1.000.000.000,+1.000.000.000}. So is S. We don't round the arithmetic mean found.

My code below(on Java) works for small set of data but it is not efficient. O(n^2).

public static int returnSubsequenceCount(int[] X, int S) {
        int counter = 0;

        for (int i = 0; i < X.length; i++) {
            int[] dpSum = new int[X.length];

            dpSum[i] = X[i];

            if (X[i] == S) {
                counter++;
            }

            for (int j = i + 1; j < X.length; j++) {
                int sum = dpSum[j - 1] + X[j];

                dpSum[j] = sum;

                if ((double) sum / (j - i + 1) == S) {
                    counter++;
                }
            }
        }
        return counter;
    }

Solution 1:^[1]

I will use 1-based indexing for this algorithm. This feels like one of those cases where it is OK.

Let P be the partial sums array, that is P[0] = 0 and P[i] = X[1] + ... + X[i]. Furthermore, let Q[i] = P[i] - S * i. For example,

i     0   1   2   3   4   5   6   7
-----------------------------------
X         5   3   6   2   5   5   2
P     0   5   8  14  16  21  26  28
Q     0   1   0   2   0   1   2   0

What does it mean that the average of [i,j] (including i and j) is S? With the notations above, it can be written as

(P[j] - P[i - 1]) / (j - i + 1) = S     ==>
P[j] - P[i - 1] = S * (j - i + 1)       ==>
P[j] - P[i - 1] = S * j - S * (i - 1)   ==>
P[j] - S * j = P[i - 1] - S * (i - 1)   ==>
Q[j] = Q[i - 1]

This means that any pair of equal values in Q corresponds to a range of average S. For example, the two values of 1 in Q correspond to the range [3, 6, 2, 5]. The four values of 0 in Q give rise to six ranges of average S=4: [5,3], [6,2], [5,5,2], [5,3,6,2], [6,2,5,5,2] and [5,3,6,2,5,5,2].

Therefore the following algorithm also runs in O(n log n), the same as @Polygnome's comment, but should be considerably easier to implement:

calculate Q;
sort Q;
for every batch of k equal values in Q, add k * (k - 1) / 2 to the answer;
return the answer.

This can be reduced to O(n) using a hash table or if the range of the values in Q is small enough to allow counting sort.

Solution 2:^[2]

There's a trick here to obtain an O(n) algorithm:

average = (A[i] + A[i+1] ... + A[j]) / (j - i + 1)

average * (j - i + 1) = A[i] + A[i+1]...+ A[j]

Notice that since average is now multiplied by exactly the number of elements on the right side of the equation, we can subtract the average once for each one of the elements:

0 = (A[i]-average) + (A[i+1]-average) ... + (A[j]-average)

Finding contiguous sums that equal zero can be done by examining prefix sums. For each rightmost element (A[j]-average), we want to add the number of times we've seen the same prefix sum before. We make an adjustment for prefix sum 0 so as to count the full length of the array prefix if applicable:

2 1 3, avg 2

2-2 = 0    ps = 0    count = 1 (1 for the full array prefix)
1-2 = -1   ps = -1
3-2 = 1    ps = 0    count = 2 (1 for index 0 and 1 for the full array prefix)

                     total = 3

Solution 3:^[3]

Heres a Java solution with prefix sums, also using feedback from this thread.

import java.util.*;

        public static int returnSubsequenceCount(int[] X, int S)
    {
        HashMap<Integer, Integer> prefixes = new HashMap<Integer, Integer>();
        int result = 0;
        int[] P = new int[X.length + 1];
        prefixes.put(0, 1);

        int[] Q = new int[X.length + 1];
        P[0] = 0;
        Q[0] = 0;

        for (int i = 1; i < X.length + 1; i++)
        {
            P[i] = P[i - 1] + X[i - 1];
            Q[i] = P[i] - S * i;

            if (!prefixes.containsKey(Q[i]))
            {
                prefixes.put(Q[i], 1);
            }
            else
            {
                Integer temp=prefixes.get(Q[i]);
                temp++;
                prefixes.put(Q[i],temp);

            }

        }

        for (Map.Entry<Integer, Integer> entry : prefixes.entrySet())
        {
            int value = entry.getValue();
            result += value * (value - 1) / 2;
        }

        return result;
    }

Solution 4:^[4]

I found explanation by C?t?lin Frâncu extremely clear and concise. Here is the code with tests (Scala):

object T3 {
  def numOfSubArraysWithMean(a: Array[Int], s: Int): Int =
    a.lazyZip(LazyList from 1)
      .foldLeft((0, List(0))) { case ((pi, q), (ai, i)) =>
        val pi2 = pi + ai
        val qi = pi2 - s * i
        (pi2, qi :: q)
      }._2
      .groupMapReduce(identity)(_=>1)(_+_)
      .withFilter { case (_, v) => v > 1 }
      .map { case (_, v) => v * (v - 1) / 2 }
      .sum
}

class T3Spec extends AnyFunSpec with Matchers {
  import T3._
  def a[A: ClassTag](aa: A*): Array[A] = aa.toArray
  it("a") {
    val data = Seq(
      (a(5,3,6,2,5,5,2),4) -> 8,
      (a(5,3,6,2,5),4) -> 4,
      (a(5,3,6,2,5,3),4) -> 7,
      (a(5,3,6,2),4) -> 3,
      (a(5,3,6,2,4),4) -> 6,
      (a(2,1,3,7,2,2,1,3),2) -> 9,
      (a(0,8,1,7,2,6,3,5),4) -> 10,
      (a(2,2,3,4,1,1),2) -> 4,
      (a(2,2,2,2,2,2),2) -> 21,
      (a(2,2,2,2),2) -> 10,
    )
    for {
      ((a, s), r) <- data
    } numOfSubArraysWithMean(a, s) shouldEqual r
  }
}

In most cases, N vs NlogN doesn't matter. But N^2 is a problem

Solution 5:^[5]

Kotlin version

fun solution(A: IntArray, S: Int): Int {
    return A.asSequence()
        .runningFold(0) { acc, i -> acc + i }
        .map { ((it % S) + S) % S }
        .groupingBy { it }
        .eachCount()
        .values
        .sumOf { it * (it - 1) / 2 }
}

Solution 6:^[6]

This is how to do it in python, the key is to produce Powerset

def mean_subsequence(array: list, mean: int) -> int:
result = 0
powerset = [[]]

for num in array:
    len_ = len(powerset)
    for i in range(len_):
        subsequent = powerset[i]
        #check the mean
        tmp = [num] + subsequent
        if tmp != [] and sum(tmp) / len(tmp) == mean:
            result += 1
        #update power set
        powerset.appened([num] + subsequent)
        
return result

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	C?t?lin Frâncu
Solution 2	×’×œ×¢×“ ×‘×¨×§×Ÿ
Solution 3
Solution 4
Solution 5
Solution 6	Anggi Permana Harianja

'Finding Arithmetic Mean of Subarrays Efficiently in an Array

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Solution 4:[4]

Solution 5:[5]

Solution 6:[6]