'Swapping elements between A and B to get sums equality

We have two arrays of equal length, A and B. also, for every i: 0 <= a_i, b_i <= m for some 1<=m<= 1000000. We want to check if a single swap between some term of A and some term of B will make the sums of the arrays be equal.

Consider the following solution:

def fast_solution(A, B, m):
  n = len(A)
  sum_a = sum(A)
  sum_b = sum(B)
  d = sum_b - sum_a
  if d % 2 == 1:
    return False
  d //= 2
  count = counting(A, m) # returns a mapping <integer, #occurrences in A>
  for i in range(n):
    if 0 <= B[i] - d and B[i] - d <= m and count[B[i] - d] > 0:
      return True
  return False

I'd be glad if you could explain the reasoning behind the last if clause.

Source of the problem



Solution 1:[1]

If there is such a swap, then the difference between the two values must be half that of the difference in the sums. Swapping two values means that the sum of both lists will change, one going up, the other down, by the same amount. The two changes must add up to the difference between the sums before the swap, and both sums change by the same value (+d and -d, or value delta, which is the difference between the two swapped values).

First, the function calculates d as a delta between the sums, the sum delta. Note that sum_a could be larger than sum_b, at which point the result sum_b - sum_a is negative. This simply means that there must be a value in B that is smaller than the target value in A, a swap then would decrease sum_a and increase sum_b to make them equal. If the parity of the sum delta is odd rather than even, you'll never find a value delta that is half the sum delta so the function returns False at this point. The final value of d is the value delta, the amount of difference between the two swapped values. Remember, the value delta is half the sum delta.

The algorithm counts all values in A, then tests all values in B. It would only be possible to swap two values between A and B if there is a value in B that differs by d from a value in A. The value in A to swap with B would need to be equal to b_value - d. For a negative d (sum_a > sum_b) that would make b_value smaller, for a positive d that would require b_value to be the bigger number.

The if test looks to see if there is a value in B - d available in A, but it first tests if b_value - d is still within the range of [0-m]:

  • 0 <= B[i] - d test if the number sought for in A is still a positive number.
  • B[i] - d <= m tests if the number sought is still no larger than m; it could be if B[i] was close and d is negative.
  • count contains counts for the numbers in A; if count[B[i] - d] > 0 is true, then there is at least one such number in A. This is a number that can be swapped.

The range test is needed because the counted list only holds counts for the numbers from 0 through to m (inclusive), not for negative numbers or for numbers larger than m.

The function can be improved by using a set instead of a counting function. There is no need to know how many times a number appears in A, just that it exists. This would make boundary checks obsolete, because numbers out of bound are simply not going to be present in a set of the values of A.

Once we have a set of values of A, we can test if this set is disjoint from the set of b values with the delta applied, using set.isdisjoint():

def faster_solution(A, B, m=None):  # m is ignored in this version
    delta = sum(A) - sum(B)
    if delta % 2 == 1:
        return False
    delta //= 2
    return not set(A).isdisjoint(b - delta for b in B)

This returns True if there is a value in A that is equal to a value in B minus the delta. Python will only loop over the b - delta for b in B loop until a match is found (at which point the sets are not disjoint, and not inverses that result to True), or the loop has been exhausted and so no such value is found in A and the sets are found to be disjoint.

The counter() function shown has another issue: it requires way more memory than is needed, and it is very slow compared to a collections.Counter() object which has an optimised loop implemented in to do the counting. A Counter() uses a dictionary (hash map) to store counts only for counts greater than 0.

The set solution above beats the 'fast solution' hands down:

>>> import timeit, random
>>> m = 1000000
>>> testdata = [random.randrange(m + 1) for _ in range(1000)]
>>> testinputs = (testdata[:], testdata[:])
>>> random.shuffle(testinputs[0])  # The order of A differs from B
>>> testinputs[1][-1] -= testinputs[1][-1] // 2  # now the two sums differ by an even amount, guaranteed to be in range
>>> assert testinputs[1][-1] > 0  # make sure the original random value was not 0 or 1.
>>> # note: It's the *last value in B* that makes it possible to swap;
... # this is one of two worst-case scenarios (the other is even-delta-no-swap).
...
>>> assert fast_solution(*testinputs, m)    # the original finds a solution
>>> assert faster_solution(*testinputs, m)  # my version finds a solution
>>> timeit.timeit("f(*ab, m)", "from __main__ import fast_solution as f, testinputs as ab, m", number=1000)
2.3270092820748687
>>> timeit.timeit("f(*ab, m)", "from __main__ import faster_solution as f, testinputs as ab, m", number=1000)
0.13949943508487195

Not using a counter, and using Python's set functionality made that about 17 times faster for inputs of length 1000!

The moral of the story: use the best tools available in your language of choice, and think critically about what is actually needed to solve the problem. Python's built-in types and operations often can let you avoid running the critical loops in Python bytecode, significantly reducing the constant-time factors of an algorithm.

Solution 2:[2]

From https://www.geeksforgeeks.org/find-a-pair-swapping-which-makes-sum-of-two-arrays-same:

We are looking for two values, a and b, such that:

sumA - a + b = sumB - b + a
2b - 2a = sumB - sumA
b - a = (sumB - sumA) / 2

(sumB - sumA) / 2 is a target difference d

count[B[i] - d] > 0 - this simply means that there must exist such value in the array A in order to satisfy condition

Solution 3:[3]

This for-loop at the end searches each element of the B array. To be the swapped element at index i, it has to satisfy two conditions:

  • B[i] - d must be between 0 and m. You can imagine 2 * d as how much sum(B) is larger than sum(A), so by swapping B[i] with B[i] - d, array A with gain d and array B will lose it, increasing the difference by 2 * d
  • B[i] - d must exist in A

It's not good for understanding though to redefine d = d / 2 in the middle of the code :)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 GoodPerson
Solution 3 fafl