numpy mean of complex numbers with infinities

numpy does not seem to get along well with complex infinities.

While we can evaluate:

In[2]: import numpy as np

In[3]: np.mean([1, 2, np.inf])
Out[3]: inf

The following result is more puzzling:

In[4]: np.mean([1 + 0j, 2 + 0j, np.inf + 0j])
Out[4]: (inf+nan*j)
...\_methods.py:80: RuntimeWarning: invalid value encountered in cdouble_scalars
  ret = ret.dtype.type(ret / rcount)

I'm not sure the imaginary part makes sense to me. But please do comment if I'm wrong.

Any insight into interacting with complex infinities in numpy?



Solution 1:[1]

Solution

To compute the mean we divide the sum by a real number. This division causes problems because of type promotion (see below). To avoid type promotion we can manually perform this division separately for the real and imaginary part of the sum:

import numpy as np

n = 3
s = np.sum([1 + 0j, 2 + 0j, np.inf + 0j])
# divide the real and the imaginary part separately, each by a real number
mean = np.real(s) / n + 1j * np.imag(s) / n
print(mean)  # (inf+0j)
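
Note that rebuilding the result with `1j * ...` is itself a complex multiplication, so it can produce a nan of its own if the imaginary part of the sum is infinite (`1j * inf` has a real part of `0 * inf`). A minimal sketch of a variant that sidesteps this, assuming a hypothetical helper named `complex_mean` (not a numpy function), averages the two parts as real arrays and only then combines them with `complex()`:

```python
import numpy as np


def complex_mean(values):
    """Mean of complex values, averaging real and imaginary parts separately.

    Hypothetical helper for illustration; not part of numpy.
    """
    arr = np.asarray(values, dtype=complex)
    # arr.real and arr.imag are real-valued arrays, so each mean is an
    # ordinary real division and no complex division takes place.
    return complex(arr.real.mean(), arr.imag.mean())


result = complex_mean([1 + 0j, 2 + 0j, np.inf + 0j])
print(result)  # (inf+0j)
```

Whether a component-wise mean is the right answer still depends on what the infinity is supposed to mean in your data.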

Rationale

The issue is not related to numpy but to the way complex division is performed. Observe that ((1 + 0j) + (2 + 0j) + (np.inf + 0j)) / (3+0j) also results in (inf+nanj).

The result needs to be split into a real and an imaginary part. For division, both operands are promoted to complex, even if you divide by a real number. So basically the division is:

 a + bj
--------
 c + dj

The division operation does not know that d=0. So to split the result into real and imaginary parts it has to get rid of the j in the denominator. This is done by multiplying numerator and denominator by the complex conjugate of the denominator:

 a + bj     (a + bj) * (c - dj)     ac + bd + bcj - adj
-------- = --------------------- = ---------------------
 c + dj     (c + dj) * (c - dj)        c**2 + d**2

Now, if a=inf and d=0 the term a * d * j = inf * 0 * j = nan * j.
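
The formula can be evaluated term by term with plain floats to see where the nan comes from. This is only a sketch of the textbook formula, with `textbook_divide` as a made-up name. Real implementations typically use a rescaled variant of it, but they run into the same `inf * 0` product.

```python
import math


def textbook_divide(a, b, c, d):
    """Return (real, imag) of (a + bj) / (c + dj) via the conjugate formula."""
    denom = c**2 + d**2
    real = (a * c + b * d) / denom
    imag = (b * c - a * d) / denom  # a * d is inf * 0 when a=inf and d=0
    return real, imag


real, imag = textbook_divide(math.inf, 0.0, 3.0, 0.0)
print(real, imag)  # inf nan
```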

Solution 2:[2]

When there is an np.inf in your array, functions such as np.mean() or np.max() simply return infinity. In this case, however, the numbers are complex, and a complex infinity is defined as an infinite number in the complex plane whose complex argument is unknown or undefined, so you get nan*j as the imaginary part.

To get around this problem, you can exclude the infinite items from such mathematical operations. Use np.isfinite() to detect them and apply the function to the finite items only:

In [16]: arr = np.array([1 + 0j, 2 + 0j, np.inf + 0j])

In [17]: arr[np.isfinite(arr)]
Out[17]: array([ 1.+0.j,  2.+0.j])

In [18]: np.mean(arr[np.isfinite(arr)])
Out[18]: (1.5+0j)
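
For complex input, np.isfinite counts an entry as finite only when both its real and its imaginary part are finite, so the same mask also drops entries with an infinite imaginary part or a nan. A small sketch with made-up sample values, plus a guard for the case where nothing finite remains (taking the mean of an empty array would give nan and a warning):

```python
import numpy as np

arr = np.array([1 + 0j, 2 + 0j, np.inf + 0j, complex(0, np.inf), complex(np.nan, 0)])

mask = np.isfinite(arr)  # True only where both parts are finite
finite_mean = arr[mask].mean() if mask.any() else None

print(mask)         # [ True  True False False False]
print(finite_mean)  # (1.5+0j)
```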

Solution 3:[3]

Because of type promotion.

When you do the division of a complex by a real, like (inf + 0j) / 2, the (real) divisor gets promoted to 2 + 0j.

And by complex division, the imaginary part is equal to (0 * 2 - inf * 0) / 4. Note the inf * 0 here which is an indeterminate form, and it evaluates to NaN. This makes the imaginary part NaN.

And back to the topic. When numpy calculates the mean of a complex array, it really doesn't try to do anything clever. First it reduces the array with the "addition" operation, obtaining the sum. After that, the sum is divided by the count. This sum contains an inf in the real part, which causes the trouble described above when the divisor (count) gets promoted from integral type to complex floating point.
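
The two steps can be imitated in plain Python, without numpy. This is only a sketch of the sequence described above, not numpy's actual code. The divisor is written as an explicit complex number to mirror the promotion of the count.

```python
import math

values = [1 + 0j, 2 + 0j, complex(math.inf, 0.0)]

total = sum(values, 0j)            # step 1: reduce with addition
count = complex(len(values), 0.0)  # the count after promotion to complex
mean = total / count               # step 2: complex division

print(total)  # (inf+0j)
print(mean)   # (inf+nanj)
```

The sum itself is still clean. The nan only appears in the division step.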

Edit: a word about solution

The IEEE floating point "infinity" is really a very primitive construct that represents indeterminate forms like 1 / 0. These forms are not constant numbers, but possible limits. The special inf or NaN "floating point numbers" are placeholders that notify you about the presence of indeterminate forms. They say nothing about the existence or type of the limit, which you must determine from the mathematical context.

Even for real numbers, the underlying limit can depend on how you approach the limit. A superficial 1 / 0 form can go to positive or negative infinity. On the complex plane, things are even more complex (well). For example, you may run into branch cuts and (different kinds of) singularities. There's no universal solution that fits all.
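
For the real case, signed zero gives a concrete illustration. The same superficial 1 / 0 comes out as positive or negative infinity depending on the sign of the zero, which is all that is left of the direction from which the limit was approached. The variable names below are only illustrative.

```python
import numpy as np

# Suppress the "divide by zero" RuntimeWarning for this demonstration.
with np.errstate(divide="ignore"):
    from_above = np.float64(1.0) / np.float64(0.0)
    from_below = np.float64(1.0) / np.float64(-0.0)

print(from_above, from_below)  # inf -inf
```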

Tl;dr: Fix the underlying problem in the face of ambiguous/incomplete/corrupted data, or prove that the end computational result can withstand such corruption (which can happen).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3