Why does Series.min(skipna=True) throw an error caused by a NaN value?

I work with timestamps that have mixed DST offsets. I tried this in pandas 1.0.0:

import numpy as np
import pandas as pd

s = pd.Series(
    [pd.Timestamp('2020-02-01 11:35:44+01'),
    np.nan, # same result with pd.Timestamp('nat')
    pd.Timestamp('2019-04-13 12:10:20+02')])

Calling min() or max() fails:

s.min(), s.max() # same result with s.min(skipna=True)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 11216, in stat_func
    f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  File "C:\Anaconda\lib\site-packages\pandas\core\series.py", line 3892, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\core\nanops.py", line 125, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\core\nanops.py", line 837, in reduction
    result = getattr(values, meth)(axis)
  File "C:\Anaconda\lib\site-packages\numpy\core\_methods.py", line 34, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
TypeError: '<=' not supported between instances of 'Timestamp' and 'float'
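The last line of the traceback comes down to ordering a Timestamp against a float (the value that stands in for the missing entry). A minimal sketch of that single comparison, outside any Series, may make the error easier to read. `can_compare` is a hypothetical helper written only for this illustration.

```python
import numpy as np
import pandas as pd


def can_compare(a, b):
    """Hypothetical helper: True if `a <= b` evaluates, False if it raises TypeError."""
    try:
        a <= b
    except TypeError:
        return False
    return True


ts = pd.Timestamp('2019-04-13 12:10:20+02')

# Timestamp vs float: neither side knows how to order the other, so Python
# raises the same TypeError that appears at the bottom of the traceback.
print(can_compare(ts, np.nan))  # False

# Two tz-aware Timestamps with different offsets compare without problems.
print(can_compare(ts, pd.Timestamp('2020-02-01 11:35:44+01')))  # True
```

So mixing offsets by itself is not the problem; it only becomes one once a float ends up next to the Timestamps in the same comparison.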

Workaround:

s.loc[s.notna()].min(), s.loc[s.notna()].max()

(Timestamp('2019-04-13 12:10:20+0200', tz='pytz.FixedOffset(120)'), Timestamp('2020-02-01 11:35:44+0100', tz='pytz.FixedOffset(60)'))

What am I missing here? Is it a bug?



Solution 1:[1]

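I think the problem here is that pandas stores a Series of Timestamps with different timezone offsets as generic Python objects (dtype object, as the output below shows). min and max then fall back to an object-dtype reduction, which ends up comparing a Timestamp with a float and fails.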
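To see the dtype difference, here is a small sketch that compares the mixed-offset data with a variant where both values share one offset. The names `mixed` and `same` are illustrative only. The second timestamp in `same` is the same instant rewritten with a +01 offset, which is a hypothetical adjustment made just for this comparison.

```python
import numpy as np
import pandas as pd

# Mixed offsets (+01 and +02): no single tz-aware datetime64 dtype fits,
# so pandas falls back to object dtype.
mixed = pd.Series([pd.Timestamp('2020-02-01 11:35:44+01'),
                   np.nan,
                   pd.Timestamp('2019-04-13 12:10:20+02')])

# Same instants, but both expressed at +01 (hypothetical, for illustration):
# pandas infers a tz-aware datetime64 dtype and turns NaN into NaT.
same = pd.Series([pd.Timestamp('2020-02-01 11:35:44+01'),
                  np.nan,
                  pd.Timestamp('2019-04-13 11:10:20+01')])

print(mixed.dtype)  # object
print(same.dtype)   # a tz-aware datetime64 dtype
print(same.min())   # works; the NaT entry is skipped
```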

s = pd.Series(
    [pd.Timestamp('2020-02-01 11:35:44+01'),
    np.nan, # same result with pd.Timestamp('nat')
    pd.Timestamp('2019-04-13 12:10:20+02')])
print (s)
0    2020-02-01 11:35:44+01:00
1                          NaN
2    2019-04-13 12:10:20+02:00
dtype: object

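So if you convert the values to a real datetime dtype, it works well. That dtype needs one common timezone (mixed timezones are not supported), so here everything is converted to UTC with utc=True: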

print (pd.to_datetime(s, utc=True))
0   2020-02-01 10:35:44+00:00
1                         NaT
2   2019-04-13 10:10:20+00:00
dtype: datetime64[ns, UTC]

print (pd.to_datetime(s, utc=True).max())
2020-02-01 10:35:44+00:00
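Since the offsets differ only because of DST, one option is to go through UTC and then convert to a named timezone, so the values share one dtype again while showing local wall-clock times. This is a sketch that assumes the data comes from 'Europe/Berlin'; that zone is a guess based on the +01 (winter) and +02 (summer) offsets and is not stated in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series([pd.Timestamp('2020-02-01 11:35:44+01'),
               np.nan,
               pd.Timestamp('2019-04-13 12:10:20+02')])

# Normalise to UTC first, then convert to a single named zone.
# 'Europe/Berlin' is an assumption; use the zone your data really comes from.
local = pd.to_datetime(s, utc=True).dt.tz_convert('Europe/Berlin')

print(local.dtype)  # one tz-aware datetime64 dtype for the whole column
print(local.min())  # the NaT entry is skipped
print(local.max())
```

Because the zone carries DST rules, each value comes back with its original offset, and min/max work like they do on the UTC column.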

If you need to keep the different timezones, another possible solution is to drop the missing values first:

print (s.dropna().max())
2020-02-01 11:35:44+01:00
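Both workarounds should select the same moment in time and differ only in how it is displayed. A quick sketch of that check follows, where `kept` and `as_utc` are illustrative names.

```python
import numpy as np
import pandas as pd

s = pd.Series([pd.Timestamp('2020-02-01 11:35:44+01'),
               np.nan,
               pd.Timestamp('2019-04-13 12:10:20+02')])

# dropna() keeps object dtype and each value's own offset ...
kept = s.dropna().max()
# ... while to_datetime(utc=True) normalises everything to UTC.
as_utc = pd.to_datetime(s, utc=True).max()

print(kept, as_utc)
# tz-aware Timestamps compare by instant, so different offsets still match.
print(kept == as_utc)  # True
```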

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jezrael