Resampling raises ValueError: Values falls before first bin
I don't understand when and why this error is raised.
From my understanding, resample should create as many bins as needed in order to bin all the timestamps of the index, so the message "Values falls before first bin" does not make much sense to me.
Example/actual output:
>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-22 01:00:00', end='2021-04-28 01:00', freq='1d'), data = [1]*7)
>>> df
                     0
2021-04-22 01:00:00  1
2021-04-23 01:00:00  1
2021-04-24 01:00:00  1
2021-04-25 01:00:00  1
2021-04-26 01:00:00  1
2021-04-27 01:00:00  1
2021-04-28 01:00:00  1
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
[...]
ValueError: Values falls before first bin
Expected output:
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
            0
2021-04-29  7  # bin (2021-04-22 00:00:00, 2021-04-29 00:00:00]
I'm using pandas 1.3.5.
Solution 1:[1]
From this question I learned that the timestamps are likely truncated with respect to the unit given in the rule argument before they are sorted into the correct bin.
This means that 2021-04-22 01:00:00 is rounded to 2021-04-22 00:00:00. Because the bin is open on the left, 2021-04-22 00:00:00 does not fit into the bin (2021-04-22 00:00:00, 2021-04-29 00:00:00], which leads to the ValueError.
To my eyes this looks like a bug or misfeature. At least one of "truncate timestamps before binning" or "don't add bins as needed, instead raise error" seems to be wrong.
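If resample keeps raising, the binning from the question can also be done by hand. The following is only a minimal sketch, not pandas' own algorithm: it assumes fixed 7-day bins anchored at the origin from the question, closed and labelled on the right, and the names bin_days, steps and labels are made up for illustration.

```python
import pandas as pd

# Data from the question: seven daily rows, each at 01:00.
df = pd.DataFrame(
    index=pd.date_range(start="2021-04-22 01:00:00", end="2021-04-28 01:00:00", freq="D"),
    data=[1] * 7,
)

origin = pd.Timestamp("2021-04-29 00:00:00")
bin_days = 7  # hypothetical helper name; mirrors rule='7d'
width = pd.Timedelta(days=bin_days)

# Whole bin widths between each timestamp and the origin, rounded up, so that a
# timestamp lying exactly on an edge belongs to the bin ending there
# (closed='right'). ceil(x) is written as -floor(-x) to stay in integers.
steps = -((origin - df.index) // width)

# Right edge of the bin each row falls into (label='right').
labels = origin + pd.to_timedelta(steps * bin_days, unit="D")

# Group by the computed label instead of calling resample.
result = df.groupby(labels).sum()
print(result)
```

For the data in the question every row gets the label 2021-04-29, so the result is the single row from the expected output. Unlike resample, this approach does not emit empty bins between occupied ones.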
Solution 2:[2]
I found time = time.dt.normalize() to help.
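The .dt accessor suggests that time is a datetime column. In the question the timestamps are in the index, so the closest equivalent would be df.index.normalize(). Here is a hedged sketch of that variant applied to the question's data. Note the side effect: the first row moves from 01:00 to exactly 2021-04-22 00:00:00, which is the right edge of the previous bin, so with closed='right' it is counted there and not in the bin labelled 2021-04-29.

```python
import pandas as pd

# Data from the question: seven daily rows, each at 01:00.
df = pd.DataFrame(
    index=pd.date_range(start="2021-04-22 01:00:00", end="2021-04-28 01:00:00", freq="D"),
    data=[1] * 7,
)

# normalize() sets the time of day to midnight and keeps the date.
df.index = df.index.normalize()

# Same resample call as in the question, now on midnight timestamps.
out = df.resample(
    rule="7D",
    origin=pd.Timestamp("2021-04-29 00:00:00"),
    closed="right",
    label="right",
).sum()
print(out)
```

So normalizing avoids the error, but it yields two bins instead of the single one the question expects. Whether that is acceptable depends on whether the time of day matters for the bin assignment.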
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | actual_panda |
Solution 2 | Hanan Shteingart |