Python, Pandas: ValueError('window must be an integer',)

I seem to be having this issue with Pandas code inside a Bokeh callback.

Here's part of the output before the error. My dataframe looks normal, and I'm not sure why Pandas rejects it.

                     time  temperature
0 2016-03-17 11:00:00        4.676
1 2016-03-17 11:30:00        4.633
2 2016-03-17 12:00:00        4.639
3 2016-03-17 12:30:00        4.603
4 2016-03-17 13:00:00        4.615
5 2016-03-17 13:30:00        4.650
6 2016-03-17 14:00:00        4.678
7 2016-03-17 14:30:00        4.698
8 2016-03-17 15:00:00        4.753
9 2016-03-17 15:30:00        4.847
ERROR:bokeh.server.protocol_handler:error handling message Message 'PATCH-DOC' (
revision 1): ValueError('window must be an integer',)

And here's the code I changed from the flask embed example (link here):

def callback(attr, old, new):
    df = pd.DataFrame.from_dict(source.data.copy())
    print df[:10]
    if new == 0:
        data = df
    else:
        data = df.rolling('{0}D'.format(new)).mean()
    source.data = ColumnDataSource(data=data).data

slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
slider.on_change('value', callback)

I can also include the full code if that helps, but the main change I made is just a doc.add_periodic_callback() that fetches new data periodically.



Solution 1:[1]

This is an error from Pandas. You are passing a string to df.rolling, but it expects only integer values. You probably want to pass int(new) instead.

Edit: as noted below, the Pandas documentation is evidently incomplete, and the real problem in this case is probably the lack of a time index. Creating a naive DataFrame and passing a value like "10d" definitely raises the indicated error:

In [2]: df = pd.DataFrame({'B': [0, 1, 2, 10, 4]})

In [3]: df.rolling('10d')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-2a9875316cd7> in <module>
----> 1 df.rolling('10d')

~/anaconda/lib/python3.7/site-packages/pandas/core/generic.py in rolling(self, window, min_periods, center, win_type, on, axis, closed)
   8906                                    min_periods=min_periods,
   8907                                    center=center, win_type=win_type,
-> 8908                                    on=on, axis=axis, closed=closed)
   8909
   8910         cls.rolling = rolling

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in rolling(obj, win_type, **kwds)
   2467         return Window(obj, win_type=win_type, **kwds)
   2468
-> 2469     return Rolling(obj, **kwds)
   2470
   2471

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in __init__(self, obj, window, min_periods, center, win_type, axis, on, closed, **kwargs)
     78         self.win_freq = None
     79         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 80         self.validate()
     81
     82     @property

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in validate(self)
   1476
   1477         elif not is_integer(self.window):
-> 1478             raise ValueError("window must be an integer")
   1479         elif self.window < 0:
   1480             raise ValueError("window must be non-negative")

ValueError: window must be an integer
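
For contrast, here is a minimal sketch of the same values with and without a time index. The dates are made up for illustration, and the exact error text may differ between Pandas versions, so the sketch only checks for a ValueError.

```python
import pandas as pd

values = [0, 1, 2, 10, 4]

# Default RangeIndex: an offset window such as '2D' is rejected.
plain = pd.DataFrame({'B': values})
try:
    plain.rolling('2D').mean()
    offset_rejected = False
except ValueError:
    offset_rejected = True

# Same values on a daily DatetimeIndex (illustrative dates):
# the offset window is accepted.
indexed = pd.DataFrame(
    {'B': values},
    index=pd.date_range('2016-03-17', periods=5, freq='D'),
)
smoothed = indexed.rolling('2D').mean()
```

With one row per day, each '2D' window covers the current row and the one before it.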

Solution 2:[2]

df.rolling can also handle time periods. Make sure the date/time column has a pandas datetime dtype. If it does not, convert it like this:

data['col'] = pd.to_datetime(data['col'])
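
A quick way to check whether the conversion is needed, sketched on a made-up frame where the timestamps arrive as strings (the column name 'col' follows the line above; the values are illustrative):

```python
import pandas as pd

# Hypothetical data: timestamps stored as plain strings.
data = pd.DataFrame({
    'col': ['2016-03-17 11:00:00', '2016-03-17 11:30:00', '2016-03-17 12:00:00'],
    'temperature': [4.676, 4.633, 4.639],
})

# Before conversion the column is not a datetime dtype.
was_datetime = pd.api.types.is_datetime64_any_dtype(data['col'])

data['col'] = pd.to_datetime(data['col'])

# After conversion it is, so it can serve as a time index for rolling.
is_datetime = pd.api.types.is_datetime64_any_dtype(data['col'])
```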

Solution 3:[3]

As of today, the documentation states as follows:

window : int, or offset

Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. This is new in 0.19.0

It is not clear to me whether the time information in your dataframe is a column or part of a MultiIndex. In the first case, you can use .set_index('time').
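
A minimal sketch of the column case, using the first few rows from the question. The '60min' window is only an illustrative choice so that the effect is visible on half-hourly data.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.date_range('2016-03-17 11:00', periods=4, freq='30min'),
    'temperature': [4.676, 4.633, 4.639, 4.603],
})

# Moving 'time' into the index gives rolling() the datetimelike
# index that offset windows require.
hourly = df.set_index('time').rolling('60min').mean()
```

Each 60-minute window here holds the current reading and the one 30 minutes earlier; the first row has only itself.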

For MultiIndex, currently, you cannot use offsets. See the related issue. Instead, you can use .reset_index() to transform it into a single index dataframe (see here).
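
A sketch of that workaround, assuming a hypothetical two-level index named ('sensor', 'time'). The level names and values are made up, and for brevity the window runs across both sensors; group by sensor first if that is not what you want.

```python
import pandas as pd

times = pd.to_datetime([
    '2016-03-17 11:00', '2016-03-17 11:30',
    '2016-03-17 12:00', '2016-03-17 12:30',
])
multi = pd.DataFrame(
    {'temperature': [4.0, 6.0, 8.0, 10.0]},
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], times], names=['sensor', 'time']
    ),
)

# Flatten the MultiIndex into columns, then keep only 'time' as the index.
flat = multi.reset_index().set_index('time')

# Select the numeric column so the string 'sensor' column is not aggregated.
smoothed = flat['temperature'].rolling('60min').mean()
```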

Update: you can also pass datetime columns for offset-based rolling metrics with the on parameter (and, therefore, you do not have to have them in the index).
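
Applied to the question, the smoothing step might look like the sketch below. The helper name smooth is hypothetical, the sample values are made up, and the Bokeh wiring is left out; it assumes a datetime column called 'time' that is sorted in ascending order.

```python
import pandas as pd

def smooth(df, days):
    """Return df as-is for days == 0, else a rolling mean over `days` days."""
    if days == 0:
        return df
    # on='time' makes rolling() use the column instead of the index,
    # so the default integer index can stay in place.
    return df.rolling('{0}D'.format(days), on='time').mean()

df = pd.DataFrame({
    'time': pd.to_datetime([
        '2016-03-17 00:00', '2016-03-17 12:00',
        '2016-03-18 00:00', '2016-03-18 12:00',
    ]),
    'temperature': [1.0, 3.0, 5.0, 7.0],
})
smoothed = smooth(df, 1)
```

Inside the callback, the result of such a helper could then be assigned to source.data in the same way the original code does.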

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 William Miller
Solution 3