Rolling Gradient for Pandas Dataframe column
How can I create a column in a pandas dataframe which is the gradient of another column?
I want the gradient to be run over a rolling window, so only 4 data points are assessed at one time.
I am assuming it is something like:
df['Gradient'] = np.gradient(df['Yvalues'].rolling(center=False,window=4))
However this gives error:
raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index
Any ideas?
Thank you!!
Solution 1:[1]
You haven't applied an aggregation function to your rolling window. On its own, rolling() returns a window object rather than a Series of values, so aggregate it first, for example:
df['Gradient'] = np.gradient(
    df['Yvalues']
    .rolling(center=False, window=4)
    .mean()
)
or
df['Gradient'] = np.gradient(
    df['Yvalues']
    .rolling(center=False, window=4)
    .sum()
)
You can read more about rolling functions at this website.
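To see what this produces, here is a minimal, self-contained sketch. The sample values are made up for illustration; only the column name 'Yvalues' comes from the question. Note that this differentiates the rolling mean, so the first rows come out as NaN wherever a window is incomplete.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'Yvalues' mirrors the column name in the question.
df = pd.DataFrame({"Yvalues": [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]})

# Aggregate each 4-point window to a single number first ...
rolling_mean = df["Yvalues"].rolling(window=4, center=False).mean()

# ... then differentiate the result, which has the same length as df.
df["Gradient"] = np.gradient(rolling_mean)

print(df)
```

Because np.gradient uses central differences in the interior, each NaN in the rolling mean also affects its neighbours, so with window=4 the first four gradient values are NaN here.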
Solution 2:[2]
I think I found the solution, though it's probably not the most efficient.
import numpy as np

class lines(object):
    def __init__(self):
        pass

    def date_index_to_integer_axis(self, dateindex):
        # Convert a date index into the number of days since the first date.
        dates = [d.date() for d in dateindex]
        return [(d - dates[0]).days for d in dates]

    def roll(self, Xvalues, Yvalues, w):
        # Rolling generator function: yields each window of w consecutive points.
        # https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python
        for i in range(len(Xvalues) + 1 - w):
            yield Xvalues[i:i + w], Yvalues[i:i + w]

    def gradient(self, Xvalues, Yvalues):
        # Least-squares fit of Y = m*X + c; returns (m, c).
        # https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.lstsq.html
        A = np.vstack([Xvalues, np.ones(len(Xvalues))]).T
        m, c = np.linalg.lstsq(A, Yvalues, rcond=None)[0]
        return m, c

    def gradient_column(self, data, window):
        """Takes in a single column extracted from a DataFrame (with an associated date index)."""
        window = int(window)
        # Get "X" values (days since the first date) and "Y" values as floats.
        # The builtin float replaces np.float, which was removed from NumPy.
        Xvalues = np.asarray(self.date_index_to_integer_axis(data.index), dtype=float)
        Yvalues = np.asarray(data, dtype=float)
        # Calculate the rolling-window gradient ("m" in Y = mx + c).
        Gradient_Col = np.asarray(
            [self.gradient(sx, sy)[0] for sx, sy in self.roll(Xvalues, Yvalues, window)],
            dtype=float,
        )
        # Pad the start with NaN so the result is the same length as the
        # original DataFrame (it is shorter by window - 1).
        nan_array = np.full(window - 1, np.nan)
        Gradient_Col = np.insert(Gradient_Col, 0, nan_array)
        return Gradient_Col

# gradient_column is an instance method, so create an instance first.
df['Gradient'] = lines().gradient_column(df['Operating Revenue'], window=4)
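For comparison, a more compact sketch of the same least-squares idea uses pandas' own rolling().apply(). It assumes the rows are evenly spaced, so the x axis is just 0, 1, 2, ... inside each window; that differs from the class above, which measures x in days from the date index. The helper name window_slope and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

def window_slope(y):
    """Least-squares slope of y against the positions 0, 1, ..., len(y) - 1."""
    x = np.arange(len(y))
    # np.polyfit with degree 1 returns [slope, intercept].
    return np.polyfit(x, y, 1)[0]

# Hypothetical, perfectly linear sample data (slope of 2 per row).
df = pd.DataFrame({"Yvalues": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]})

# raw=True passes each window to window_slope as a plain NumPy array.
df["Gradient"] = df["Yvalues"].rolling(window=4).apply(window_slope, raw=True)

print(df)
```

The first window - 1 rows are NaN because their windows are incomplete, which matches the NaN padding done by hand in the class above.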
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Jaroslav Bezděk |
Solution 2 | Jaroslav Bezděk |