pandas diff() giving 0 value for first difference, I want the actual value instead

I have df:

Hour  Energy Wh  
1        4          
2        6           
3        9
4        15

I would like to add a column that shows the per hour difference. I am using this:

df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(0)

df1:

Hour  Energy Wh  Energy Wh/h
1        4          0
2        6          2 
3        9          3
4        15         6

However, the Hour 1 value is showing up as 0 in the Energy Wh/h column, whereas I would like it to show up as 4, like below:

Hour  Energy Wh  Energy Wh/h
1        4          4
2        6          2 
3        9          3
4        15         6

I have tried using np.where:

df['Energy Wh/h']  = np.where(df['Hour'] == 1,df['Energy Wh'].diff().fillna(df['Energy Wh']),df['Energy Wh'].diff().fillna(0))

but I am still getting a 0 value in the hour 1 row (df1), with no errors. How do I get the value in 'Energy Wh' for Hour 1 to be filled, instead of 0?



Solution 1:[1]

You can just fillna() with the original column, without using np.where:

>>> df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(df['Energy Wh'])
>>> df
      Energy Wh  Energy Wh/h
Hour
   1          4          4.0
   2          6          2.0
   3          9          3.0
   4         15          6.0
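As a self-contained sketch of the same idea, the snippet below rebuilds the question's data (the DataFrame construction is an assumption, since the question only shows printed output) and relies on fillna() aligning the passed Series on the index, so only the NaN in the first row is replaced:

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]})

# diff() leaves NaN in the first row. fillna() with a Series aligns on the
# index, so that NaN is filled with the value from the same row.
per_hour = df["Energy Wh"].diff().fillna(df["Energy Wh"])

# Optional cast back to int; assumes the source column has no NaN of its own.
df["Energy Wh/h"] = per_hour.astype(int)
print(df)
```

Note that if 'Energy Wh' itself contained missing values, diff() would produce NaN in later rows as well, and those would also be filled from the original column, which may not be what you want.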

Solution 2:[2]

The first value returned by diff() is always NaN, so it is faster to replace only that value with loc[] than to call fillna() on the whole column. Then convert the float dtype (introduced by the NaN) back to int with astype(int) if necessary; this only works if your series has no other NaNs or float values:

df['Energy Wh/h'] = df['Energy Wh'].diff()
df.loc[0, 'Energy Wh/h'] = df['Energy Wh'].iloc[0]
df['Energy Wh/h'] = df['Energy Wh/h'].astype(int)
print(df)
   Hour  Energy Wh  Energy Wh/h
0     1          4            4
1     2          6            2
2     3          9            3
3     4         15            6

The code above assumes the default index, whose first label is 0. More generally, look up the first label with df.index[0]:

df.index = [5,6,7,8]
print(df)
   Hour  Energy Wh
5     1          4
6     2          6
7     3          9
8     4         15

df['Energy Wh/h'] = df['Energy Wh'].diff()
df.loc[df.index[0], 'Energy Wh/h'] = df['Energy Wh'].iloc[0]
df['Energy Wh/h'] = df['Energy Wh/h'].astype(int)

print(df)
   Hour  Energy Wh  Energy Wh/h
5     1          4            4
6     2          6            2
7     3          9            3
8     4         15            6
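One caveat with label-based assignment: if the index contains repeated labels, .loc with the first label assigns to every row that shares it. A purely positional variant avoids this. The sketch below is an illustration rather than part of the original answer; the repeated index is hypothetical, and the column is named 'Energy Wh/h' as in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]},
    index=[5, 5, 6, 7],  # hypothetical index with a repeated first label
)

df["Energy Wh/h"] = df["Energy Wh"].diff()

# Address the first row by position, so duplicate labels do not matter.
col = df.columns.get_loc("Energy Wh/h")
df.iloc[0, col] = df["Energy Wh"].iloc[0]

# Cast back to int; assumes no other NaN or float values in the column.
df["Energy Wh/h"] = df["Energy Wh/h"].astype(int)
print(df)
```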

Solution 3:[3]

An alternative approach is to avoid .diff and use .shift with a fill value and explicitly perform the subtraction:

df['Energy Wh/h'] = df['Energy Wh'] - df['Energy Wh'].shift(1, fill_value=0)
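A small runnable check of this approach, using data reconstructed from the question (the DataFrame construction is an assumption). Because shift() with fill_value=0 never introduces NaN here, the result should stay an integer column with no astype step:

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]})

# shift(1, fill_value=0) moves every value down one row and puts 0 in the
# first slot, so the first difference is 4 - 0 = 4.
previous = df["Energy Wh"].shift(1, fill_value=0)
df["Energy Wh/h"] = df["Energy Wh"] - previous

print(df["Energy Wh/h"].tolist())
print(df["Energy Wh/h"].dtype)
```

This treats the reading before the first row as 0, which matches the question's desired output.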

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1
Solution 2  smci
Solution 3  juanpa.arrivillaga