Percentage difference between any two columns of pandas dataframe

I would like to define a function that calculates the percentage difference between any two pandas columns. Let's say that my dataframe is defined by:

R1  R2    R3    R4   R5    R6
 A   B     1     2    3     4

I would like my calculation defined as

df['R7'] = df[['R3','R4']].apply( method call to calculate perc diff)

and

df['R8'] = df[['R5','R6']].apply(same method call to calculate perc diff)

How can I do that?

I have tried the following:

df['perc_cnco_error'] = df[['CumNetChargeOffs_x','CumNetChargeOffs_y']].apply(lambda x,y: percCalc(x,y))

def percCalc(x,y):
    if x<1e-9:
        return 0
    else:
        return (y - x)*100/x

and it gives me the error message

TypeError: ('() takes exactly 2 arguments (1 given)', u'occurred at index CumNetChargeOffs_x')



Solution 1:[1]

In its simplest form:

def percentage_change(col1,col2):
    return ((col2 - col1) / col1) * 100

You can apply it to any 2 columns of your dataframe:

df['a'] = percentage_change(df['R3'], df['R4'])
df['b'] = percentage_change(df['R6'], df['R5'])

>>> print(df)
 
  R1 R2  R3  R4  R5  R6      a     b
0  A  B   1   2   3   4  100.0 -25.0

Equivalently, using the pandas arithmetic methods:

def percentage_change(col1,col2):
    return ((col2.sub(col1)).div(col1)).mul(100)

You can also use the pandas built-in pct_change, which with axis=1 computes the change between consecutive columns of the selection, and then select the column you want. Note that it returns a fraction rather than a percentage (1.0 means 100%):

df['R7'] = df[['R3','R4']].pct_change(axis=1)['R4']
df['R8'] = df[['R6','R5']].pct_change(axis=1)['R5']

>>> print(df)

  R1 R2  R3  R4  R5  R6      a     b   R7    R8
0  A  B   1   2   3   4  100.0 -25.0  1.0 -0.25
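
To put the pct_change result on the same scale as percentage_change, it can be multiplied by 100. A minimal sketch, assuming the same single-row dataframe as in the Setup of this answer:

```python
import pandas as pd

# Same data as the Setup of this answer
df = pd.DataFrame({'R1': 'A', 'R2': 'B',
                   'R3': 1, 'R4': 2, 'R5': 3, 'R6': 4},
                  index=[0])

# pct_change returns a fraction, so scale by 100 to get percent
df['R7'] = df[['R3', 'R4']].pct_change(axis=1)['R4'] * 100
df['R8'] = df[['R6', 'R5']].pct_change(axis=1)['R5'] * 100
```

The column order in the selection matters: the first column is the base and the second is the value compared against it.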

Setup:

import pandas as pd

df = pd.DataFrame({'R1':'A','R2':'B',
                   'R3':1,'R4':2,'R5':3,'R6':4},
                  index=[0])
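
The percCalc function in the question returns 0 when the base value is below 1e-9, whereas the vectorised percentage_change has no such guard. One possible way to keep that behaviour without a row-wise apply is sketched below. The name percentage_change_safe and the eps parameter are hypothetical, and the example data is made up for illustration.

```python
import pandas as pd

def percentage_change_safe(col1, col2, eps=1e-9):
    """Percent change from col1 to col2, with 0 where col1 < eps."""
    # Division by zero yields inf or NaN here rather than raising
    result = (col2 - col1) / col1 * 100
    # Overwrite rows whose base is too small, as percCalc does
    return result.mask(col1 < eps, 0.0)

# Illustrative data: the second row has a zero base value
demo = pd.DataFrame({'R3': [1, 0], 'R4': [2, 5]})
demo['R7'] = percentage_change_safe(demo['R3'], demo['R4'])
```

Rows where col1 is missing stay NaN, since the comparison col1 < eps is False for them, which matches what percCalc would return for a NaN input.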

Solution 2:[2]

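To calculate the percent difference between R3 and R4 you can use the expression below. Note that it subtracts R4 from R3, so an increase from R3 to R4 comes out negative: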

df['R7'] = (df.R3 - df.R4) / df.R3 * 100

Solution 3:[3]

This gives the deviation in percent, assuming df contains only the two numeric columns to compare:

df.apply(lambda row: (row.iloc[0]-row.iloc[1])/row.iloc[0]*100, axis=1)

If you have more than two columns, select the two you want first:

df[['R3', 'R5']].apply(lambda row: (row.iloc[0]-row.iloc[1])/row.iloc[0]*100, axis=1)
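
The same row-wise pattern also explains the TypeError in the question: without axis=1, apply passes each column to the lambda as a single argument, so a two-argument lambda fails. A minimal sketch of the original attempt adapted to axis=1, assuming the percCalc function from the question and the sample columns R3 to R6:

```python
import pandas as pd

def percCalc(x, y):
    # Return 0 when the base value is effectively zero
    if x < 1e-9:
        return 0
    return (y - x) * 100 / x

df = pd.DataFrame({'R1': 'A', 'R2': 'B',
                   'R3': 1, 'R4': 2, 'R5': 3, 'R6': 4},
                  index=[0])

# axis=1 hands each row to the lambda as a Series; unpack it by column name
df['R7'] = df[['R3', 'R4']].apply(lambda row: percCalc(row['R3'], row['R4']), axis=1)
df['R8'] = df[['R5', 'R6']].apply(lambda row: percCalc(row['R5'], row['R6']), axis=1)
```

Row-wise apply calls the Python function once per row, so on large dataframes the vectorised column arithmetic is usually the faster choice.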

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Daniil Mashkin
Solution 3 pdubucq