'Is there a way to calculate a new column of a dataframe on base of values in a(nother) lookup table

Is it possible to calculate and add the results to an existing dataframe while the calculation bases on a lookup table with a different format?

More precise:

I have a comma separated file that I open and read in as pandas dataframe. The read-in dataframe contains 38 different data columns and an (while read in process created) additional index column over several thousand rows:

eSection of dataframe "data"

My lookup table contains values as base for a calculation. As well, it is a comma separated file read in as pandas dataframe. It contains 24 rows and 6 columns and an additional index column:

Lookup table

And here comes the calculation which I try to realize: In a new column "M_A" I want to write the result of a calculation like this:

Calculation formula

while i stands for the according values of C00, C01, C02....C22, C23

While SP, FR, C00, C01, C02 [...] are a column part of the "data" dataframe, PV, W and RC_A are part of the lookup table dataframe. Common indexing parameter of the "data" and the "lookup" tables are values of the colums of C00, C01, C02 according to column "C" of the lookup table. Calculation values should be taken when data column C00, C01, C02... match lookup table row C00, C01, C02...

As iteration is not a recommended solution for datasets of this size I tried it without but do not find the right way as my lookup table has not the same length as my data table.

df_data['A_calc'] = ((df_data.T / (df_data.SF * df_data.SP)) * ((df_data.C00 * df_lookup.PV * df_lookup.W * df_lookup.RC_A) + (df_data.C01 * df_lookup.PV * df_lookup.W * df_lookup.RC_A) + (df_data.C02 * df_lookup.PV * df_lookup.W * df_lookup.RC_A)+ ...)

This leads to the Error message:

AttributeError: 'DataFrame' object has no attribute 'PTU_Airdensity_recalc'

Is there a way to realize this in Python with Pandas df? Maybe even more elegant than mine which I choose to visualize what my intention is...

Any suggestions?

Thanks, Swawa



Solution 1:[1]

So for my understanding to apply the formula; for each column Ci we multiply it with values PV[i], W[i],RC_A[i] then sum over each result

result=0

for i in range(len(df_lookup)):
   result=result+(df_data[df_lookup.loc[i,"C"]]*df_lookup.PV.iloc[i] * 
   df_lookup.W.iloc[i] * df_lookup.RC_A.iloc[i])

#result is a  column

#then we multiply element wise

df_data['A_calc'] = ((df_data.T / (df_data.SF * df_data.SP))*multiply(result, axis="index")

Solution 2:[2]

This version is working now. Thanks a lot to the help of Ran A!

result=0

for i in range(len(df_lookup)):
   result=result+(df_data[df_lookup.loc[i,"C"]]*df_lookup.PV.iloc[i] * 
   df_lookup.W.iloc[i] * df_lookup.RC_A.iloc[i])
   #result is a  column

#then we multiply element wise

df_data['A_calc'] = ((df_data.T / (df_data.SF * df_data.SP)).multiply(result, axis="index")

this line is working then with "." instead of "*". But the loop I am still working on...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2