How to create new columns based on the values of other columns that may contain numbers or NaN?

I have a few dataframes that I'm merging based on known, populated fields. The resulting dataframe will always contain a set of columns, but may or may not have values for some of the columns.

Here is a snippet:

    df = df.merge(
        how="left",
        right=ins_df,
        left_on=["warehouse", "date"],
        right_on=["ware_id", "c_date"],
    ).merge(
        how="left",
        right=ware_df,
        left_on=["warehouse", "date"],
        right_on=["warehouse_code", "warehouse_date"],
    )

I get a resulting dataframe of several columns, let's just call them A, B, C, D, and E.

I need to create new columns F, G, and H. F is calculated as A + B - C, G as max(A + B + C, E - D), H as C * D, and so on.

This would be relatively simple, but I am running into a hiccup because I don't know how to handle NaNs. When we merge, sometimes there are no matching values in ins_df or ware_df, so we get NaN values in df. I can't apply a single df-wide default for NaN, because the right treatment depends on the calculation: in some we want to treat a NaN as 0, in others as -1, and in others we don't want to populate the result at all if one of the inputs is NaN. For example, for H, if C or D is NaN, we don't want to calculate H. But for G, if C is NaN, we want to treat it as 0.

Is there a simple way to do this in my calculations? For example, something like this:

    df['G'] = max(df.get('A', default=0) + df.get('B', default=1) + df.get('C', default=0), df.get('E', default=-1))

    df['H'] = df['C'] * df['D'] if ['C', 'D'] in df.columns else 0


Solution 1:[1]

With the following toy dataframe:

    import pandas as pd

    df = pd.DataFrame(
        {
            "A": [8, 5, 2],
            "B": [4, pd.NA, 3],
            "C": [1, 1, 8],
            "D": [pd.NA, 7, 2],
            "E": [5, 5, pd.NA],
        }
    )

Here is one way to do it:

    # For G, if A is NaN, we want to treat it as 0, etc.
    df["G"] = pd.concat(
        [df["A"].fillna(0) + df["B"].fillna(1) + df["C"].fillna(0), df["E"].fillna(-1)],
        axis=1,
    ).max(axis=1)

    # For H, if C or D is NaN, we don't want to calculate H
    df["H"] = df["C"] * df["D"]
    print(df)
    # Output
       A     B  C     D     E   G     H
    0  8     4  1  <NA>     5  13  <NA>
    1  5  <NA>  1     7     5   7     7
    2  2     3  8     2  <NA>  13    16
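
The same per-column `fillna` idea can be extended to all three formulas from the question, including the `E - D` term of G. The following is only a sketch under some assumptions: the toy data uses float `np.nan` (which is what a left merge typically leaves in unmatched rows), the default of 0 for D inside G is made up for illustration, and so are the all-zero defaults used for F. Swap in whatever defaults each calculation actually needs.

```python
import numpy as np
import pandas as pd

# Toy data with float NaN, standing in for the merged dataframe (assumption).
df = pd.DataFrame(
    {
        "A": [8.0, 5.0, 2.0],
        "B": [4.0, np.nan, 3.0],
        "C": [1.0, np.nan, 8.0],
        "D": [np.nan, 7.0, 2.0],
        "E": [5.0, 20.0, np.nan],
    }
)

# F = A + B - C, treating every missing input as 0 (hypothetical rule).
df["F"] = df["A"].fillna(0) + df["B"].fillna(0) - df["C"].fillna(0)

# G = max(A + B + C, E - D), with a separate default per column.
# The defaults for A, B, C and E follow the question; D -> 0 is an assumption.
g_sum = df["A"].fillna(0) + df["B"].fillna(1) + df["C"].fillna(0)
g_diff = df["E"].fillna(-1) - df["D"].fillna(0)
df["G"] = np.maximum(g_sum, g_diff)  # element-wise maximum of the two Series

# H = C * D, with no fillna: NaN propagates, so H stays empty
# whenever C or D is missing.
df["H"] = df["C"] * df["D"]

print(df)
```

Because the defaults are applied to each column inside each expression, the original columns are left untouched and every formula can use its own rule. If H should hold a sentinel instead of staying empty, `df["H"].fillna(-1)` on the result would do it.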

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent