Add a column to a data frame using pandas concatenation

I have a data frame, train_df, for which:

print(train_df.shape)

returns (997, 600).

Now I want to concatenate a column onto this data frame. For that column:

print(len(local_df["target"]))

returns 997.

So the dimensions seem to match.

But the problem is that:

final_df = pd.concat([train_df, local_df["target"]], axis=1)
print(final_df.shape)

returns (1000, 601), while it should be (997, 601).

Do you know what the problem is?



Solution 1:[1]

You can assign a NumPy array as a new column. An array carries no index, so the values are attached by position rather than by label.

final_df = train_df.assign(target=local_df["target"].values)

For pandas >= 0.24, to_numpy() is the preferred equivalent:

final_df = train_df.assign(target=local_df["target"].to_numpy())
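
As a minimal sketch of the positional assignment above: the small frames below are made up for illustration (the column name f0 and the index values are assumptions, not the asker's data), with indexes that deliberately differ.

```python
import pandas as pd

# Hypothetical stand-ins for train_df and local_df; note the differing indexes.
train_df = pd.DataFrame({"f0": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[0, 1, 5])

# to_numpy() drops the index, so the values are attached row by row.
final_df = train_df.assign(target=local_df["target"].to_numpy())
print(final_df.shape)  # (3, 2)
```

The result keeps the index of train_df, and the row count is unchanged. This only makes sense if the rows of the two frames are already in the same order.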

Solution 2:[2]

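I think the problem is that the two objects have different index values. pd.concat with axis=1 aligns rows on index labels, so labels that appear in only one object produce extra rows. One solution is to give both the same default index by calling reset_index with the parameter drop=True: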

final_df = pd.concat([train_df.reset_index(drop=True), 
                     local_df["target"].reset_index(drop=True)], axis=1)
print(final_df.shape)

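Or set the index of the target column to train_df.index. A Series has no set_index method, so select the column with double brackets to keep a DataFrame:

final_df = pd.concat([train_df,
                     local_df[["target"]].set_index(train_df.index)], axis=1)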
print(final_df.shape)
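
To see why the row count grows, here is a small reproduction. It is only a sketch: the frames, the column name f0, and the index values are invented for illustration, and the asker's real indexes may differ in some other way.

```python
import pandas as pd

# Hypothetical frames: same length, but the indexes share only labels 0 and 1.
train_df = pd.DataFrame({"f0": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
local_df = pd.DataFrame({"target": [10, 20, 30]}, index=[0, 1, 5])

# concat with axis=1 aligns on labels and keeps the union of both indexes.
misaligned = pd.concat([train_df, local_df["target"]], axis=1)
print(misaligned.shape)  # (4, 2)

# Labels present in only one of the two indexes explain the extra rows.
extra_labels = train_df.index.symmetric_difference(local_df.index)
print(list(extra_labels))  # [2, 5]

# After resetting both indexes, the rows line up by position.
aligned = pd.concat([train_df.reset_index(drop=True),
                     local_df["target"].reset_index(drop=True)], axis=1)
print(aligned.shape)  # (3, 2)
```

Running symmetric_difference on the real train_df.index and local_df.index should show which labels account for the three extra rows in the question.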

Solution 3:[3]

How about join?

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = pd.DataFrame({'c': [232, 543, 562]})
# Reset both indexes so the join matches rows by position.
print(df.reset_index(drop=True).join(df2.reset_index(drop=True), how='left'))

Output:

   a  b    c
0  1  4  232
1  2  5  543
2  3  6  562

Solution 4:[4]

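Not sure if this is the most efficient way.

To add a column y to a data frame df from another data frame df2 that has this column:

df = df.assign(y=df2["y"].reset_index(drop=True))

Note that assign aligns a Series on index labels, so this matches rows by position only when df itself has a default (0 to n-1) index.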
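
A short sketch of how assign behaves when df does not have a default index. The frames and values here are hypothetical, chosen so that the two indexes share no labels.

```python
import pandas as pd

# Hypothetical frames: df has a non-default index, df2 has the default 0..2.
df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({"y": [7, 8, 9]})

# Passing a Series aligns on labels; none match here, so y is all NaN.
by_label = df.assign(y=df2["y"].reset_index(drop=True))
print(by_label["y"].isna().all())  # True

# Passing a NumPy array attaches the values by position instead.
by_position = df.assign(y=df2["y"].to_numpy())
print(by_position["y"].tolist())  # [7, 8, 9]
```

If df's index must be preserved, the array form is the safer choice. Otherwise, resetting the index of df as well gives the same positional match.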

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 jezrael
Solution 3 U12-Forward
Solution 4 Alex Punnen