'Merging two dataframes without losing data

I have two dataframes:

df_1 = 
Material    TypeOf      
4100          N200         
4101          M200         
4200          M200     
4500          N200    
...

df_2=
Material       Ms      Number1      Number2 ...
4000          PAK            1            1
4100          PAK            1            2
4101          PIE            2            2
4101          BOX            3            2
4102          BOX            2            2
4200          PAK            1            1
4200          BOX            2            3
...

I want to add to the second table the TypeOf, which can obviously achieved by just merging the two on Material. But there are lots of Materials that cannot be found in the second table, and I do not want to end up with half of the second dataframe. Is there any way of still adding the desired column, merging the two, but putting something like NaN for the records whose TypeOf cannot be found?



Solution 1:[1]

In the join documentation, there's a how parameter :

how{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
How to handle the operation of the two objects.

left: use calling frame’s index (or column if on is specified)

right: use other’s index.

outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically.

inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.

cross: creates the cartesian product from both frames, preserves the order of the left keys.

What pandas does on default is the left join, types of sql joins.

Meaning that it matches the left dataframe's column elements but discards when there are no matches.

In your case, you might want an outer join. Matches in the columns will be next to each other but you'll discard no rows from either dataframe.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nathan Furnal