'Merging two dataframes without losing data
I have two dataframes:
df_1 =
Material TypeOf
4100 N200
4101 M200
4200 M200
4500 N200
...
df_2=
Material Ms Number1 Number2 ...
4000 PAK 1 1
4100 PAK 1 2
4101 PIE 2 2
4101 BOX 3 2
4102 BOX 2 2
4200 PAK 1 1
4200 BOX 2 3
...
I want to add to the second table the TypeOf
, which can obviously achieved by just merging the two on Material
. But there are lots of Materials
that cannot be found in the second table, and I do not want to end up with half of the second dataframe. Is there any way of still adding the desired column, merging the two, but putting something like NaN for the records whose TypeOf
cannot be found?
Solution 1:[1]
In the join documentation, there's a how
parameter :
how{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it. lexicographically.
inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
What pandas does on default is the left join, .
Meaning that it matches the left dataframe's column elements but discards when there are no matches.
In your case, you might want an outer join. Matches in the columns will be next to each other but you'll discard no rows from either dataframe.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Nathan Furnal |