'how to merge Two datasets with different time ranges?

I have two datasets that look like this:

df1:

Date City State Quantity
2019-01 Chicago IL 35
2019-01 Orlando FL 322
... .... ... ...
2021-07 Chicago IL 334
2021-07 Orlando FL 4332

df2:

Date City State Sales
2020-03 Chicago IL 30
2020-03 Orlando FL 319
... ... ... ...
2021-07 Chicago IL 331
2021-07 Orlando FL 4000

My date is in format period[M] for both datasets. I have tried using the df1.join(df2,how='outer') and (df2.join(df1,how='outer') commands but they don't add up correctly, essentially, in 2019-01, I have sales for 2020-03. How can I join these two datasets such that my output is as follows:

I have not been able to use merge() because I would have to merge with a combination of City and State and Date

Date City State Quantity Sales
2019-01 Chicago IL 35 NaN
2019-01 Orlando FL 322 NaN
... ... ... ... ...
2021-07 Chicago IL 334 331
2021-07 Orlando FL 4332 4000


Solution 1:[1]

You can outer-merge. By not specifying the columns to merge on, you merge on the intersection of the columns in both DataFrames (in this case, Date, City and State).

out = df1.merge(df2, how='outer').sort_values(by='Date')

Output:

      Date     City State  Quantity   Sales
0  2019-01  Chicago    IL      35.0     NaN
1  2019-01  Orlando    FL     322.0     NaN
4  2020-03  Chicago    IL       NaN    30.0
5  2020-03  Orlando    FL       NaN   319.0
2  2021-07  Chicago    IL     334.0   331.0
3  2021-07  Orlando    FL    4332.0  4000.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1