How to create a column with a list of JSONs if there are duplicated rows in another column?

I have a Pandas dataframe looking like this:

buyer_id    car      color   year
john        ferrari  yellow  2022
eric        ferrari  red     2022
john        mercedes black   1990
victoria    audi     yellow  2017

I would like to create a new column, 'identical', holding a list of JSON objects in each row:

  • One element in the list if the buyer appears only once in 'buyer_id':

    [{'car':..., 'color':..., 'year': ...}]

  • One element per row if the same buyer appears on several rows in 'buyer_id':

    [ {'car':'ferrari', 'color': 'yellow', 'year': 2022}, {'car':'mercedes', 'color': 'black', 'year': 1990} ]

Expected output:

    buyer_id   car      color   year  identical
    john       ferrari  yellow  2022  [{'car':'ferrari', 'color': 'yellow', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
    eric       ferrari  red     2022  [{'car':'ferrari', 'color': 'red', 'year': 2022}]
    john       mercedes black   1990  [{'car':'ferrari', 'color': 'yellow', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
    victoria   audi     yellow  2017  [{'car':'audi', 'color': 'yellow', 'year': 2017}]

I don't know how to do this with Pandas, or whether it is possible at all.



Solution 1:[1]

You could use GroupBy.apply and to_json with the orient="records" parameter. Note that each cell then holds a JSON string, not a Python list of dicts:

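# Serialize each buyer's rows (without the key column) into one JSON string
s = (df.groupby('buyer_id')
       .apply(lambda g: g.drop('buyer_id', axis=1)
                         .to_json(orient='records'))
    )
# Attach the per-buyer string to every row of that buyer, in a new dataframe
df2 = df.merge(s.rename('identical'), left_on='buyer_id', right_index=True)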

Or, to add the column to df in place:

s = (df.set_index('buyer_id')
       .groupby(level='buyer_id')
       .apply(lambda g: g.to_json(orient='records'))
    )
df['identical'] = df['buyer_id'].map(s)

output:

   buyer_id       car   color  year                                                                                        identical
0      john   ferrari  yellow  2022  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
1      eric   ferrari     red  2022                                                    [{"car":"ferrari","color":"red","year":2022}]
2      john  mercedes   black  1990  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
3  victoria      audi  yellow  2017                                                    [{"car":"audi","color":"yellow","year":2017}]
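
If real Python lists of dicts are wanted, as in the question's expected output, rather than JSON strings, one possible variant is to swap to_json for to_dict("records"). The following is only a sketch under that assumption: the dataframe is rebuilt from the question's sample data, and the names cols and records are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "buyer_id": ["john", "eric", "john", "victoria"],
    "car": ["ferrari", "ferrari", "mercedes", "audi"],
    "color": ["yellow", "red", "black", "yellow"],
    "year": [2022, 2022, 1990, 2017],
})

# Columns to keep in each dict (illustrative name, everything except the key)
cols = ["car", "color", "year"]

# One list of dicts per buyer; groupby keeps the original row order within a group
records = {
    buyer: group[cols].to_dict("records")
    for buyer, group in df.groupby("buyer_id")
}

# Give every row the full list that belongs to its buyer
df["identical"] = df["buyer_id"].map(records)
```

Rows that share a buyer_id end up with equal lists. If the JSON strings from the answer above are kept instead, json.loads from the standard library can turn an individual cell back into a list of dicts.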

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1