'Pandas - Duplicate Rows and Slice String

I'm trying to create duplicate rows during a dataframe on conditions.

For example, I have this Dataframe.

students = [
    ("a", "Ursula"),
    ("b", "Hayfa, Martin"),
    ("c", "Kato"),
    ("d", "Tanek, Ava, Pyto"),
    ("e", "Aiko"),
    ("f", "Hunter"),
    ("g", "Josiah, Derek, Uma, Nell"),
]
df = pd.DataFrame(students, columns=["team", "student"])
print(df)

team    student
 a      Ursula
 b      Hayfa, Martin
 c      Kato
 d      Tanek, Ava, Pyto
 e      Aiko
 f      Hunter
 g      Josiah, Derek, Uma, Nell

And I would like to get the following output:

  team  student                   name      remark
   a    Ursula                    Ursula    
   b    Hayfa, Martin             Hayfa     with Martin
   b    Hayfa, Martin             Martin    with Hayfa
   c    Kato                      Kato            
   d    Tanek, Ava, Pyto          Tanek     with Ava, Pyto
   d    Tanek, Ava, Pyto          Ava       with Tanek, Pyto
   d    Tanek, Ava, Pyto          Pyto      with Tanek, Ava
   e    Aiko                      Aiko            
   f    Hunter                    Hunter    
   g    Josiah, Derek, Uma, Nell  Josiah    with Derek, Uma, Nell
   g    Josiah, Derek, Uma, Nell  Derek     with Josiah, Uma, Nell
   g    Josiah, Derek, Uma, Nell  Uma       with Josiah, Derek, Nell
   g    Josiah, Derek, Uma, Nell  Nell      with Josiah, Derek, Uma

Solution 1:^[1]

For pandas 0.25+ is possible use DataFrame.explode with splitted values by Series.str.split and for remark column list comprehension with filtering:

students = df["student"].str.split(", ")
df = df.assign(name=students, remark=students).explode("name").reset_index(drop=True)
df["remark"] = [
    "with " + ", ".join(x for x in r if x != n) if len(r) > 1 else ""
    for n, r in zip(df["name"], df["remark"])
]

print (df)

And we get the following result:

   team                   student    name                    remark
0     a                    Ursula  Ursula                          
1     b             Hayfa, Martin   Hayfa               with Martin
2     b             Hayfa, Martin  Martin                with Hayfa
3     c                      Kato    Kato                          
4     d          Tanek, Ava, Pyto   Tanek            with Ava, Pyto
5     d          Tanek, Ava, Pyto     Ava          with Tanek, Pyto
6     d          Tanek, Ava, Pyto    Pyto           with Tanek, Ava
7     e                      Aiko    Aiko                          
8     f                    Hunter  Hunter                          
9     g  Josiah, Derek, Uma, Nell  Josiah     with Derek, Uma, Nell
10    g  Josiah, Derek, Uma, Nell   Derek    with Josiah, Uma, Nell
11    g  Josiah, Derek, Uma, Nell     Uma  with Josiah, Derek, Nell
12    g  Josiah, Derek, Uma, Nell    Nell   with Josiah, Derek, Uma

Solution 2:^[2]

Here is another way:

(df.assign(name = df['student'].str.split(','))
 .explode('name')
 .assign(remark = lambda x: (x['student'].str.split(', ').map(set) - x['name'].map(lambda y: set(y.split())))
         .map(lambda z: 'with {}'.format(' '.join(z)) if len(z)>0 else "")))

Output:

  team                   student     name                  remark
0    a                    Ursula   Ursula                        
1    b             Hayfa, Martin    Hayfa             with Martin
1    b             Hayfa, Martin   Martin              with Hayfa
2    c                      Kato     Kato                        
3    d          Tanek, Ava, Pyto    Tanek           with Ava Pyto
3    d          Tanek, Ava, Pyto      Ava         with Tanek Pyto
3    d          Tanek, Ava, Pyto     Pyto          with Tanek Ava
4    e                      Aiko     Aiko                        
5    f                    Hunter   Hunter                        
6    g  Josiah, Derek, Uma, Nell   Josiah     with Derek Nell Uma
6    g  Josiah, Derek, Uma, Nell    Derek    with Josiah Nell Uma
6    g  Josiah, Derek, Uma, Nell      Uma  with Derek Josiah Nell
6    g  Josiah, Derek, Uma, Nell     Nell   with Derek Josiah Uma

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Jean-Francois T.
Solution 2	rhug123

'Pandas - Duplicate Rows and Slice String

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]