'How to turn the item in the column to multiple columns?

I am trying to do this. So, currently my df look like this.

col_names = ['movie_id', 'movie_title', 'genres']
df = pd.read_csv('/content/drive/MyDrive/testing/ml-1m/movies.csv', sep='::', names=col_names, encoding='latin-1')

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller

while the columns are such "movie_id, name, and genre".

I want it to look like this.

The columns would be movie_id, name,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western


1::Toy Story (1995)::0::0::1::1::1::0::0::0::0::0::0::0::0::0::0::0::0::0::0
2::Jumanji (1995)::0::1::0::1::0::0::0::0::1::0::0::0::0::0::0::0::0::0::0
.
.
...

Basically, I want to turn genre column to multiple columns and 1 will be there if it match the columns.

Is there anyway to do this with pandas?



Solution 1:[1]

IIUC you can do it like that:

res = (df
       .join(df['genre'].str.get_dummies())
       .drop('genre',axis=1))

print(res)

print(res)

   movie_id                                name  Action  Adventure  Animation  Children's  Comedy  Crime  Drama  Fantasy  Romance  Thriller
0         1                    Toy Story (1995)       0          0          1           1       1      0      0        0        0         0
1         2                      Jumanji (1995)       0          1          0           1       0      0      0        1        0         0
2         3             Grumpier Old Men (1995)       0          0          0           0       1      0      0        0        1         0
3         4            Waiting to Exhale (1995)       0          0          0           0       1      0      1        0        0         0
4         5  Father of the Bride Part II (1995)       0          0          0           0       1      0      0        0        0         0
5         6                         Heat (1995)       1          0          0           0       0      1      0        0        0         1
6         7                      Sabrina (1995)       0          0          0           0       1      0      0        0        1         0
7         8                 Tom and Huck (1995)       0          1          0           1       0      0      0        0        0         0
8         9                 Sudden Death (1995)       1          0          0           0       0      0      0        0        0         0
9        10                    GoldenEye (1995)       1          1          0           0       0      0      0        0        0         1

If you don't want to delete the original genre column, just remove the last part with drop

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1