apply function in pandas to create two columns
I have a pandas DataFrame called ebola, as seen below. The variable column holds two pieces of information: status, which is either Cases or Deaths, and country, which is the country name. I am trying to create two new columns, status and country, out of that variable column by using the .apply() function. However, since I am trying to extract two values, it does not work.
# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)
The error I get is
ValueError: Must have equal len keys and value when setting with an iterable
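A minimal sketch of what the failing call produces, assuming a two-row stand-in for the ebola data (values borrowed from the replication example further down). Series.apply collects one return value per element, so the result is a one-dimensional Series of tuples rather than two columns. The split_df name and the tolist() workaround are illustrative assumptions, not something taken from either answer.

```python
import pandas as pd

# Two-row stand-in for the ebola DataFrame (illustrative only).
ebola = pd.DataFrame({"variable": ["Cases_Guinea", "Cases_Liberia"]})

def splitter(value):
    status, country = value.split("_")
    return status, country

# Series.apply calls splitter once per element and keeps each return value
# as a single cell, so this is one Series whose elements are 2-tuples.
pairs = ebola["variable"].apply(splitter)
print(pairs.shape)           # one-dimensional: one tuple per row
print(type(pairs.iloc[0]))   # each element is a tuple

# One possible workaround: unpack the tuples into a two-column DataFrame
# before assigning it to the two new columns.
split_df = pd.DataFrame(pairs.tolist(), index=ebola.index,
                        columns=["status", "country"])
ebola[["status", "country"]] = split_df
print(ebola)
```

Because the right-hand side is one column of tuples, pandas cannot line it up with the two column names on the left, which is what the error is complaining about.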
I want my output to be like this
Solution 1:[1]
Solution 2:[2]
This is a very late post to the original question. Thanks to @ansev, whose solution worked great. While going back through my question, I tried to develop a solution based on my first approach. I was able to work it out, and I wanted to share it for anyone who might want to see a different perspective on this.
Update to my code:
# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')
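Two updates to my code made it work:
- Instead of working with a Series, I converted it to a DataFrame using the .to_frame() method, so that apply accepts axis=1 and result_type='expand'.
- In my splitter function, I had to iterate over what apply passes in. With axis=1 that is a single row of the DataFrame rather than a string, so I added the for row in column line to pull the string out.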
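The two updates above can be sketched as follows, as a hedged illustration rather than the answer's exact code. With axis=1 the function receives each row as a Series, so this variant (split_row is a hypothetical name) indexes the row by column name instead of looping over it. The sample values are borrowed from the replication example.

```python
import pandas as pd

# One-column DataFrame, as produced by Series.to_frame().
frame = pd.Series(["Cases_Guinea", "Cases_Liberia"], name="variable").to_frame()

def split_row(row):
    # With axis=1, `row` is a Series holding a single row of the frame;
    # pick the string out by column name, then split it.
    status, country = row["variable"].split("_")
    return status, country

# result_type="expand" spreads each returned tuple across columns,
# which are labelled with integer positions.
expanded = frame.apply(split_row, axis=1, result_type="expand")
print(expanded)
```

The expanded result is a DataFrame with two columns, which is why it can be assigned to two new column names in one step.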
To replicate all of this:
import numpy as np
import pandas as pd
# create the data
ebola_dict = {'Date':['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)
# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')
# check if it worked
print(ebola)
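For comparison, a different technique from the apply-based code above: pandas' vectorized string methods can do the split without a helper function. This is a sketch under the assumption that every value contains an underscore separating status from country. It is not presented as what either answer used.

```python
import pandas as pd

# Reduced version of the sample data above.
ebola = pd.DataFrame({
    "Date": ["3/24/2014", "1/15/2015"],
    "variable": ["Cases_Guinea", "Cases_Liberia"],
})

# str.split with expand=True returns a DataFrame with one column per piece.
# n=1 splits on the first underscore only (an assumption about the data),
# so the result always has exactly two columns.
ebola[["status", "country"]] = ebola["variable"].str.split("_", n=1, expand=True)
print(ebola)
```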
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ansev |
Solution 2 | mmustafaicer |