Apply function in pandas to create two columns

I have a pandas DataFrame called ebola. Its variable column packs two pieces of information into one string, such as Cases_Guinea: a status (Cases or Deaths) and a country name. I am trying to create two new columns, status and country, from that variable column using the .apply() function. However, since I am trying to extract two values, it does not work.


# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)

The error I get is

ValueError: Must have equal len keys and value when setting with an iterable

I want the output to have separate status and country columns.



Solution 1:[1]

Use Series.str.split

ebola[['status','country']]=ebola['variable'].str.split(pat='_',expand=True)
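
A minimal sketch of this approach on a small stand-in frame, assuming every value in variable contains exactly one underscore (the sample values below are illustrative, not the full dataset). With expand=True, str.split returns a DataFrame with one column per piece, which is why it can be assigned to two column names at once.

```python
import pandas as pd

# Illustrative stand-in for the ebola frame (not the real dataset)
ebola = pd.DataFrame({
    "Date": ["3/24/2014", "1/4/2015"],
    "variable": ["Cases_Guinea", "Deaths_Liberia"],
})

# expand=True yields a DataFrame with one column per split piece
parts = ebola["variable"].str.split(pat="_", expand=True)
ebola[["status", "country"]] = parts

print(ebola)
```

If a country name could itself contain an underscore, passing n=1 to str.split limits the split to the first underscore.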

Solution 2:[2]

This is a very late addition to my original question. Thanks to @ansev, whose solution worked well. While revisiting my question, I tried to develop a solution based on my first approach. I got it working and wanted to share it for anyone who might want a different perspective.

Updated code:

# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

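Two changes made my code work:

  1. Instead of calling .apply() on the Series, I converted it to a DataFrame with the .to_frame() method, because axis=1 and result_type='expand' are options of DataFrame.apply.
  2. With axis=1, splitter receives each row as a Series, so I added the for row in column line to iterate over its values. Each row holds a single value here, so the function returns on the first iteration.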
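
For comparison, the original one-value-at-a-time splitter can also be kept unchanged. Series.apply then returns a Series of (status, country) tuples, and the sketch below unpacks them into a two-column DataFrame before assigning. This is only an alternative sketch, assuming every value splits into exactly two parts; the sample data and the names pairs and split_df are illustrative.

```python
import pandas as pd

# Illustrative stand-in for the ebola frame (not the real dataset)
ebola = pd.DataFrame({
    "Date": ["3/24/2014", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Liberia"],
})

def splitter(value):
    # value is a single string such as "Cases_Guinea"
    status, country = value.split("_")
    return status, country

# Series.apply gives a Series whose elements are 2-tuples
pairs = ebola["variable"].apply(splitter)

# Build a two-column frame from the tuples, keeping the original index
split_df = pd.DataFrame(pairs.tolist(), index=ebola.index,
                        columns=["status", "country"])
ebola[["status", "country"]] = split_df

print(ebola)
```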

To replicate all of this:

import pandas as pd

# create the data
ebola_dict = {'Date':['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)

# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

# check if it worked
print(ebola)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ansev
Solution 2 mmustafaicer