'Python Pandas - Lookup a variable column depending on another column's value

I'm trying to use the value of one cell to find the value of a cell in another column. The first cell value ('source') dictates which column to lookup.

import pandas as pd

df = pd.DataFrame({'A': ['John', 'Andrew', 'Bob', 'Fred'], 'B': [
                  'Fred', 'Simon', 'Andrew', 'Andrew'], 'source': ['A', 'B', 'A', 'B']}, )

print(df)

        A       B source
0    John    Fred      A
1  Andrew   Simon      B
2     Bob  Andrew      A
3    Fred  Andrew      B

My required output value in the 'output' column is a lookup of the 'source':

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Failed attempts

df['output'] = df[df['source']]

This results in a ValueError: Wrong number of items passed 4, placement implies 1 because the df['source'] passes in a Series, not a string. I tried converting to a string using:

df['output'] = df[df['source'].convertDTypes(convert_string=True)]

which gave error AttributeError: 'Series' object has no attribute 'convertDTypes'.

Working solution

I found a solution might by iterating through the rows using:

for index, row in df.iterrows():
    column = df.loc[index, 'source']
    df.at[index, 'output'] = df.loc[index, column]

However, this post suggests iterating is a bad idea. The code doesn't seem very elegant, either.

I feel I've missed something basic here; this really should not be that hard.



Solution 1:[1]

Let us do numpy way since lookup will not longer work in the future version

df['new'] = df.values[df.index,df.columns.get_indexer(df.source)]
df
Out[339]: 
        A       B source     new
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Solution 2:[2]

Use numpy.where

df['output'] = np.where(df.source == 'A', df.A, df.B)

If you have more columns, use numpy.select

conditions = [df.source == 'A', df.source == 'B']
values = [df.A, df.B]
df['output'] = np.select(conditions, values)

Solution 3:[3]

Stack and then loc with multiindex for recent versions:

df['output'] = df.stack().loc[zip(df.index,df['source'])].droplevel(-1)

or:

df['output'] = (df.stack().loc[pd.MultiIndex.from_arrays((df.index,df['source']))]
                .droplevel(1))

For earlier versions of pandas:

df['output'] =  df.lookup(df.index,df['source'])

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Solution 4:[4]

Try this:

df['output'] = df.apply(lambda x: x[x.source], axis=1)

Output:

    A         B source  output
0   John    Fred    A   John
1   Andrew  Simon   B   Simon
2   Bob     Andrew  A   Bob
3   Fred    Andrew  B   Andrew

Solution 5:[5]

You can also do this simply by enumerate() ,list comprehension and loc[] accessor

df['output']=[df.loc[x,y] for x,y in enumerate(df['source'])]

Now If you print df you will get your desired output:

      A     B       source      output
0   John    Fred        A       John
1   Andrew  Simon       B       Simon
2   Bob     Andrew      A       Bob
3   Fred    Andrew      B       Andrew

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BENY
Solution 2
Solution 3
Solution 4 ashkangh
Solution 5 Anurag Dabas