pandas row values to column headers
I have a dataframe like this:
df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})
id1 id2 value
0 1 1 a
1 1 1 b
2 1 1 c
3 1 1 d
4 2 2 a
5 2 2 b
6 2 2 c
I need to transform it into this form:
id1 id2 a b c d
0 1 1 1 1 1 1
1 2 2 1 1 1 0
There can be any number of levels in the value column for each id, ranging from 1 to 10. If a level is not present for an id, it should be 0, otherwise 1.
I am using Anaconda Python 3.5 on Windows 10.
Solution 1:[1]
If you need only 1 and 0 in the output, to mark the presence of a value:
You can use get_dummies on the Series created by set_index, but then groupby + GroupBy.max is necessary to collapse the result to one row per id pair:
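# Parentheses let the method chain span several lines.
df = (pd.get_dummies(df.set_index(['id1','id2'])['value'])
        .groupby(level=[0,1])
        .max()
        .astype(int)  # newer pandas versions return booleans from get_dummies
        .reset_index())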
print (df)
id1 id2 a b c d
0 1 1 1 1 1 1
1 2 2 1 1 1 0
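To see why the groupby + max step is needed, here is a minimal sketch that rebuilds the question's frame and inspects the intermediate result. The names dummies and out are only for illustration, and astype(int) is added on the assumption that your pandas version may return booleans from get_dummies.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# get_dummies keeps one indicator row per original row:
# 7 rows, one column per distinct value.
dummies = pd.get_dummies(df.set_index(['id1', 'id2'])['value'])
print(dummies.shape)

# max() per (id1, id2) collapses those rows into a single row per id pair;
# astype(int) normalizes the result to 1/0 regardless of the dummy dtype.
out = (dummies.groupby(level=['id1', 'id2'])
              .max()
              .astype(int)
              .reset_index())
print(out)
```

Grouping by level names instead of positions ([0, 1]) is an equivalent choice here; it only reads a little more explicitly.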
Another solution uses groupby, size and unstack, but then it is necessary to compare with gt and convert to int with astype. Finally, use reset_index and rename_axis:
# Parentheses let the method chain span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .gt(0)
        .astype(int)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
id1 id2 a b c d
0 1 1 1 1 1 1
1 2 2 1 1 1 0
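The chain above can also be read one step at a time. The following is a rough sketch on the question's data; the intermediate names counts, wide and flags are hypothetical and only serve to show what each call contributes.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# size() counts rows per (id1, id2, value) combination,
# giving a Series with a three-level index.
counts = df.groupby(['id1', 'id2', 'value']).size()

# unstack() moves the innermost level ('value') into the columns;
# combinations that never occur are filled with 0.
wide = counts.unstack(fill_value=0)

# gt(0) turns the counts into booleans, astype(int) into 1/0.
flags = wide.gt(0).astype(int)

# reset_index() restores id1/id2 as columns; rename_axis removes the
# leftover 'value' label from the column axis.
out = flags.reset_index().rename_axis(None, axis=1)
print(out)
```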
If you need to count the values:
df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
'id2':[1,1,1,1,2,2,2],
'value':['a','b','a','d','a','b','c']})
print (df)
id1 id2 value
0 1 1 a
1 1 1 b
2 1 1 a
3 1 1 d
4 2 2 a
5 2 2 b
6 2 2 c
# Parentheses let the method chain span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
id1 id2 a b c d
0 1 1 2 1 0 1
1 2 2 1 1 1 0
Or:
# Parentheses let the method chain span several lines.
df = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
id1 id2 a b c d
0 1 1 2 1 0 1
1 2 2 1 1 1 0
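As a possible alternative that the answer above does not use, pd.crosstab should produce the same count table, and clipping the counts at 1 gives the presence flags. This is only a sketch on the second sample frame; the names ct, counts and flags are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# crosstab counts how often each value occurs per (id1, id2) pair;
# absent combinations are 0.
ct = pd.crosstab([df['id1'], df['id2']], df['value'])

# Count table, with the ids restored as ordinary columns.
counts = ct.reset_index().rename_axis(None, axis=1)
print(counts)

# Presence table: clip caps every count at 1, so any count >= 1 becomes 1.
flags = ct.clip(upper=1).reset_index().rename_axis(None, axis=1)
print(flags)
```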
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |