How to assign an entire list to each row of a pandas dataframe
I have a dataframe and a list
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]})
mylist= [10,20,30,40,50]
I would like each row of the dataframe to hold the entire list as a single element. If I assign it directly,
df['C'] = mylist
pandas tries to assign one value per row, so I get the error "Length of values does not match length of index".
The desired result is:
A B C
0 1 4 [10,20,30,40,50]
1 2 5 [10,20,30,40,50]
2 3 6 [10,20,30,40,50]
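The failure described in the question can be reproduced with a short script. This is only a sketch: the variable error_message is introduced here for illustration, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
mylist = [10, 20, 30, 40, 50]

error_message = None  # illustrative name, not part of the question
try:
    # pandas treats the list as one value per row: 5 values for 3 rows
    df['C'] = mylist
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The assignment fails because the list length (5) differs from the number of rows (3), and no column C is created.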
Solution 1:[1]
First, I think working with lists in pandas is not a good idea.
But it is possible by list comprehension:
df['C'] = [mylist for i in df.index]
#another solution
#df['C'] = pd.Series([mylist] * len(df))
print (df)
A B C
0 1 4 [10, 20, 30, 40, 50]
1 2 5 [10, 20, 30, 40, 50]
2 3 6 [10, 20, 30, 40, 50]
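One point worth checking with the list comprehension above: every row refers to the same list object, so a later in-place change to mylist shows up in every row. The sketch below illustrates this and one possible way around it, copying the list per row with list(mylist); the column name D is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
mylist = [10, 20, 30, 40, 50]

# Same object in every row
df['C'] = [mylist for i in df.index]

# Independent copy in every row (column name 'D' is illustrative)
df['D'] = [list(mylist) for i in df.index]

# Mutating the original list affects column C but not column D
mylist.append(60)
print(df)
```

If the rows are never modified in place, sharing one list is harmless and uses less memory; if they might be, the per-row copy is the safer choice.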
Solution 2:[2]
One alternative using np.tile:
df['C'] = np.tile(mylist, (len(df),1)).tolist()
print (df)
A B C
0 1 4 [10, 20, 30, 40, 50]
1 2 5 [10, 20, 30, 40, 50]
2 3 6 [10, 20, 30, 40, 50]
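To see what the np.tile call does step by step, here is a small sketch that splits the one-liner into its parts. The names tiled and rows are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
mylist = [10, 20, 30, 40, 50]

# Repeat mylist len(df) times along the first axis: a 2-D integer array
tiled = np.tile(mylist, (len(df), 1))
print(tiled.shape)  # (3, 5)

# Convert back to a list of plain Python lists, one per row
rows = tiled.tolist()
df['C'] = rows
print(df)
```

Because tolist() builds a new list for every row, the rows here do not share one list object.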
Solution 3:[3]
Here is another solution. It uses a lambda and does things "Pythonically"; I think it is easier to read.
import pandas as pd
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]})
mylist= [10,20,30,40,50]
df['combined'] = df.apply(lambda x: mylist, axis=1)
df
Solution 4:[4]
To complete my earlier answer, here is a version with df.assign, borrowing the list comprehension from @jezrael:
>>> df
A B
0 1 4
1 2 5
2 3 6
>>> df.assign(C = [mylist for i in df.index])
A B C
0 1 4 [10, 20, 30, 40, 50]
1 2 5 [10, 20, 30, 40, 50]
2 3 6 [10, 20, 30, 40, 50]
Or, to add the column permanently, reassign the result to df:
df = df.assign(C = [mylist for i in df.index])
Another way of doing it is df.insert, which lets us specify the position of the new column. Inserting at index 2 makes C the third column of the DataFrame (this assumes df has no column C yet):
>>> df.insert(2, 'C', [mylist for i in df.index]) # pass the list objects; a quoted '[10, 20, ...]' would insert a string
>>> df
A B C
0 1 4 [10, 20, 30, 40, 50]
1 2 5 [10, 20, 30, 40, 50]
2 3 6 [10, 20, 30, 40, 50]
Solution 5:[5]
I agree with @jezrael that working with lists in pandas is not a good idea. If you do need it, there is a much faster vectorized way:
- squeeze the list into a single numpy cell.
- tile that cell and assign it to the DF.
df = pd.DataFrame(index=np.arange(1e6))
mylist= [10,20,30,40,50]
#ORIGINAL:
%%timeit -n 100
df['C'] = [mylist for i in df.index]
>>> 188 ms ± 922 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# VECTORIZED:
%%timeit -n 100
q = np.array([1,], dtype=object) # dummy array, note the dtype
q[0] = mylist # squeeze the list into single cell
df['C'] = np.tile(q, df.shape[0]) # tile and assign
>>> 12.1 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The gain is especially high with larger DF sizes (about 15x in this example). Hopefully there is a more elegant way to fit a list into a single numpy cell.
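The two steps above could be wrapped in a small helper so they run outside IPython as well. This is a sketch under assumptions: the function name repeat_object_column is hypothetical, and np.empty(1, dtype=object) stands in for the dummy array used in the answer. No timing is claimed for it.

```python
import numpy as np
import pandas as pd

def repeat_object_column(value, n_rows):
    """Return a 1-D object array with `value` in each of n_rows cells."""
    cell = np.empty(1, dtype=object)  # one empty object cell
    cell[0] = value                   # store the list itself, not its items
    return np.tile(cell, n_rows)      # repeat the cell n_rows times

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
mylist = [10, 20, 30, 40, 50]

column = repeat_object_column(mylist, len(df))
df['C'] = column
print(df)
```

As with the list comprehension, every cell refers to the same list object, so this suits cases where the lists are only read.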
Solution 6:[6]
That should work:
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]})
my_list = [10, 20, 30, 40]
df['C'] = [my_list] * df.shape[0]
df
A B C
0 1 4 [10, 20, 30, 40]
1 2 5 [10, 20, 30, 40]
2 3 6 [10, 20, 30, 40]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | |
Solution 4 | |
Solution 5 | Poe Dator |
Solution 6 | Vladimir Lukin |