Python pandas df.copy() is not deep
I have (in my opinion) a strange problem with python pandas. If I do:
cc1 = cc.copy(deep=True)
for the dataframe cc and then access a certain row and column:
print(cc1.loc['myindex']['data'] is cc.loc['myindex']['data'])
I get
True
What's wrong here?
Solution 1:[1]
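df.copy(deep=True) copies the DataFrame's data and indices, but it does not recursively copy Python objects stored in the cells: for an object column only the references are copied, so both frames point to the same objects. That is why your is check returns True. The pandas devs consider putting mutable objects inside a DataFrame an antipattern.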
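To make this concrete, here is a minimal sketch of the situation in the question. Only cc, cc1, 'myindex' and 'data' come from the question; the frame contents and the 'other' index label are made up for illustration, and it assumes the data column holds mutable Python objects (lists here). The element-wise copy.deepcopy at the end is one possible workaround, not an official pandas recipe.

```python
import copy

import pandas as pd

# Hypothetical frame with a mutable object (a list) in an object-dtype column.
cc = pd.DataFrame({"data": [[1, 2], [3, 4]]}, index=["myindex", "other"])

cc1 = cc.copy(deep=True)

# The copy holds its own array of references, but the list objects are shared.
same_object = cc1.loc["myindex", "data"] is cc.loc["myindex", "data"]
print(same_object)

# Mutating the list in place is therefore visible through both frames.
cc.loc["myindex", "data"].append(99)
print(cc1.loc["myindex", "data"])

# Possible workaround (an assumption, not a pandas-endorsed pattern):
# deep-copy every element of the object column explicitly.
cc2 = cc.copy()
cc2["data"] = cc["data"].map(copy.deepcopy)
independent = cc2.loc["myindex", "data"] is not cc.loc["myindex", "data"]
print(independent)
```

If the cells only hold immutable values such as numbers or strings, the shared references do no harm, because those values cannot be changed in place.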
There is nothing wrong with your code. In case you want to see the difference between a deep and a shallow copy(), here is an example.
Deep copy
import pandas as pd

dict_1 = {'Column A': ['House', 'Animal', 'car'],
          'Column B': ["walls,doors,rooms", "Legs,nose,eyes", "tires,engine"]}
df1 = pd.DataFrame(dict_1, columns=['Column A', 'Column B'])
# Deep copy
df2 = df1.copy()  # deep=True by default
df2 == df1  # every cell is True because neither frame has been modified yet
# output:
# Column A Column B
# 0 True True
# 1 True True
# 2 True True
id(df1)  # output: 2302063108040 (the exact number varies per run)
id(df2)  # output: 2302063137224 (a different id, so df2 is a separate object)
Now if you update Column B of df1:
dict_new = {'Column A': ['House', 'Animal', 'car'],
            'Column B': ["walls", "Legs,nose,eyes,tail", "tires,engine,bonnet"]}
# only the values in Column B differ from the original
df1.update(dict_new)
df1 == df2  # False for the values that were changed
# output:
# Column A Column B
# 0 True False
# 1 True False
# 2 True False
And if we look at df2, which is the deep copy, it remains unchanged:
df2
# output:
# Column A Column B
# 0 House walls,doors,rooms
# 1 Animal Legs,nose,eyes
# 2 car tires,engine
Shallow copy
df2 = df1.copy(deep=False)  # deep=True is the default, so deep=False has to be passed explicitly
df2 == df1  # every cell is True because neither frame has been modified yet
# output
# Column A Column B
# 0 True True
# 1 True True
# 2 True True
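dict_new = {'Column A': ['House', 'Animal', 'car'],
            'Column B': ["walls", "Legs,nose,eyes,tail", "tires,engine,bonnet"]}
df1.update(dict_new)
# df2 shares df1's data, so every cell is still True after updating Column B, unlike with the deep copy.
# Note: this depends on the pandas version. With Copy-on-Write enabled, a shallow copy no longer reflects later changes to the original.
df2 == df1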
# output
# Column A Column B
# 0 True True
# 1 True True
# 2 True True
df2  # df2 now shows the updated values of df1
# output:
# Column A Column B
# 0 House walls
# 1 Animal Legs,nose,eyes,tail
# 2 car tires,engine,bonnet
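Comparing values only shows the sharing indirectly. As a rough sketch, assuming a numeric column (the frame df and the column a below are hypothetical and not part of the example above), np.shares_memory can check directly whether a copy shares its underlying buffer with the original:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame, used only for this illustration.
df = pd.DataFrame({"a": [1, 2, 3]})

shallow = df.copy(deep=False)
deep = df.copy(deep=True)

# A shallow copy starts out on the same buffer; a deep copy gets its own.
shallow_shares = np.shares_memory(df["a"].to_numpy(), shallow["a"].to_numpy())
deep_shares = np.shares_memory(df["a"].to_numpy(), deep["a"].to_numpy())
print(shallow_shares, deep_shares)
```

Whether a later write to df shows up in shallow depends on the pandas version: with Copy-on-Write enabled, the write triggers a copy and shallow keeps its old values.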
Sources: "python Pandas DataFrame copy(deep=False) vs copy(deep=True) vs '='"; https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html
Solution 2:[2]
To create a deep copy of a dictionary, the following code should work:
import copy
dict1 = {'a': 5, 'b': 6, 'c': 7}
dict2 = copy.deepcopy(dict1)
for i in dict1:
    dict1[i] += 5  # modify the original only
print(dict1)
print(dict2)
# output:
# {'a': 10, 'b': 11, 'c': 12}
# {'a': 5, 'b': 6, 'c': 7}
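With integer values, a shallow copy would behave the same way here, because integers are immutable. The difference only shows up with nested mutable values. A small sketch, using a made-up dictionary named nested:

```python
import copy

# Hypothetical dictionary whose values are mutable lists.
nested = {"a": [1, 2], "b": [3, 4]}

shallow = copy.copy(nested)    # new dict, same list objects
deep = copy.deepcopy(nested)   # new dict, new list objects

# Mutate one of the inner lists in place.
nested["a"].append(99)

print(shallow["a"])  # [1, 2, 99]
print(deep["a"])     # [1, 2]
```

This is the same effect as in the question: a copy that only duplicates the container still shares the objects stored inside it.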
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |