Python pandas: df.copy() is not deep

I have what seems to me a strange problem with Python pandas. If I do:

cc1 = cc.copy(deep=True)

for the DataFrame cc and then look up a particular row and column:

print(cc1.loc['myindex']['data'] is cc.loc['myindex']['data'])

I get

True

What's wrong here?



Solution 1:[1]

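copy(deep=True) copies the DataFrame's data and index, but it does not recursively copy Python objects stored in the cells: for an object column only the references are copied, so both frames point at the same objects. The pandas devs consider putting mutable objects inside a DataFrame an antipattern.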
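As a minimal sketch of the behaviour described above: the frame below is a hypothetical stand-in for the question's cc (the real data is not shown), with lists stored in an object column. The per-cell copy.deepcopy at the end is one possible workaround, not an official pandas recipe.

```python
import copy

import pandas as pd

# Hypothetical stand-in for the question's `cc`: an object column holding lists.
cc = pd.DataFrame({"data": [[1, 2], [3, 4]]}, index=["myindex", "other"])

cc1 = cc.copy(deep=True)

# The DataFrame and its underlying array are new, but each cell still
# refers to the same list object as the original.
same_object = cc1.loc["myindex", "data"] is cc.loc["myindex", "data"]
print(same_object)

# One possible workaround: deep-copy every cell of the object column explicitly.
cc2 = cc.copy(deep=True)
cc2["data"] = cc["data"].map(copy.deepcopy)
independent = cc2.loc["myindex", "data"] is not cc.loc["myindex", "data"]
print(independent)
```

With this setup, mutating a list in cc in place (for example with append) would also be visible through cc1, but not through cc2.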

There is nothing wrong with your code. In case you want to see the difference between a deep and a shallow copy(), here is an example of each.

Deep copy

import pandas as pd

dict_1 = {'Column A': ['House', 'Animal', 'car'],
          'Column B': ["walls,doors,rooms", "Legs,nose,eyes", "tires,engine"]}

df1 = pd.DataFrame(dict_1, columns=['Column A', 'Column B'])

# Deep copy
df2 = df1.copy()  # deep=True by default
df2 == df1  # all True because neither DataFrame has been modified yet
# output
#   Column A    Column B
# 0 True    True
# 1 True    True
# 2 True    True

id(df1)  # output: 2302063108040
id(df2)  # output: 2302063137224 (a different object; the exact ids vary per run)

Now update Column B of df1:

dict_new =  {'Column A': ['House','Animal', 'car'],
     'Column B': ["walls", "Legs,nose,eyes,tail", "tires,engine,bonnet" ]}

# updating only column B values
df1.update(dict_new)

df1 == df2   # False for the values that changed

# output:
#   Column A    Column B
# 0 True    False
# 1 True    False
# 2 True    False

And if we look at df2, the deep copy, it remains unchanged:

df2
# output:
# Column A  Column B
# 0 House   walls,doors,rooms
# 1 Animal  Legs,nose,eyes
# 2 car tires,engine

Shallow copy

df2 = df1.copy(deep=False)  # deep=True is the default, so pass deep=False explicitly
df2 == df1  # all True because neither DataFrame has been modified yet
# output
#   Column A    Column B
# 0 True    True
# 1 True    True
# 2 True    True

dict_new =  {'Column A': ['House','Animal', 'car'],
     'Column B': ["walls", "Legs,nose,eyes,tail", "tires,engine,bonnet" ]}

df1.update(dict_new)

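df2 == df1  # df2 shares df1's data, so everything is still True after updating Column B, unlike with the deep copy
# (this holds without Copy-on-Write; with Copy-on-Write, the default from pandas 3.0, df2 would keep the old values)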
# output
#   Column A    Column B
# 0 True    True
# 1 True    True
# 2 True    True

df2  # df2 now shows the updated values of df1

# output:
#   Column A    Column B
# 0 House   walls
# 1 Animal  Legs,nose,eyes,tail
# 2 car tires,engine,bonnet
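Whether a copy shares its data with the original can also be checked directly, without relying on how later updates propagate. The following is a rough sketch using a small numeric frame invented for illustration; it relies on numpy's shares_memory to compare the underlying buffers.

```python
import numpy as np
import pandas as pd

# Small numeric frame, made up for this illustration.
df = pd.DataFrame({"a": [1, 2, 3]})

deep = df.copy()                # deep=True: the data is copied into a new buffer
shallow = df.copy(deep=False)   # new DataFrame object that points at the same buffer

# Compare the memory behind column "a" in each copy with the original.
deep_shares = np.shares_memory(df["a"].to_numpy(), deep["a"].to_numpy())
shallow_shares = np.shares_memory(df["a"].to_numpy(), shallow["a"].to_numpy())
print(deep_shares, shallow_shares)
```

The deep copy should report no shared memory and the shallow copy should report shared memory, regardless of whether Copy-on-Write is enabled.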

Source: "python Pandas DataFrame copy(deep=False) vs copy(deep=True) vs '='"; see also https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html

Solution 2:[2]

To create a deep copy of a dictionary, copy.deepcopy from the standard library should work:

import copy
dict1 = {'a': 5, 'b': 6, 'c': 7}
dict2 = copy.deepcopy(dict1)
for i in dict1:
    dict1[i] += 5
print(dict1)
print(dict2)

{'a': 10, 'b': 11, 'c': 12}
{'a': 5, 'b': 6, 'c': 7}
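The dictionary above holds only integers, so a shallow copy would have produced the same output. The difference shows up once the values are themselves mutable, which is the situation in the original question. A small sketch with a made-up nested dictionary:

```python
import copy

# Made-up example: the values are mutable lists.
nested = {"a": [1, 2], "b": [3]}

shallow = copy.copy(nested)    # new dict, but the inner lists are shared
deep = copy.deepcopy(nested)   # new dict and new inner lists

# Mutate one of the original's lists in place.
nested["a"].append(99)

print(shallow["a"])  # [1, 2, 99]
print(deep["a"])     # [1, 2]
```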

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2