Looping over pandas DataFrame
I have a weird issue: the result doesn't change from one iteration to the next. The code is the following:
import pandas as pd
import numpy as np
X = np.arange(10,100)
Y = X[::-1]
Z = np.array([X,Y]).T
df = pd.DataFrame(Z ,columns = ['col1','col2'])
dif = df['col1'] - df['col2']
for gap in range(100):
    Up = dif > gap
    Down = dif < -gap
    df.loc[Up,'predict'] = 'Up'
    df.loc[Down,'predict'] = 'Down'
    df_result = df.dropna()
    Total = df.shape[0]
    count = df_result.shape[0]
    ratio = count/Total
    print(f'Total: {Total}; count: {count}; ratio: {ratio}')
The result is always
Total: 90; count: 90; ratio: 1.0
when it shouldn't be: the count should shrink as gap grows.
Solution 1:[1]
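Found the root of the problem 5 mins after posting this question. The 'predict' column is never cleared between iterations: the first pass (gap = 0) labels every row, and later passes only overwrite existing labels, so dropna() never has anything to drop. I just needed to reset the DataFrame to the original at the start of each iteration to fix the problem.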
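To see why the output never changes, here is a minimal sketch, assuming the same data as in the question, that counts the missing 'predict' values after the first pass and again after a later pass with a larger gap. The names missing_after_first and missing_after_later are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
X = np.arange(10, 100)
df = pd.DataFrame({'col1': X, 'col2': X[::-1]})
dif = df['col1'] - df['col2']

# First pass (gap = 0): every difference is odd, hence nonzero,
# so each row is labelled either 'Up' or 'Down'.
df.loc[dif > 0, 'predict'] = 'Up'
df.loc[dif < 0, 'predict'] = 'Down'
missing_after_first = int(df['predict'].isna().sum())

# Later pass (gap = 50): only rows that still qualify are overwritten;
# rows that no longer qualify keep their old labels.
df.loc[dif > 50, 'predict'] = 'Up'
df.loc[dif < -50, 'predict'] = 'Down'
missing_after_later = int(df['predict'].isna().sum())

print(missing_after_first, missing_after_later)
```

Both counts are 0 here, which is why dropna() keeps all 90 rows no matter how large gap gets.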
import pandas as pd
import numpy as np
X = np.arange(10,100)
Y = X[::-1]
Z = np.array([X,Y]).T
df = pd.DataFrame(Z ,columns = ['col1','col2'])
df2 = df.copy()  # added this line to preserve the original df
dif = df['col1'] - df['col2']
for gap in range(100):
    df = df2.copy()  # reset the altered df back to the original
    Up = dif > gap
    Down = dif < -gap
    df.loc[Up,'predict'] = 'Up'
    df.loc[Down,'predict'] = 'Down'
    df_result = df.dropna()
    Total = df.shape[0]
    count = df_result.shape[0]
    ratio = count/Total
    print(f'Total: {Total}; count: {count}; ratio: {ratio}')
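As a possible alternative to copying the whole DataFrame on every iteration, the labels could be built in a fresh Series each time, so that nothing carries over between iterations. This is only a sketch under that assumption, not the answer's method; the helper label and the dict ratios are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
X = np.arange(10, 100)
df = pd.DataFrame({'col1': X, 'col2': X[::-1]})
dif = df['col1'] - df['col2']
total = len(df)


def label(dif, gap):
    """Return a new label Series; rows within the gap stay missing."""
    predict = pd.Series(np.nan, index=dif.index, dtype=object)
    predict.loc[dif > gap] = 'Up'
    predict.loc[dif < -gap] = 'Down'
    return predict


ratios = {}
for gap in range(100):
    predict = label(dif, gap)  # fresh labels, no state from earlier gaps
    count = int(predict.notna().sum())
    ratios[gap] = count / total
    print(f'Total: {total}; count: {count}; ratio: {ratios[gap]}')
```

Because df itself is never modified inside the loop, there is nothing to reset, and the ratio falls from 1.0 at gap = 0 to 0.0 once gap reaches 89.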
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mathguy |