Python pandas nlargest() not working properly with keep='all'
When I call nlargest() as shown below:
top3 = df1.nlargest(3, 'perChange', keep='all')
Even with keep='all', the output is
92 3.828120
255 -0.673854
256 -0.673854
Name: perChange, dtype: float64
However, the sorted array is
92 3.828120
255 -0.673854
256 -0.673854
304 -1.906793
340 -2.643661
355 -3.421462
359 -3.549768
What I want is
92 3.828120
255 -0.673854
256 -0.673854
304 -1.906793
How can I solve the problem?
Edit: For anyone interested in solving it, here is example code that shows the problem.
import pandas as pd
# Initialize the data as a dict of lists.
data = {'value': [3.828120, -0.673854, -0.673854, -1.906793, -2.643661]}
# Create DataFrame
test_df = pd.DataFrame(data)
test_df = test_df.nlargest(3, 'value', keep='all')
# Print the output.
print(test_df['value'])
Solution 1:[1]
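This looks like a bug in nlargest(), but it is the documented behavior: keep='all' only adds rows that tie with the n-th largest row, so it selects by row count, not by distinct values. In the example, the third-largest row is -0.673854 and no row beyond the first three ties with it, so exactly three rows are returned.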
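For reference, here is a minimal sketch of what keep='all' changes when there is a tie at the cut-off. The `tied` frame is made-up data, not taken from the question.

```python
import pandas as pd

# Hypothetical data: two rows tie for third place.
tied = pd.DataFrame({'value': [5.0, 4.0, 3.0, 3.0, 2.0]})

# keep='first' returns exactly n rows and drops the second 3.0.
first = tied.nlargest(3, 'value', keep='first')

# keep='all' keeps every row that ties with the n-th largest value.
all_ties = tied.nlargest(3, 'value', keep='all')

print(len(first))     # 3
print(len(all_ties))  # 4
```

With the data from the question, the tie (-0.673854) already sits inside the first three rows, so keep='all' has nothing extra to add.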
I wrote a function that does what I want. I sorted the df in descending order and used the function below to find the cut-off index.
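def findnlargestI(df, n, col):
    # df must already be sorted by col in descending order.
    lastValue = None
    rank = 0
    for i in range(len(df.index)):
        currentValue = df[col].iloc[i]
        # A new distinct value starts the next rank.
        if i == 0 or currentValue != lastValue:
            rank += 1
            lastValue = currentValue
        # Row i is the first row past the n-th distinct value.
        if rank > n:
            return i
    # Fewer than n + 1 distinct values: keep every row.
    return len(df.index)
top3 = df1.iloc[0:findnlargestI(df1, 3, 'perChange'), :]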
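The same result can probably be reached without an explicit loop by using a dense rank. This is a sketch, not part of the original answer; the helper name `top_n_distinct` is hypothetical, and it relies only on `Series.rank(method='dense')` and `DataFrame.sort_values`.

```python
import pandas as pd

def top_n_distinct(df, n, col):
    """Return all rows whose value in `col` is among the n largest distinct values."""
    # Dense ranking gives tied values the same rank with no gaps: 1, 2, 2, 3, ...
    ranks = df[col].rank(method='dense', ascending=False)
    # Keep rows ranked within the top n, ordered from largest to smallest.
    # mergesort is stable, so tied rows keep their original relative order.
    return df[ranks <= n].sort_values(col, ascending=False, kind='mergesort')

test_df = pd.DataFrame({'value': [3.828120, -0.673854, -0.673854, -1.906793, -2.643661]})
result = top_n_distinct(test_df, 3, 'value')
print(result['value'].tolist())
# [3.82812, -0.673854, -0.673854, -1.906793]
```

Unlike the loop above, this version does not require the frame to be sorted beforehand, and rows with NaN in the column are excluded because their rank is NaN.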
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | kelvinchiyin |