How can I fill a pandas column with values computed between two dates, delayed by one row and subject to certain conditions?
I have the following DataFrame:
Date | Distance | Position | TrainerID |
---|---|---|---|
2017-09-03 | 1000 | 2 | 6529 |
2017-09-03 | 1600 | 4 | 6529 |
2017-09-03 | 1200 | 3 | 6529 |
2017-09-06 | 1200 | 13 | 6529 |
2017-09-08 | 1000 | 1 | 6529 |
2017-09-10 | 1600 | 9 | 6529 |
2017-09-15 | 1600 | 2 | 6529 |
I want to compute, for every row, the winning percentage so far in sprint races (1200 meters or less) over the last 1000 days, grouped by TrainerID. The result will be stored in a Win% column. Dates need not be unique. However, the winning percentage is taken as of before the race, so the current row is excluded; the results are therefore delayed by one row.
Rows for races that do not fit this category should not be counted themselves, and should show the winning percentage carried down from above.
What I am looking for is a result like this:
Date | Distance | Position | TrainerID | Win % |
---|---|---|---|---|
2017-09-03 | 1000 | 2 | 6529 | 0 (0 wins, 0 races) |
2017-09-03 | 1600 | 4 | 6529 | 0 (does not fit the criteria, value copied from above) |
2017-09-03 | 1200 | 3 | 6529 | 0 (0 wins, 1 race) |
2017-09-06 | 1200 | 13 | 6529 | 0 (0 wins, 2 races) |
2017-09-08 | 1000 | 1 | 6529 | 0 (0 wins, 3 races) |
2017-09-10 | 1600 | 9 | 6529 | 25 (1 win, 4 races) |
2017-09-15 | 1600 | 2 | 6529 | 25 (does not fit the criteria, value copied from above) |
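One way to read the expected table is: for each row, count the sprint wins and sprint races among the rows strictly before it (same TrainerID, within the date window), so that non-sprint rows contribute nothing and simply show the running value. The sketch below reproduces the table under a few stated assumptions: it swaps `rolling` for prefix sums plus `np.searchsorted`, it assumes rows are already sorted by Date within each trainer (row order breaks ties on equal dates), it treats "the last 1000 days" as the window (date minus 1000 days, date], and it uses `Distance <= 1200` from the prose instead of the equality mask in the question. The helper name `win_pct_before` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question (single trainer, rows already in race order)
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-09-03", "2017-09-03", "2017-09-03", "2017-09-06",
        "2017-09-08", "2017-09-10", "2017-09-15",
    ]),
    "Distance": [1000, 1600, 1200, 1200, 1000, 1600, 1600],
    "Position": [2, 4, 3, 13, 1, 9, 2],
    "TrainerID": [6529] * 7,
})


def win_pct_before(df, window_days=1000, sprint_max=1200):
    """Hypothetical helper: sprint win % per TrainerID over earlier rows only.

    Assumes rows are sorted by Date within each trainer; row order breaks
    ties between races run on the same date.
    """
    sprint = df["Distance"] <= sprint_max          # "1200 meters or less"
    win = (sprint & df["Position"].eq(1)).astype(int)
    race = sprint.astype(int)                      # non-sprint rows count 0
    window = np.timedelta64(window_days, "D")
    out = pd.Series(0.0, index=df.index)

    for _, idx in df.groupby("TrainerID").groups.items():
        dates = df.loc[idx, "Date"].to_numpy()
        # Prefix sums with a leading 0: p[i] = total over rows 0..i-1,
        # so p[i] never includes row i itself (the one-row delay).
        pw = np.concatenate([[0], np.cumsum(win.loc[idx].to_numpy())])
        pr = np.concatenate([[0], np.cumsum(race.loc[idx].to_numpy())])
        pos = np.arange(len(idx))
        # Position of the first row whose date lies inside (date - window, date]
        start = np.searchsorted(dates, dates - window, side="right")
        wins = pw[pos] - pw[start]
        races = pr[pos] - pr[start]
        # 0 % when no qualifying race precedes the row
        pct = np.where(races > 0, 100 * wins / np.maximum(races, 1), 0.0)
        out.loc[idx] = np.round(pct)
    return out


df["Win%"] = win_pct_before(df)
print(df)
```

Because each row only sees prefix sums up to the previous row, a non-sprint row such as 2017-09-10 picks up the win on 2017-09-08 (1 win in 4 races), which matches the 25 in the expected table.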
I know I will need a mask to select the rows that fit, plus the code that does the computation on them. What I have so far is this:
mask = (df.Distance == 1000) | (df.Distance == 1200)
df['Win%'] = (df.loc[mask].set_index('Date').groupby('TrainerID').rolling(no_days)['Position'].apply(lambda s: round(s.eq(1).sum() / len(s) * 100)).groupby('TrainerID').shift().values)
The problem is that I get a length mismatch error: Length of values (12521) does not match length of index (27008).
The selection itself is correct; my problem is with the mask. Can you help me?
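The mismatch comes from assigning a bare array: `df.loc[mask]` keeps fewer rows than `df`, and `.values` discards the index, so pandas can only compare lengths. (The `groupby('TrainerID').rolling(...)` result in the question is indexed by TrainerID and Date rather than by the original row labels, which is presumably why `.values` was used.) The small sketch below uses made-up three-row data, not the question's frame, to show the difference between assigning an array and assigning a Series that still carries the original row labels.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({"Distance": [1000, 1600, 1200], "Position": [2, 4, 1]})
mask = df["Distance"] <= 1200

# Result computed on the masked rows only: 2 values with index labels [0, 2]
sub = df.loc[mask, "Position"].eq(1).astype(int)

# A bare array has no index, so pandas can only match by length and fails
try:
    df["Win"] = sub.to_numpy()
except ValueError as err:
    print("ValueError:", err)

# A Series is aligned by index label; the unmasked row 1 becomes NaN
df["Win"] = sub
print(df)
```

Keeping the original row labels through the computation is what lets the masked result be written back into the full frame; the NaN rows can then be filled however the requirements dictate.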
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow