How can I fill a column with values computed between two dates in pandas, delayed by one row and subject to certain conditions?

I have the following DataFrame:

Date Distance Position TrainerID
2017-09-03 1000 2 6529
2017-09-03 1600 4 6529
2017-09-03 1200 3 6529
2017-09-06 1200 13 6529
2017-09-08 1000 1 6529
2017-09-10 1600 9 6529
2017-09-15 1600 2 6529

I want to compute, on every row, the winning percentage so far for sprint races (distance of 1200 meters or less) in the last 1000 days, grouped by TrainerID. The result will be stored in a Win % column. Dates need not be unique. However, the winning percentage is taken as of before the race happened, so the current row is excluded. Thus, the results are delayed by one row.

Rows for races that do not fit this category should carry the winning percentage from the row above.

What I am looking for is a result like this:

Date Distance Position TrainerID Win %
2017-09-03 1000 2 6529 0 (0 wins, 0 races)
2017-09-03 1600 4 6529 0 (does not fit the criteria, value copied from above)
2017-09-03 1200 3 6529 0 (0 wins, 1 race)
2017-09-06 1200 13 6529 0 (0 wins, 2 races)
2017-09-08 1000 1 6529 0 (0 wins, 3 races)
2017-09-10 1600 9 6529 25 (1 win, 4 races)
2017-09-15 1600 2 6529 25 (does not fit the criteria, value copied from above)

For this, I know I am going to need a mask to select the data that fits and, of course, the selecting code. What I have so far is this:

mask = (df.Distance == 1000) | (df.Distance == 1200)
df = (
    df.loc[mask]
    .set_index('Date')
    .groupby('TrainerID')
    .rolling(no_days)['Position']
    .apply(lambda s: round(s.eq(1).sum() / len(s) * 100))
    .groupby('TrainerID')
    .shift()
    .values
)

The problem is that I get a dimensional mismatch error: Length of values (12521) does not match length of index (27008)
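
The mismatch most likely comes from assigning a result computed on the masked rows back to the full frame: `.values` drops the index, so pandas sees a bare array that is shorter than the DataFrame and cannot align it. A minimal sketch of that effect on a small made-up frame (the column name `Win` and the data are illustrative assumptions, not the real dataset):

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"Distance": [1000, 1600, 1200, 1600],
                   "Position": [2, 4, 1, 9]})
mask = df["Distance"] <= 1200

# One value per *masked* row, so this Series is shorter than df.
partial = df.loc[mask, "Position"].eq(1).astype(int)

try:
    # A raw array carries no index: 2 values cannot fill 4 rows.
    df["Win"] = partial.values
except ValueError as exc:
    error_message = str(exc)

# Keeping the index lets pandas align the values row by row;
# rows outside the mask become NaN and can then be forward-filled.
df["Win"] = partial
df["Win"] = df["Win"].ffill()
```

Assigning the Series itself (or using `df.loc[mask, "Win"] = ...`) keeps the row labels, which is what allows the shorter result to land on the right rows.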

The selection itself is correct; my problem is with applying the mask. Can you help me?
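
One possible way to sidestep the mask is to compute the statistic for every row from prefix sums, so the result always has the same length as the frame. The following is a minimal sketch, not a definitive implementation. It assumes the sample data from the question, a unique row index, a window covering dates later than (date - 1000 days) up to the current date, and a hypothetical helper name `prior_sprint_win_pct`. Non-sprint rows are evaluated directly rather than forward-filled, which is what yields the 25 on the 2017-09-10 row of the expected table.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-09-03", "2017-09-03", "2017-09-03", "2017-09-06",
        "2017-09-08", "2017-09-10", "2017-09-15",
    ]),
    "Distance": [1000, 1600, 1200, 1200, 1000, 1600, 1600],
    "Position": [2, 4, 3, 13, 1, 9, 2],
    "TrainerID": [6529] * 7,
})


def prior_sprint_win_pct(group, window_days=1000, max_distance=1200):
    """Hypothetical helper: win % of earlier sprint races for one trainer.

    `group` must already be sorted by Date (row order breaks date ties).
    """
    dates = group["Date"].to_numpy()
    sprint = (group["Distance"] <= max_distance).to_numpy()
    win = sprint & (group["Position"] == 1).to_numpy()

    # Exclusive prefix sums: entry k counts sprints (or wins) in rows 0..k-1,
    # so the current row never contributes to its own value.
    races_before = np.concatenate(([0], np.cumsum(sprint)))
    wins_before = np.concatenate(([0], np.cumsum(win)))

    # First row whose date is later than (current date - window).
    cutoff = dates - np.timedelta64(window_days, "D")
    start = np.searchsorted(dates, cutoff, side="right")

    i = np.arange(len(group))
    races = races_before[i] - races_before[start]
    wins = wins_before[i] - wins_before[start]

    # Rows with no earlier sprint race in the window get 0.
    pct = np.where(races > 0, 100 * wins / np.maximum(races, 1), 0)
    return pd.Series(np.round(pct).astype(int), index=group.index)


# A stable sort keeps the original row order among equal dates.
df_sorted = df.sort_values("Date", kind="stable")
parts = [prior_sprint_win_pct(g) for _, g in df_sorted.groupby("TrainerID")]

# Assigning a Series aligns on the index, so no length mismatch can occur.
df["Win %"] = pd.concat(parts)
```

Because the result is a Series indexed like the original rows, the assignment relies on index alignment instead of array length. If a strict copy of the previous row's value is wanted for non-sprint rows, the sprint-only values could be written with the mask and then forward-filled per trainer instead.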



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
