How to resample a Python DataFrame at a daily rate when it spans more than 500 years
I need to resample a DataFrame that covers a date range of 100 years to a daily rate, because I want yearly totals (my plan was to resample to daily values and then sum them per year).
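The plan above (upsample to one value per day, backfill, then sum per calendar year) can be sketched directly in pandas. This is a minimal sketch, not the asker's code: it assumes a hypothetical Series `raw` with a sorted, midnight-aligned `DatetimeIndex` whose dates fall inside pandas' default `Timestamp` bounds. `Resampler.bfill` gives each new day the next available value, which is the backfill the loop below aims for.

```python
import pandas as pd

# Hypothetical irregular "raw" series: one value per report date (midnight timestamps).
raw = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2000-01-01", "2000-01-04", "2001-01-02"]),
)

# Upsample to one row per calendar day; days without a value take the
# next available raw value (backfill).
daily = raw.resample("D").bfill()

# Sum the daily values within each calendar year.
yearly = daily.groupby(daily.index.year).sum()
print(yearly)
```

Both steps are vectorised, so there is no per-day Python loop.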
I tried the following:
```python
import datetime

import pandas as pd

# start_date, Time_data2 and model_flow_ts are defined earlier in my script
d0 = start_date  # set date to model start date
ind = Time_data2['datetime']
n_days = (max(ind) - d0).days  # total number of days in the simulation
df_out = pd.DataFrame(index=range(n_days), columns=['datetime', 'year', 'value'])
for i in range(n_days):  # for every day in the simulation
    d = d0 + datetime.timedelta(days=i)  # get a particular day (= start_date + timedelta)
    df_out.loc[i, 'datetime'] = d  # assign datetime for each day
    df_out.loc[i, 'year'] = d.year  # assign year for each day
    # Assign the value of the first entry in the raw timeseries at or after the day
    # being filled; this is equivalent to a backfill with the pandas resample
    v = None  # stays None if no later entry exists
    for t in model_flow_ts.index:
        dt = t - d  # timedelta between this index value in model_flow_ts and the day being filled
        if dt.days < 0:
            continue  # entry is before the day, keep looking
        v = model_flow_ts.loc[t]  # first entry at or after the day: take its value
        break
    df_out.loc[i, 'value'] = v
    if i % 50000 == 0:  # progress report every 50,000 days
        print(i)
```
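The inner `for t in model_flow_ts.index` loop is a linear search for the first timestamp at or after each day, so the total work grows with days × timestamps. A hedged sketch of the same lookup done for all days at once with `np.searchsorted` (a swapped-in technique, not part of the original code), assuming the raw timestamps are sorted and stored as NumPy `datetime64[D]`; `raw_times`, `raw_values` and the dates are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical stand-in for model_flow_ts: sorted, day-resolution timestamps
# and their values.
raw_times = np.array(["2000-01-01", "2000-01-04", "2000-01-10"], dtype="datetime64[D]")
raw_values = np.array([10.0, 20.0, 30.0])

# One entry per simulated day; the start date and length are assumptions.
days = np.arange("2000-01-01", "2000-01-13", dtype="datetime64[D]")

# For every day at once: position of the first raw timestamp >= that day,
# i.e. what the inner `for t in model_flow_ts.index` loop searches for.
idx = np.searchsorted(raw_times, days, side="left")

# Days after the last raw timestamp have nothing to backfill from, so mark
# them NaN instead of indexing past the end of raw_values.
valid = idx < len(raw_values)
daily_values = np.full(days.shape, np.nan)
daily_values[valid] = raw_values[idx[valid]]
print(daily_values)
```

`side="left"` is what makes this a backfill: it returns the first position whose timestamp is greater than or equal to the day, including an exact match on the same day.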
But it takes a very long time, because there are so many days to fill and, for each day, the inner loop scans model_flow_ts from the beginning.
Does anyone have any suggestions on how to speed it up?
Cheers
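For spans longer than pandas' default nanosecond-resolution `Timestamp` range (about 1677-09-21 to 2262-04-11, roughly 584 years), one option is to stay in NumPy `datetime64[D]`, which covers a far wider range. A minimal sketch under that assumption, with hypothetical data starting in 1500 and spanning 600 years: expand to days, backfill with `np.searchsorted`, then sum per year with `np.bincount`. It assumes sorted, day-resolution timestamps.

```python
import numpy as np

# Hypothetical raw series: sorted day-resolution timestamps spanning 600 years,
# starting before the earliest date pandas' nanosecond Timestamps can hold.
raw_times = np.array(["1500-01-01", "1800-06-15", "2100-12-31"], dtype="datetime64[D]")
raw_values = np.array([1.0, 2.0, 3.0])

# Every calendar day from the first to the last raw timestamp, inclusive.
days = np.arange(raw_times[0], raw_times[-1] + np.timedelta64(1, "D"), dtype="datetime64[D]")

# Backfill: each day takes the value of the first raw timestamp >= that day.
# The range ends at the last raw timestamp, so every index is in bounds.
daily_values = raw_values[np.searchsorted(raw_times, days, side="left")]

# Calendar year of each day (datetime64[Y] counts years from 1970).
years = days.astype("datetime64[Y]").astype(np.int64) + 1970

# Group-by-sum over contiguous year labels.
labels = np.arange(years[0], years[-1] + 1)
totals = np.bincount(years - years[0], weights=daily_values)
yearly = dict(zip(labels.tolist(), totals.tolist()))
print(yearly[1500], yearly[2100])
```

Because the year labels are contiguous integers, `np.bincount` on `years - years[0]` acts as a group-by-sum, and the whole pipeline is vectorised with no per-day `.loc` assignments.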
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow