Fill missing date and time in Python (pandas)
I have a large data set; a sample is given below. The data is recorded for 1 day at 5-minute intervals over 24 hours for 3214 unique ids.
- The time and date information is missing in some rows, as shown in the image. Is there a way to automatically fill the missing date and time for every `Unique_ID`? Every unique id has 288 datetime values per day (12 five-minute intervals per hour × 24 hours = 288).
- Also, how can I fill the missing unique ids with the previous value?
- For the last column, `VALUE`, I want the filled rows to have empty cells.
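Before filling anything, it can help to see which ids actually fall short of 288 rows. A minimal sketch, assuming a DataFrame with the question's `datetime` and `Unique_ID` columns; the toy ids 1 and 2 and the constant name `EXPECTED_PER_DAY` are hypothetical:

```python
import pandas as pd

# 12 five-minute slots per hour x 24 hours.
EXPECTED_PER_DAY = 12 * 24

# Hypothetical toy frame: id 1 has a full day, id 2 is missing one slot.
full_day = pd.date_range("2018-01-28 00:00", periods=EXPECTED_PER_DAY, freq="5min")
df = pd.concat(
    [
        pd.DataFrame({"datetime": full_day, "Unique_ID": 1}),
        pd.DataFrame({"datetime": full_day.delete(10), "Unique_ID": 2}),
    ],
    ignore_index=True,
)

# Count rows per id and keep only the ids with fewer than 288 rows.
counts = df.groupby("Unique_ID").size()
incomplete = counts[counts < EXPECTED_PER_DAY]
print(incomplete)
```

On the real data this would list the ids that need filling, together with how many rows each one currently has.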
I searched for relevant answers (1, 2), but they did not work.
The most relevant answer was here, but its solutions only handle missing dates, not missing dates and times together.
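The date-only answers can be extended to date and time because `pd.date_range` also accepts sub-daily frequencies such as `"5min"`. The sketch below is a reindexing approach, a different technique from Solution 1. It assumes that each id's rows fall on one calendar day, that timestamps sit on a 5-minute grid starting at midnight, and that no id has duplicate timestamps. The helper name `fill_grid` is hypothetical:

```python
import pandas as pd

# Hypothetical excerpt in the question's layout: id 105714 has a gap
# between 2:55 and 4:45.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["28/01/2018 2:50", "28/01/2018 2:55", "28/01/2018 4:45"],
            format="%d/%m/%Y %H:%M",
        ),
        "Unique_ID": 105714,
        "VALUE": [2292, 225, 185],
    }
)


def fill_grid(group: pd.DataFrame, uid) -> pd.DataFrame:
    # Full 5-minute grid (288 stamps) for the calendar day of this id.
    day = group["datetime"].dt.normalize().iloc[0]
    grid = pd.date_range(day, periods=288, freq="5min")
    # Reindexing onto the grid inserts the missing times; VALUE stays NaN there.
    out = group.set_index("datetime").reindex(grid)
    # Every inserted row belongs to the same id.
    out["Unique_ID"] = uid
    return out.rename_axis("datetime").reset_index()


filled = pd.concat(
    [fill_grid(g, uid) for uid, g in df.groupby("Unique_ID")],
    ignore_index=True,
)
print(len(filled))                   # 288
print(filled["VALUE"].isna().sum())  # 285
```

This covers all three points in the question at once: missing date/time rows are created, `Unique_ID` is filled in, and `VALUE` is left empty in the new rows.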
sample_data.csv
```
datetime Unique_ID VALUE
28/01/2018 2:05 105714 284
28/01/2018 2:10 105714 292
28/01/2018 2:15 105714 287
28/01/2018 2:20 105714 193
28/01/2018 2:25 105714 2348
28/01/2018 2:30 105714 284
28/01/2018 2:35 105714 286
28/01/2018 2:40 105714 293
28/01/2018 2:45 105714 229
28/01/2018 2:50 105714 2292
28/01/2018 2:55 105714 225
28/01/2018 4:45 105714 185
28/01/2018 4:50 105714 277
28/01/2018 4:55 105714 185
28/01/2018 5:00 105714 2435
28/01/2018 5:05 105714 273
28/01/2018 5:10 105714 2360
28/01/2018 5:15 105714 282
```
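If the sample is saved exactly as displayed (whitespace-separated, with a space inside the datetime field), the sketch below shows one way to load it and locate the gaps. The real file may well be comma-separated. In that case a plain `pd.read_csv` followed by `pd.to_datetime(..., format="%d/%m/%Y %H:%M")` is enough. The temporary column names `date` and `time` are assumptions:

```python
import io

import pandas as pd

# A few rows of sample_data.csv as shown in the question.
raw = """datetime Unique_ID VALUE
28/01/2018 2:50 105714 2292
28/01/2018 2:55 105714 225
28/01/2018 4:45 105714 185
28/01/2018 4:50 105714 277
"""

# Split on whitespace into four fields, then rebuild a real timestamp.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=1,
    names=["date", "time", "Unique_ID", "VALUE"],
)
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M")
df = df[["datetime", "Unique_ID", "VALUE"]]

# A step larger than 5 minutes marks where rows are missing.
gaps = df["datetime"].diff() > pd.Timedelta(minutes=5)
print(df.loc[gaps, "datetime"])
```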
Solution 1:[1]
In pandas you can try the following.
First, get the index of the empty rows (rows where every column is NaN):
```python
import pandas as pd
idx = df[df.isna().all(axis=1)].index
```
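This step assumes that the missing rows already exist in the DataFrame as completely empty rows. If the file has no such rows, as in the sample above where 2:55 jumps straight to 4:45, `idx` is empty and the later steps change nothing. Here is a small illustration on a hypothetical frame with one empty row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where row 1 is completely empty,
# e.g. a blank line kept from a spreadsheet export.
df = pd.DataFrame(
    {
        "datetime": ["28/01/2018 2:05", np.nan, "28/01/2018 2:15"],
        "Unique_ID": [105714, np.nan, 105714],
        "VALUE": [284, np.nan, 287],
    }
)

# Index labels of the rows where every column is NaN.
idx = df[df.isna().all(axis=1)].index
print(list(idx))  # [1]
```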
Then forward-fill `Unique_ID` and `datetime` with the `ffill()` method:
```python
df[['Unique_ID', 'datetime']] = df[['Unique_ID', 'datetime']].ffill()
```
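The forward fill also answers the question's second point, filling missing ids with the previous value. A short check on a hypothetical frame with one empty row between two recorded ones is below. The final `astype` line is an addition that is not in the answer. It is there because the NaN turns `Unique_ID` into a float column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": ["28/01/2018 2:05", np.nan, "28/01/2018 2:15"],
        "Unique_ID": [105714, np.nan, 105714],
        "VALUE": [284, np.nan, 287],
    }
)

# Copy the last valid id and timestamp down into the empty row;
# VALUE is not part of the selection, so it stays NaN.
df[["Unique_ID", "datetime"]] = df[["Unique_ID", "datetime"]].ffill()

# The NaN made Unique_ID float; restore integer ids once no NaN remains.
df["Unique_ID"] = df["Unique_ID"].astype("int64")
print(df)
```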
Finally, use the `groupby()` and `transform()` methods to give each row the last non-null `VALUE` of its (`Unique_ID`, `datetime`) group:
```python
df['VALUE'] = df.groupby(['Unique_ID', 'datetime'], dropna=False)['VALUE'].transform('last')
```
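The effect is easiest to see on a hypothetical frame in the state the forward fill leaves behind. `last` skips NaN, so the formerly empty row inherits the `VALUE` of the row it was filled from. If `VALUE` should stay empty in the new rows, as the question asks, this step can simply be skipped. Note that `dropna=False` in `groupby` requires pandas 1.1 or later:

```python
import numpy as np
import pandas as pd

# State after the forward fill: row 1 shares (Unique_ID, datetime) with row 0.
df = pd.DataFrame(
    {
        "datetime": ["28/01/2018 2:05", "28/01/2018 2:05", "28/01/2018 2:15"],
        "Unique_ID": [105714, 105714, 105714],
        "VALUE": [284, np.nan, 287],
    }
)

# 'last' takes each group's last non-null VALUE and broadcasts it
# back to every row of that group.
df["VALUE"] = df.groupby(["Unique_ID", "datetime"], dropna=False)["VALUE"].transform("last")
print(df["VALUE"].tolist())  # [284.0, 284.0, 287.0]
```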
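If you also want to advance the forward-filled `datetime` values by 5 minutes, first convert the column to real datetimes (a timedelta cannot be added to the original strings), then shift the previously empty rows:
```python
df['datetime'] = pd.to_datetime(df['datetime'], format='%d/%m/%Y %H:%M')
df.loc[idx, 'datetime'] = df.loc[idx, 'datetime'] + pd.to_timedelta(5, unit='m')
```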
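A single `+ 5 minutes` gives every empty row the previous recorded time plus 5 minutes. As a result, two or more consecutive empty rows all end up with the same timestamp. The sketch below shows one way around that. It extends the original answer rather than being part of it: it numbers the empty rows within each run using `cumcount()` and then adds 5 minutes per step. It assumes the same empty-row layout as the answer and timestamps in the question's `%d/%m/%Y %H:%M` format. The names `empty`, `block` and `steps` are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two consecutive empty rows after 2:05.
df = pd.DataFrame(
    {
        "datetime": ["28/01/2018 2:05", np.nan, np.nan, "28/01/2018 2:20"],
        "Unique_ID": [105714, np.nan, np.nan, 105714],
        "VALUE": [284, np.nan, np.nan, 193],
    }
)

empty = df.isna().all(axis=1)
# A new block starts at every recorded (non-empty) row.
block = (~empty).cumsum()

df[["Unique_ID", "datetime"]] = df[["Unique_ID", "datetime"]].ffill()
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")

# 0 for the recorded row, then 1, 2, ... for the empty rows after it.
steps = df.groupby(block).cumcount()
df["datetime"] += pd.to_timedelta(5 * steps, unit="min")
print(df["datetime"].dt.strftime("%H:%M").tolist())  # ['02:05', '02:10', '02:15', '02:20']
```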
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |