Select previous row every hour in pandas
I am trying to obtain the closest previous data point every hour in a pandas data frame. For example:
       time  value
0  14:59:58     15
1  15:00:10     20
2  15:57:42     14
3  16:00:30      9
would return
       time  value
0  15:00:00     15
1  16:00:00     14
i.e. rows 0 and 2 of the original data frame. How would I go about doing so? Thanks!
Solution 1:[1]
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)
Here is one way to do it:
# Setup
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
temp_df = pd.DataFrame(df["time"].dt.round("h").drop_duplicates()).assign(value=pd.NA)
# Add the rounded hours to df, forward-fill each one from the preceding
# data point, then drop the original rows
new_df = (
    pd.concat([df, temp_df])
    .sort_values(by="time")
    .ffill()
    .pipe(lambda df_: df_[~df_["time"].isin(df["time"])])
    .reset_index(drop=True)
)
# Cleanup
new_df["time"] = new_df["time"].dt.time
print(new_df)
# Output
       time  value
0  15:00:00     15
1  16:00:00     14
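As an alternative sketch (not part of the answer above), pd.merge_asof can express "closest previous row" directly: build a grid of full hours and, for each one, take the last row at or before it. The grid bounds (first full hour after the earliest row, last full hour before the latest row) and the names hours and result are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df = df.sort_values("time")  # merge_asof requires sorted keys

# Assumed grid: every full hour between the first and last data point
hours = pd.DataFrame(
    {
        "time": pd.date_range(
            df["time"].min().ceil("h"), df["time"].max().floor("h"), freq="h"
        )
    }
).astype({"time": df["time"].dtype})  # keep both merge keys on the same dtype

# For each hour mark, pick the last row whose time is <= that mark
result = pd.merge_asof(hours, df, on="time", direction="backward")

# Show only the time of day, as in the question
result["time"] = result["time"].dt.time
print(result)
```

On the toy data this gives the values 15 and 14 at 15:00:00 and 16:00:00. Because the grid covers every hour in the range, an hour with no nearby rows still receives the most recent earlier value; passing tolerance=pd.Timedelta(...) to merge_asof would leave such stale matches empty instead.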
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Laurent |