Select previous row every hour in pandas

For every full hour, I am trying to obtain the closest data point before it in a pandas data frame. For example:

        time  value
0   14:59:58     15
1   15:00:10     20
2   15:57:42     14
3   16:00:30      9

would return

        time  value
0   15:00:00     15
1   16:00:00     14

i.e. rows 0 and 2 of the original data frame. How would I go about doing so? Thanks!



Solution 1:[1]

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)

Here is one way to do it:

# Setup
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
# One placeholder row per rounded hour; the "H" alias is deprecated in recent pandas, so use "h"
temp_df = pd.DataFrame(df["time"].dt.round("h").drop_duplicates()).assign(value=pd.NA)

# Add the rounded hours to df, forward-fill each with the previous data point, then drop the original rows
new_df = (
    pd.concat([df, temp_df])
    .sort_values(by="time")
    .ffill()  # fillna(method="ffill") is deprecated in recent pandas
    .pipe(lambda df_: df_[~df_["time"].isin(df["time"])])
    .reset_index(drop=True)
)

# Cleanup
new_df["time"] = new_df["time"].dt.time
print(new_df)
# Output
       time  value
0  15:00:00     15
1  16:00:00     14
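
As an alternative, the same lookup can probably be written more directly with pd.merge_asof, which matches each key on the left with the last row on the right whose key is less than or equal to it. The following is only a sketch on the same toy data: the names hours and result are illustrative and not part of the original answer, and it assumes the hours of interest are the full hours between the first and last timestamps.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df = df.sort_values("time")  # merge_asof requires sorted keys

# Full hours between the first and last data points (assumption: these are
# the hours we want to sample)
hours = pd.DataFrame(
    {"hour": pd.date_range(df["time"].min().ceil("h"), df["time"].max().floor("h"), freq="h")}
)
# Keep both key columns on the same dtype, which merge_asof expects
hours["hour"] = hours["hour"].astype(df["time"].dtype)

# For each full hour, take the last row with time <= that hour
result = pd.merge_asof(hours, df, left_on="hour", right_on="time", direction="backward")
result = result[["hour", "value"]].rename(columns={"hour": "time"})
result["time"] = result["time"].dt.time
print(result)
```

On the toy data this selects the values 15 and 14 for 15:00:00 and 16:00:00. Unlike the concat approach, a full hour with no new data since the previous hour still gets a row, carrying the last known value forward; passing tolerance=pd.Timedelta("1h") to merge_asof would leave such hours empty instead.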

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent