Pandas: group by index hour while keeping every observation for each hour
I have a pandas DataFrame with one column and a datetime index. I need to group the data by hour and keep each observation (record) within each hourly group. The following groups the data by hour:
data = data.groupby(data.index.hour).max()
The issue is that this approach requires an aggregator (max, mean or min), which collapses each group to a single value, while I need to keep every observation.
Solution 1:[1]
If you group by the hour number, you'll get at most 24 groups, one for each hour of the day, with observations from different days pooled into the same group.
If your intention is to aggregate multiple observations that occur each hour and to view them over time, you'll want to use the pandas.Grouper class.
In either case, you can apply list to keep all of the observations within each group individually, instead of aggregating.
Note: You'll need to replace "data_col" with the name of the column in your DataFrame that contains the observations.
Grouping to hour number
data = data.groupby(data.index.hour)["data_col"].apply(list)
Grouping to hourly time sequence
import pandas as pd
time_grouper = pd.Grouper(freq="1h")
data = data.groupby(time_grouper)["data_col"].apply(list)
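To make the difference between the two groupings concrete, here is a minimal sketch on made-up data. The timestamps, the values and the variable names are assumptions for illustration only; "data_col" stands in for your own column name as above.

```python
import pandas as pd

# Hypothetical sample data: five readings spread over two days.
idx = pd.to_datetime([
    "2021-01-01 09:05",
    "2021-01-01 09:40",
    "2021-01-01 10:15",
    "2021-01-02 09:20",
    "2021-01-02 10:50",
])
df = pd.DataFrame({"data_col": [1, 2, 3, 4, 5]}, index=idx)

# Group by hour number: the 09:xx readings from both days share one list.
by_hour_number = df.groupby(df.index.hour)["data_col"].apply(list)

# Group by hourly time sequence: each calendar hour gets its own list.
by_hour_sequence = df.groupby(pd.Grouper(freq="1h"))["data_col"].apply(list)

# Keep only the hours that actually contain observations.
by_hour_sequence = by_hour_sequence[by_hour_sequence.map(len) > 0]

print(by_hour_number)
print(by_hour_sequence)
```

With the hour-number grouping, hour 9 holds the readings from both days, while the time-sequence grouping keeps 2021-01-01 09:00 and 2021-01-02 09:00 as separate groups.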
Solution 2:[2]
Alternative to GroupBy
You can also use pandas' resample function to upsample or downsample to a desired frequency if your index is in datetime format.
# set the 'date' column as the index, assuming it isn't already. Make sure it is in datetime format
data = data.set_index('date')
# resample your data to hourly frequency by taking the sum of all values within each hour
# (lowercase 'h' is used because the uppercase 'H' alias is deprecated in recent pandas versions)
data.resample('h').sum()
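Note that sum() is still an aggregator, so on its own it does not keep the individual observations. As a rough sketch (the sample data and the names hourly_sum and hourly_obs are assumptions, not part of the answer), you can iterate over the resampler to collect the raw values of each hourly bin instead:

```python
import pandas as pd

# Hypothetical sample data with a 'date' column, as assumed in the answer.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 09:05",
        "2021-01-01 09:40",
        "2021-01-01 10:15",
    ]),
    "data_col": [1, 2, 3],
})
df = df.set_index("date")

# Aggregating: one summed value per hour.
hourly_sum = df["data_col"].resample("1h").sum()

# Keeping every observation: iterate over the hourly bins and store the
# raw values of each non-empty bin in a dict keyed by the bin's start time.
hourly_obs = {
    hour: chunk.tolist()
    for hour, chunk in df["data_col"].resample("1h")
    if len(chunk) > 0
}

print(hourly_sum)
print(hourly_obs)
```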
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Chris Farr |
Solution 2 | Gene Burinsky |