'Convert string hours to minute pd.eval
I want to convert all rows of my DataFrame that contains hours and minutes into minutes only. I have a dataframe that looks like this:
df=
time
0 8h30
1 14h07
2 08h30
3 7h50
4 8h0
5 8h15
6 6h15
I'm using the following method to convert:
df['time'] = pd.eval(
df['time'].replace(['h'], ['*60+'], regex=True))
Output
SyntaxError: invalid syntax
I think the error comes from the format of the hour, maybe pd.eval
cant accept 08h30
or 8h0
, how to solve this probleme ?
Solution 1:[1]
To avoid having to trim leading zeros, an alternative approach:
df[['h', 'm']] = df['time'].str.split('h', expand=True).astype(int)
df['total_min'] = df['h']*60 + df['m']
Result:
time h m total_min
0 8h30 8 30 510
1 14h07 14 7 847
2 08h30 8 30 510
3 7h50 7 50 470
4 8h0 8 0 480
5 8h15 8 15 495
6 6h15 6 15 375
Solution 2:[2]
Pandas can already handle such strings if the units are included in the string. While 14h07
can't be parse (why assume 07
is minutes?), 14h07
can be converted to a Timedelta :
>>> pd.to_timedelta("14h07m")
Timedelta('0 days 14:07:00')
Given this dataframe :
d1 = pd.DataFrame(['8h30m', '14h07m', '08h30m', '8h0m'],
columns=['time'])
You can convert the time
series into a Timedelta series with pd.to_timedelta
:
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d1
time tm
0 8h30m 0 days 08:30:00
1 14h07m 0 days 14:07:00
2 08h30m 0 days 08:30:00
3 8h0m 0 days 08:00:00
To handle the missing minutes unit in the original data, just append m
:
d1['tm'] = pd.to_timedelta(d1['time'] + 'm')
Once you have a Timedelta
you can calculate hours and minutes.
The components of the values can be retrieved with Timedelta.components
>>> d1.tm.dt.components.hours
0 8
1 14
2 8
3 8
Name: hours, dtype: int64
To get the total minutes, seconds or hours, change the frequency to minutes:
>>> d1.tm.astype('timedelta64[m]')
0 510.0
1 847.0
2 510.0
3 480.0
Name: tm, dtype: float64
Bringing all the operations together :
>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d2 = (d1.assign(h=d1.tm.dt.components.hours,
... m=d1.tm.dt.components.minutes,
... total_minutes=d1.tm.astype('timedelta64[m]')))
>>>
>>> d2
time tm h m total_minutes
0 8h30m 0 days 08:30:00 8 30 510.0
1 14h07m 0 days 14:07:00 14 7 847.0
2 08h30m 0 days 08:30:00 8 30 510.0
3 8h0m 0 days 08:00:00 8 0 480.0
Solution 3:[3]
Just to give an alternative approach with kind of the same elements as above you could do:
df = pd.DataFrame(data=["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"],
columns=["time"])
First split you column on the "h"
hm = df["time"].str.split("h", expand=True)
Then combine the columns again, but zeropad time hours and minutes in order to make valid time strings:
df2 = hm[0].str.strip().str.zfill(2) + hm[1].str.strip().str.zfill(2)
Then convert the string column with proper values to a date time column:
df3 = pd.to_datetime(df2, format="%H%M")
Finally, calculate the number of minutes by subtrackting a zero time (to make deltatimes) and divide by the minutes deltatime:
zerotime= pd.to_datetime("0000", format="%H%M")
df['minutes'] = (df3 - zerotime) / pd.Timedelta(minutes=1)
The results look like:
time minutes
0 8h30 510.0
1 14h07 847.0
2 08h30 510.0
3 7h50 470.0
4 8h0 480.0
5 8h15 495.0
6 6h15 375.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Peter Leimbigler |
Solution 2 | Panagiotis Kanavos |
Solution 3 | Eelco van Vliet |