Convert string hours to minutes with pd.eval

I want to convert all rows of my DataFrame that contain hours and minutes into minutes only. I have a DataFrame that looks like this:

df=
    time
0    8h30
1    14h07
2    08h30
3    7h50
4    8h0 
5    8h15
6    6h15

I'm using the following method to convert:

df['time'] = pd.eval(
    df['time'].replace(['h'], ['*60+'], regex=True))

Output

SyntaxError: invalid syntax

I think the error comes from the format of the hour; maybe pd.eval can't accept 08h30 or 8h0. How can I solve this problem?
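A quick way to test that hypothesis, as a sketch: assuming pd.eval rejects the same strings that Python's own parser rejects, compile() shows which rewritten rows fail. In Python 3 an integer literal with a leading zero, such as 08 or 07, is a syntax error, so both 08h30 and 14h07 break, while 8h0 becomes 8*60+0, which is valid. The helper name `is_valid_expression` is hypothetical and only stands in for pd.eval's parser.

```python
def is_valid_expression(expr: str) -> bool:
    """Return True if `expr` compiles as a Python expression (hypothetical helper)."""
    try:
        compile(expr, "<expr>", "eval")
        return True
    except SyntaxError:
        return False

times = ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]
# Same rewrite as in the question: '8h30' -> '8*60+30'
for expr in (t.replace("h", "*60+") for t in times):
    print(repr(expr), is_valid_expression(expr))
```

Only the rows whose hour or minute part starts with a zero followed by another digit are reported as invalid.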



Solution 1:[1]

An alternative approach that avoids having to trim leading zeros:

# Split "8h30" into hour/minute columns; the int conversion drops leading zeros
df[['h', 'm']] = df['time'].str.split('h', expand=True).astype(int)
df['total_min'] = df['h']*60 + df['m']

Result:

    time   h   m      total_min
0   8h30   8  30            510
1  14h07  14   7            847
2  08h30   8  30            510
3   7h50   7  50            470
4    8h0   8   0            480
5   8h15   8  15            495
6   6h15   6  15            375
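A variant of the same idea, sketched with `Series.str.extract` instead of `str.split` (a swapped-in technique, not part of the original answer): the regular expression both splits each string and checks its shape, so a malformed row makes the integer conversion fail loudly instead of being silently misread.

```python
import pandas as pd

df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]})

# Named groups become the column names of the extracted DataFrame.
# \s* tolerates stray whitespace such as the trailing space in '8h0 '.
parts = df["time"].str.extract(r"^\s*(?P<h>\d+)h(?P<m>\d+)\s*$")

# Converting '08' -> 8 and '07' -> 7 drops leading zeros. A row that does not
# match the pattern yields NaN, and astype(int) then raises an error.
parts = parts.astype(int)

df["total_min"] = parts["h"] * 60 + parts["m"]
print(df)
```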

Solution 2:[2]

Pandas can already handle such strings if the units are included in the string. While 14h07 can't be parsed (why assume 07 is minutes?), 14h07m can be converted to a Timedelta:

>>> pd.to_timedelta("14h07m")
Timedelta('0 days 14:07:00')

Given this DataFrame:

d1 = pd.DataFrame(['8h30m', '14h07m', '08h30m', '8h0m'],
                  columns=['time'])

You can convert the time series into a Timedelta series with pd.to_timedelta:

>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d1
     time              tm
0   8h30m 0 days 08:30:00
1  14h07m 0 days 14:07:00
2  08h30m 0 days 08:30:00
3    8h0m 0 days 08:00:00

To handle the missing minutes unit in the original data, just append "m" to each string:

d1['tm'] = pd.to_timedelta(d1['time'] + 'm')
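Applied to the question's original strings, a minimal sketch might look like this. The helper name `to_total_minutes` is hypothetical, and the `str.strip()` call is an assumption added to cope with the trailing space in '8h0 ' rather than something this answer relies on.

```python
import pandas as pd

def to_total_minutes(times: pd.Series) -> pd.Series:
    """Convert strings like '8h30' to float minutes (hypothetical helper)."""
    # Strip stray whitespace, then append the missing unit: '8h30' -> '8h30m'
    deltas = pd.to_timedelta(times.str.strip() + "m")
    # Dividing a timedelta Series by one minute yields float minutes
    return deltas / pd.Timedelta(minutes=1)

df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]})
df["minutes"] = to_total_minutes(df["time"])
print(df)
```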

Once you have a Timedelta, you can calculate hours and minutes.

For a Series, the components of each value can be retrieved with the .dt.components accessor, the Series counterpart of Timedelta.components:

>>> d1.tm.dt.components.hours
0     8
1    14
2     8
3     8
Name: hours, dtype: int64
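If you want integer minutes rather than floats, the components can be combined directly. This sketch assumes the durations carry no seconds (true for this data, since seconds are ignored); days are folded in so that values of 24 hours or more are still counted correctly.

```python
import pandas as pd

d1 = pd.DataFrame({"time": ["8h30m", "14h07m", "08h30m", "8h0m"]})
d1["tm"] = pd.to_timedelta(d1["time"])

# components is a DataFrame with days, hours, minutes, seconds, ... columns
comp = d1["tm"].dt.components

# Integer total minutes; seconds and smaller units are deliberately ignored
d1["total_minutes"] = comp.days * 24 * 60 + comp.hours * 60 + comp.minutes
print(d1)
```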

To get the total minutes (or seconds, or hours), divide by a Timedelta of the matching unit. In pandas 1.x, d1.tm.astype('timedelta64[m]') also returned float minutes, but pandas 2.0 and later raise an error for that conversion, so dividing is the more portable form:

>>> d1.tm / pd.Timedelta(minutes=1)
0    510.0
1    847.0
2    510.0
3    480.0
Name: tm, dtype: float64

Bringing all the operations together:

>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d2 = (d1.assign(h=d1.tm.dt.components.hours,
...                 m=d1.tm.dt.components.minutes,
...                 total_minutes=d1.tm / pd.Timedelta(minutes=1)))
>>>
>>> d2
     time              tm   h   m  total_minutes
0   8h30m 0 days 08:30:00   8  30          510.0
1  14h07m 0 days 14:07:00  14   7          847.0
2  08h30m 0 days 08:30:00   8  30          510.0
3    8h0m 0 days 08:00:00   8   0          480.0

Solution 3:[3]

To give an alternative approach that uses some of the same elements as above, you could do:

import pandas as pd
df = pd.DataFrame(data=["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"],
                  columns=["time"])

First, split your column on the "h":

hm = df["time"].str.split("h", expand=True)

Then combine the columns again, zero-padding the hours and minutes to two digits so that they form valid HHMM time strings:

df2 = hm[0].str.strip().str.zfill(2) + hm[1].str.strip().str.zfill(2)

Then convert the resulting string column to a datetime column:

df3 = pd.to_datetime(df2, format="%H%M")

Finally, calculate the number of minutes by subtracting a zero time (which produces timedeltas) and dividing by a one-minute timedelta:

zerotime = pd.to_datetime("0000", format="%H%M")
df['minutes'] = (df3 - zerotime) / pd.Timedelta(minutes=1)

The results look like:

    time  minutes
0   8h30    510.0
1  14h07    847.0
2  08h30    510.0
3   7h50    470.0
4   8h0     480.0
5   8h15    495.0
6   6h15    375.0
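Putting these steps together, a self-contained sketch (the function name `minutes_since_midnight` is hypothetical). Note that `%H` only accepts hours 00 to 23, so unlike the timedelta-based approach this route cannot represent durations of 24 hours or more.

```python
import pandas as pd

def minutes_since_midnight(times: pd.Series) -> pd.Series:
    """Convert strings like '8h30' to float minutes via HHMM clock times (hypothetical helper)."""
    # Split '8h30' into ['8', '30']
    hm = times.str.split("h", expand=True)
    # Strip stray spaces (e.g. '0 ' from '8h0 ') and zero-pad to two digits each
    hhmm = hm[0].str.strip().str.zfill(2) + hm[1].str.strip().str.zfill(2)
    # Parse as clock times; both sides of the subtraction share the same default date
    clock = pd.to_datetime(hhmm, format="%H%M")
    zerotime = pd.to_datetime("0000", format="%H%M")
    # Subtracting midnight gives timedeltas; dividing by one minute gives floats
    return (clock - zerotime) / pd.Timedelta(minutes=1)

df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]})
df["minutes"] = minutes_since_midnight(df["time"])
print(df)
```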

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Leimbigler
Solution 2 Panagiotis Kanavos
Solution 3 Eelco van Vliet