How to convert ISO duration to minutes in PySpark or Python
I have a column in a pandas DataFrame which has values like below.
I am looking to convert the ISO 8601 durations to minutes. The output should be:
- 15 minutes
- 90 minutes
- 5 minutes
import pandas as pd
import re
import json
from datetime import datetime
currentdate=datetime.today().strftime('%Y/%m/%d')
absolutepath='/project/sniper/'+'/'+currentdate+'/*.json'
df = pd.read_json('absolutepath', lines=True)
df_sugar = df.loc[df['ingredients'].str.contains("Sugar|sugar", case=True)]
def convertToInteger(my_str):
    if 'H' in my_str and PT in my_str:
        characters_to_remove_H = "H"
        for l in characters_to_remove_H:
            new_string_hour = my_str.replace(l, "*60")
        new_p=int(new_string_hour.replace(PT,""))
        return pd.Series(new_p)
    elif 'M' in my_str and PT in my_str:
        characters_to_remove_M = "PTM"
        for m in characters_to_remove_M:
            new_string_minute = int(my_str.replace(m, ""))
        return pd.Series(new_string_minute)
df2[["new_col_2"]] = df_beef["prepTime"].apply(convertToInteger)
Solution 1:[1]
Assuming your data is something like this (you probably have more columns, but you get the point):
df = pd.DataFrame(['PT15M', 'PT1H30M', 'PT5M'], columns=['prepTime'])
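I'd use the isodate package for a more robust approach to the problem:
import isodate
def get_minutes(iso_str):
    # parse_duration returns a datetime.timedelta for durations without years or months
    iso_timedelta = isodate.parse_duration(iso_str)
    # total_seconds() includes whole days, unlike the .seconds attribute
    return int(iso_timedelta.total_seconds() // 60)
df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)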
Or as a one-liner:
df['prepTimeMinutes'] = df['prepTime'].apply(lambda x: int(isodate.parse_duration(x).total_seconds() // 60))
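One caveat worth checking when turning a parsed duration into minutes: on a `datetime.timedelta`, the `seconds` attribute holds only the part of the duration below one day, while `total_seconds()` covers the whole duration. The sketch below uses only the standard library. The helper name `minutes_from_timedelta` and the sample values are illustrative assumptions, not taken from the question.

```python
from datetime import timedelta

def minutes_from_timedelta(td):
    """Return the whole number of minutes in a timedelta, days included."""
    return int(td.total_seconds() // 60)

short = timedelta(hours=1, minutes=30)   # same length as PT1H30M
long_ = timedelta(days=1, hours=2)       # same length as P1DT2H

print(short.seconds // 60)               # 90
print(long_.seconds // 60)               # 120, the full day is dropped
print(minutes_from_timedelta(long_))     # 1560
```

For preparation times under 24 hours both expressions agree, so the difference only matters if longer durations can appear in the column.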
If you don't want to use isodate, you could apply a custom approach. Depending on your requirements you may need to generalize it, but if all of your strings are in the format "PT[<hours>H]<minutes>M", you could simply do something like:
import re
def get_minutes(iso_str):
    hours = re.search(r"(\d+)H", iso_str)
    hours = hours.group(1) if hours else 0
    minutes = re.search(r"(\d+)M", iso_str)
    minutes = minutes.group(1) if minutes else 0
    return int(hours) * 60 + int(minutes)
df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)
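To see what the regex approach returns, here is a small self-contained check on plain strings, without pandas. It repeats the same two searches in condensed form. The inputs `PT2H`, `P1DT2H` and `P2M` are assumed examples added to show behaviour outside the "PT[<hours>H]<minutes>M" format.

```python
import re

def get_minutes(iso_str):
    # Same logic as the regex approach: look for an hours and a minutes field.
    hours = re.search(r"(\d+)H", iso_str)
    minutes = re.search(r"(\d+)M", iso_str)
    hours = int(hours.group(1)) if hours else 0
    minutes = int(minutes.group(1)) if minutes else 0
    return hours * 60 + minutes

# Values in the assumed format
for value in ["PT15M", "PT1H30M", "PT5M", "PT2H"]:
    print(value, get_minutes(value))

# Values outside the assumed format are silently misread
print(get_minutes("P1DT2H"))  # 120, the day component is ignored
print(get_minutes("P2M"))     # 2, two months are read as two minutes
```

If such values can occur in the data, a stricter pattern or the isodate package is the safer choice.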
To generalize it, I'd still suggest taking a look at the isodate source.
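As a rough idea of what a more general version could look like, here is a minimal sketch using only the standard library. It is not how isodate is implemented. The name `duration_to_minutes` is hypothetical, and the pattern is an assumption that covers only integer day, hour, minute and second fields ("PnDTnHnMnS"). Weeks, months, years and fractional values are rejected instead of guessed at.

```python
import re

# Assumed subset of ISO 8601 durations: P[nD][T[nH][nM][nS]] with integer fields.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def duration_to_minutes(iso_str):
    """Convert a duration such as 'PT1H30M' to whole minutes (rounded down)."""
    match = _DURATION_RE.match(iso_str)
    # Reject strings that do not match or that carry no field at all ('P', 'PT').
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"Unsupported duration: {iso_str!r}")
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    total_seconds = (parts["days"] * 86400 + parts["hours"] * 3600
                     + parts["minutes"] * 60 + parts["seconds"])
    return total_seconds // 60

print(duration_to_minutes("PT15M"))    # 15
print(duration_to_minutes("PT1H30M"))  # 90
print(duration_to_minutes("P1DT2H"))   # 1560
```

Because the pattern is anchored at both ends, a string like "P2M" (two months) raises a `ValueError` instead of being read as two minutes.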
There are many other ways to do the same thing, I hope this gives you some hints on how to proceed :)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ALai |