How to convert an ISO duration to minutes in PySpark or Python

I have a column in a pandas DataFrame with values like the ones below:

[image: sample values of the duration column]

I want to convert these ISO durations to minutes. The output should be:

  • 15 minutes
  • 90 minutes
  • 5 minutes

import glob
from datetime import datetime

import pandas as pd

# Build today's input path: /project/sniper/<YYYY>/<MM>/<DD>/*.json
currentdate = datetime.today().strftime('%Y/%m/%d')
absolutepath = '/project/sniper/' + currentdate + '/*.json'

# pd.read_json does not expand wildcards, so load each file and concatenate
df = pd.concat(pd.read_json(path, lines=True) for path in glob.glob(absolutepath))
df_sugar = df.loc[df['ingredients'].str.contains("Sugar|sugar", case=True)].copy()

def convertToInteger(my_str):
    # "PT1H30M" -> "1H30M"; hours (if present) sit before "H", minutes before "M"
    body = my_str.replace('PT', '')
    hours, minutes = body.split('H') if 'H' in body else ('0', body)
    minutes = minutes.replace('M', '') or '0'
    return int(hours) * 60 + int(minutes)

df_sugar['new_col_2'] = df_sugar['prepTime'].apply(convertToInteger)


Solution 1:[1]

Assuming your data is something like this (you probably have more columns, but you get the point):

import pandas as pd

df = pd.DataFrame(['PT15M', 'PT1H30M', 'PT5M'], columns=['prepTime'])

I'd use the isodate package for a more robust approach to the problem:

import isodate

def get_minutes(iso_str):
    iso_timedelta = isodate.parse_duration(iso_str)
    # total_seconds() includes whole days; .seconds alone would drop them
    return int(iso_timedelta.total_seconds() // 60)
    
df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)

Or as a one-liner:

df['prepTimeMinutes'] = df['prepTime'].apply(lambda x: int(isodate.parse_duration(x).total_seconds() // 60))
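
One detail worth keeping in mind when turning a timedelta into minutes: the `.seconds` attribute holds only the part of the duration that is shorter than a day, while `total_seconds()` covers the whole thing. The sketch below uses only the standard library; `minutes_from_timedelta` is a hypothetical helper name, and the two sample values are made up to show the difference.

```python
from datetime import timedelta

def minutes_from_timedelta(delta: timedelta) -> int:
    """Whole minutes in a timedelta, counting the days component as well."""
    return int(delta.total_seconds() // 60)

short = timedelta(hours=1, minutes=30)  # what PT1H30M represents
long_ = timedelta(days=1, hours=2)      # what P1DT2H represents

# For durations under a day both approaches agree.
print(short.seconds // 60, minutes_from_timedelta(short))  # 90 90
# Past 24 hours, .seconds silently drops the full day.
print(long_.seconds // 60, minutes_from_timedelta(long_))  # 120 1560
```

For recipe prep times this rarely matters, but it costs nothing to handle.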

If you don't want to use isodate, you could write a custom parser. You may need to generalize it depending on your requirements, but if all of your strings are in the format "PT[<hours>H]<minutes>M", you could simply do something like:

import re

def get_minutes(iso_str):
    hours = re.search(r"(\d+)H", iso_str)
    hours = hours.group(1) if hours else 0
    minutes = re.search(r"(\d+)M", iso_str)
    minutes = minutes.group(1) if minutes else 0
    
    return int(hours) * 60 + int(minutes)

df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)

To generalize it, I'd still suggest taking a look at the isodate source.
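
If you want something in between, here is a rough sketch of a slightly more general regex-based parser. It is not how isodate does it; `duration_to_minutes` and `_DURATION` are hypothetical names, and it only assumes the `P[nD][T[nH][nM][nS]]` subset of ISO 8601 with integer fields (no weeks, months, years, or fractions). Anything else raises a ValueError instead of returning a wrong number.

```python
import re

# Hypothetical pattern: optional days, then an optional time part with
# optional hours, minutes, and seconds. Integer fields only.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

def duration_to_minutes(iso_str: str) -> int:
    """Return the whole minutes in a simple ISO 8601 duration string."""
    match = _DURATION.fullmatch(iso_str)
    # Reject non-matching input and bare "P" / "PT" with no fields at all.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ISO 8601 duration: {iso_str!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    total_seconds = (
        ((parts["days"] * 24 + parts["hours"]) * 60 + parts["minutes"]) * 60
        + parts["seconds"]
    )
    return total_seconds // 60  # leftover seconds are truncated

for value in ["PT15M", "PT1H30M", "PT5M", "P1DT2H"]:
    print(value, duration_to_minutes(value))
```

It plugs into the DataFrame the same way: df['prepTimeMinutes'] = df['prepTime'].apply(duration_to_minutes).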

There are many other ways to do the same thing, I hope this gives you some hints on how to proceed :)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    ALai