'Python - Hours between two dates, excluding weekends

I'm doing my first steps in python programing language. I want to create a script that aims to open an excel file and add an extra column that will be the hourly difference between the two dates and save it to another file. To count hours I'm using businessDuration from https://pypi.org/project/business-duration/

My DF looks like that:

Id | Date1               | Date2               | Date3
1    01/01/2022 08:10:51 | 02/01/2022 08:15:51 | 15/01/2022 23:15:51
2    01/01/2022 08:15:51 (and so on) for every column with dates
3
4

My code look's like that

    import pandas as pd
    import os
    import warnings
    from datetime import time, datetime
    from business_duration import businessDuration
    from itertools import repeat
    
    def WeekDays_hour(start, end, un):
        start_date = start
        end_date = end
        unit = un
        return list(map(businessDuration, start_date, end_date, repeat(unit)))

os.chdir("C:\\Users\\M\\PycharmProjects\\Python\\source")
car_path = os.getcwd()
for filename in os.listdir(car_path):
    if filename.endswith(".xlsx"):
        with warnings.catch_warnings(record=True):
            warnings.simplefilter("always")
            df = pd.read_excel(car_path+'\\'+filename, engine="openpyxl")
        if df['Date1'] is not None:
                df['Time difference'] = WeekDays_hour(df['Date1'], df['Date2'], 'hour')
df.to_excel("C:\\Users\\M\\PycharmProjects\\Python\\rep\\"+filename, index=False)

I want to be able to call function "WeekDays_hour" for every pair of dates I need and as of result of this function I want to have another column in df.

I'm not entirely sure what I'm doing wrong, program is finishing without errors, but added column is empty. No calculations done. If any of you have any ideas I will be grateful.

EDIT: I tried @BeRT2me code.

When i put it like that:

from business_duration import businessDuration
import pandas as pd

df = pd.read_excel(r'C:\Users\M\PycharmProjects\CAR\test.xlsx')

def weekdays_hour(x: pd.Series, start_date: str, end_date: str, un: str) -> int:
    if x[start_date] and x[end_date]:
        return businessDuration(x[start_date], x[end_date], un)

df['difference'] = df.apply(weekdays_hour, args=('Date1', 'Date2', 'hour'))

df.to_excel(r"C:\Users\M\PycharmProjects\redy\Closed.xlsx", index=False)

When I run it I received error "KeyError: 'Date1'"

when I changed

df['difference'] = df.apply(weekdays_hour, axis=1, args=('Date1', 'Date2', 'hour'))

code run without errors, column was added but it was empty.

ANSWER by BeRT2me:

df['new'] = df.apply(lambda x: func(x['input'], x['input2']), axis=1)

and it works like a charm.

EDIT some time later:

df.applay was a little slow for me and I managed to find a quicker method:

def workdays(a, b, unit):
    return pd.Series([businessDuration(Date1, Date2, unit=unit) for (Date1, Date2) in zip(df[a], df[b])])

a, b = column names (like 'Date1') unit = 'hour', 'day'



Solution 1:[1]

Dug deeper into the code of businessDuration and found that the following is already explicitly defined, so neither if df['Date1'] is not None: nor if x[start_date] and x[end_date]: are needed.

if pd.isnull(startdate) or pd.isnull(enddate) or startdate>enddate:
        return np.nan

It also appears that unit must be a named input, or it returns np.nan.

Also, keep in mind that in your test data, Data1 > Date2, so it'll return np.nan. If those are given as startdate, enddate respectively.

Given this, the following should work just fine:

df['difference'] = df.apply(lambda x: businessDuration(x['Date2'], x['Date1'], unit='hour'), axis=1)

My Output:

                Date1               Date2               Date3   difference
0 2022-04-13 08:10:51 2022-02-01 08:15:51 2022-01-15 23:15:51  1223.916667

No need for any of my custom code, unless you want to rewrite how buisinessDuration takes inputs~

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1