'Python - Hours between two dates, excluding weekends
I'm doing my first steps in python programing language. I want to create a script that aims to open an excel file and add an extra column that will be the hourly difference between the two dates and save it to another file. To count hours I'm using businessDuration from https://pypi.org/project/business-duration/
My DF looks like that:
Id | Date1 | Date2 | Date3
1 01/01/2022 08:10:51 | 02/01/2022 08:15:51 | 15/01/2022 23:15:51
2 01/01/2022 08:15:51 (and so on) for every column with dates
3
4
My code look's like that
import pandas as pd
import os
import warnings
from datetime import time, datetime
from business_duration import businessDuration
from itertools import repeat
def WeekDays_hour(start, end, un):
start_date = start
end_date = end
unit = un
return list(map(businessDuration, start_date, end_date, repeat(unit)))
os.chdir("C:\\Users\\M\\PycharmProjects\\Python\\source")
car_path = os.getcwd()
for filename in os.listdir(car_path):
if filename.endswith(".xlsx"):
with warnings.catch_warnings(record=True):
warnings.simplefilter("always")
df = pd.read_excel(car_path+'\\'+filename, engine="openpyxl")
if df['Date1'] is not None:
df['Time difference'] = WeekDays_hour(df['Date1'], df['Date2'], 'hour')
df.to_excel("C:\\Users\\M\\PycharmProjects\\Python\\rep\\"+filename, index=False)
I want to be able to call function "WeekDays_hour" for every pair of dates I need and as of result of this function I want to have another column in df.
I'm not entirely sure what I'm doing wrong, program is finishing without errors, but added column is empty. No calculations done. If any of you have any ideas I will be grateful.
EDIT: I tried @BeRT2me code.
When i put it like that:
from business_duration import businessDuration
import pandas as pd
df = pd.read_excel(r'C:\Users\M\PycharmProjects\CAR\test.xlsx')
def weekdays_hour(x: pd.Series, start_date: str, end_date: str, un: str) -> int:
if x[start_date] and x[end_date]:
return businessDuration(x[start_date], x[end_date], un)
df['difference'] = df.apply(weekdays_hour, args=('Date1', 'Date2', 'hour'))
df.to_excel(r"C:\Users\M\PycharmProjects\redy\Closed.xlsx", index=False)
When I run it I received error "KeyError: 'Date1'"
when I changed
df['difference'] = df.apply(weekdays_hour, axis=1, args=('Date1', 'Date2', 'hour'))
code run without errors, column was added but it was empty.
ANSWER by BeRT2me:
df['new'] = df.apply(lambda x: func(x['input'], x['input2']), axis=1)
and it works like a charm.
EDIT some time later:
df.applay was a little slow for me and I managed to find a quicker method:
def workdays(a, b, unit):
return pd.Series([businessDuration(Date1, Date2, unit=unit) for (Date1, Date2) in zip(df[a], df[b])])
a, b = column names (like 'Date1') unit = 'hour', 'day'
Solution 1:[1]
Dug deeper into the code of businessDuration and found that the following is already explicitly defined, so neither if df['Date1'] is not None:
nor if x[start_date] and x[end_date]:
are needed.
if pd.isnull(startdate) or pd.isnull(enddate) or startdate>enddate:
return np.nan
It also appears that unit
must be a named input, or it returns np.nan
.
Also, keep in mind that in your test data, Data1 > Date2, so it'll return np.nan
. If those are given as startdate, enddate respectively.
Given this, the following should work just fine:
df['difference'] = df.apply(lambda x: businessDuration(x['Date2'], x['Date1'], unit='hour'), axis=1)
My Output:
Date1 Date2 Date3 difference
0 2022-04-13 08:10:51 2022-02-01 08:15:51 2022-01-15 23:15:51 1223.916667
No need for any of my custom code, unless you want to rewrite how buisinessDuration
takes inputs~
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |