'Add months to a date in Pandas

I'm trying to figure out how to add 3 months to a date in a Pandas dataframe, while keeping it in the date format, so I can use it to lookup a range.

This is what I've tried:

#create dataframe
df = pd.DataFrame([pd.Timestamp('20161011'),
                   pd.Timestamp('20161101') ], columns=['date'])

#create a future month period
plus_month_period = 3

#calculate date + future period
df['future_date'] = plus_month_period.astype("timedelta64[M]")

However, I get the following error:

AttributeError: 'int' object has no attribute 'astype'


Solution 1:[1]

You could use pd.DateOffset

In [1756]: df.date + pd.DateOffset(months=plus_month_period)
Out[1756]:
0   2017-01-11
1   2017-02-01
Name: date, dtype: datetime64[ns]

Another way using pd.offsets.MonthOffset

In [1785]: df.date + pd.offsets.MonthOffset(plus_month_period)
Out[1785]:
0   2016-10-14
1   2016-11-04
Name: date, dtype: datetime64[ns]

Details

In [1757]: df
Out[1757]:
        date
0 2016-10-11
1 2016-11-01

In [1758]: plus_month_period
Out[1758]: 3

Solution 2:[2]

Suppose you have a dataframe of the following format, where you have to add integer months to a date column.

Start_Date Months_to_add
2014-06-01 23
2014-06-01 4
2000-10-01 10
2016-07-01 3
2017-12-01 90
2019-01-01 2

In such a scenario, using Zero's code or mattblack's code won't be useful. You have to use lambda function over the rows where the function takes 2 arguments -

  1. A date to which months need to be added to
  2. A month value in integer format

You can use the following function:

# Importing required modules
from dateutil.relativedelta import relativedelta

# Defining the function
def add_months(start_date, delta_period):
  end_date = start_date + relativedelta(months=delta_period)
  return end_date

After this you can use the following code snippet to add months to the Start_Date column. Use progress_apply functionality of Pandas. Refer to this Stackoverflow answer on progress_apply : Progress indicator during pandas operations.

from tqdm import tqdm
tqdm.pandas()

df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)

Here's the full code form dataset creation, for your reference:

import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
tqdm.pandas()

# Initilize a new dataframe
df = pd.DataFrame()

# Add Start Date column
df["Start_Date"] = ['2014-06-01T00:00:00.000000000',
                    '2014-06-01T00:00:00.000000000',
                    '2000-10-01T00:00:00.000000000',
                    '2016-07-01T00:00:00.000000000',
                    '2017-12-01T00:00:00.000000000',
                    '2019-01-01T00:00:00.000000000']

# To convert the date column to a datetime format
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
# Add months column
df["Months_to_add"] = [23, 4, 10, 3, 90, 2]

# Defining the Add Months function
def add_months(start_date, delta_period):
  end_date = start_date + relativedelta(months=delta_period)
  return end_date

# Apply function on the dataframe using lambda operation.
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)

You will have the final output dataframe as follows.

Start_Date Months_to_add End_Date
2014-06-01 23 2016-05-01
2014-06-01 4 2014-10-01
2000-10-01 10 2001-08-01
2016-07-01 3 2016-10-01
2017-12-01 90 2025-06-01
2019-01-01 2 2019-03-01

Please add to comments if there are any issues with the above code.
All the best!

Solution 3:[3]

I believe that the simplest and most efficient (faster) way to solve this is to transform the date to monthly periods with to_period(M), add the result with the values of the Months_to_add column and then retrieve the data as datetime with the .dt.to_timestamp() command.

Using the sample data created by @Aruparna Maity

Start_Date Months_to_add
2014-06-01 23
2014-06-20 4
2000-10-01 10
2016-07-05 3
2017-12-15 90
2019-01-01 2
df['End_Date'] = ((df['Start_Date'].dt.to_period('M')) + df['Months_to_add']).dt.to_timestamp()
df.head(6)
#output
    Start_Date  Months_to_add   End_Date
0   2014-06-01  23              2016-05-01
1   2014-06-20  4               2014-10-01
2   2000-10-01  10              2001-08-01
3   2016-07-05  3               2016-10-01
4   2017-12-15  90              2025-06-01
5   2019-01-01  2               2019-03-01

If the exact day is needed, just repeat the process, but changing the periods to days

df['End_Date'] = ((df['End_Date'].dt.to_period('D')) + df['Start_Date'].dt.day -1).dt.to_timestamp()
#output:
    Start_Date  Months_to_add   End_Date
0   2014-06-01  23              2016-05-01
1   2014-06-20  4               2014-10-20
2   2000-10-01  10              2001-08-01
3   2016-07-05  3               2016-10-05
4   2017-12-15  90              2025-06-15
5   2019-01-01  2               2019-03-01

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Aruparna Maity
Solution 3