'Pandas: how can I generate "year-month" format column (period)?

In [20]: df.head()
Out[20]:
   year  month     capital       sales      income      profit         debt 
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0

What I tried was:

df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")

but it occurred an error:

In [21]: df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m"
    ...: )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-31bfca8c5941> in <module>()
----> 1 df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")

~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
    507     elif isinstance(arg, ABCSeries):
    508         from pandas import Series
--> 509         values = _convert_listlike(arg._values, False, format)
    510         result = Series(values, index=arg.index, name=arg.name)
    511     elif isinstance(arg, (ABCDataFrame, MutableMapping)):

~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike(arg, box, format, name, tz)
    412                     try:
    413                         result = tslib.array_strptime(arg, format, exact=exact,
--> 414                                                       errors=errors)
    415                     except tslib.OutOfBoundsDatetime:
    416                         if errors == 'raise':

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_strptime (pandas/_libs/tslib.c:63753)()

TypeError: 'int' object is unsliceable

I think this is because 'day' is missed.

How can I do this?



Solution 1:[1]

In [223]: df['date'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.to_period('M')

In [224]: df
Out[224]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06

or

In [208]: df['date'] = pd.PeriodIndex(pd.to_datetime(df[['year','month']].assign(day=1)),
                                      freq='M')

In [209]: df
Out[209]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06

Solution 2:[2]

You can also pass series to PeriodIndex, so this works:

df['date'] = pd.PeriodIndex(
    year=df['year'],
    month=df['month'],
    freq='M',
)

If you specifically want a Timestamp instead of a Period, you can pass a dict to to_datetime() and set the day to 1:

df['date'] = pd.to_datetime({
    'year': df['year'],
    'month': df['month'],
    'day': 1,
})

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2