'Pandas: how can I generate "year-month" format column (period)?
In [20]: df.head()
Out[20]:
year month capital sales income profit debt
0 2000 6 -19250379.0 37924704.0 -4348337.0 2571738.0 192842551.0
1 2000 12 -68357153.0 27870187.0 -49074146.0 -20764204.0 190848380.0
2 2001 6 -65048960.0 30529435.0 -1172803.0 2000427.0 197383572.0
3 2001 12 -90129943.0 17135480.0 -24208501.0 -1012230.0 191464941.0
4 2002 6 14671980.0 31377347.0 2188125.0 3660938.0 101355088.0
What I tried was:
df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")
but it occurred an error:
In [21]: df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m"
...: )
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-31bfca8c5941> in <module>()
----> 1 df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")
~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
507 elif isinstance(arg, ABCSeries):
508 from pandas import Series
--> 509 values = _convert_listlike(arg._values, False, format)
510 result = Series(values, index=arg.index, name=arg.name)
511 elif isinstance(arg, (ABCDataFrame, MutableMapping)):
~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike(arg, box, format, name, tz)
412 try:
413 result = tslib.array_strptime(arg, format, exact=exact,
--> 414 errors=errors)
415 except tslib.OutOfBoundsDatetime:
416 if errors == 'raise':
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_strptime (pandas/_libs/tslib.c:63753)()
TypeError: 'int' object is unsliceable
I think this is because 'day' is missed.
How can I do this?
Solution 1:[1]
In [223]: df['date'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.to_period('M')
In [224]: df
Out[224]:
year month capital sales income profit debt date
0 2000 6 -19250379.0 37924704.0 -4348337.0 2571738.0 192842551.0 2000-06
1 2000 12 -68357153.0 27870187.0 -49074146.0 -20764204.0 190848380.0 2000-12
2 2001 6 -65048960.0 30529435.0 -1172803.0 2000427.0 197383572.0 2001-06
3 2001 12 -90129943.0 17135480.0 -24208501.0 -1012230.0 191464941.0 2001-12
4 2002 6 14671980.0 31377347.0 2188125.0 3660938.0 101355088.0 2002-06
or
In [208]: df['date'] = pd.PeriodIndex(pd.to_datetime(df[['year','month']].assign(day=1)),
freq='M')
In [209]: df
Out[209]:
year month capital sales income profit debt date
0 2000 6 -19250379.0 37924704.0 -4348337.0 2571738.0 192842551.0 2000-06
1 2000 12 -68357153.0 27870187.0 -49074146.0 -20764204.0 190848380.0 2000-12
2 2001 6 -65048960.0 30529435.0 -1172803.0 2000427.0 197383572.0 2001-06
3 2001 12 -90129943.0 17135480.0 -24208501.0 -1012230.0 191464941.0 2001-12
4 2002 6 14671980.0 31377347.0 2188125.0 3660938.0 101355088.0 2002-06
Solution 2:[2]
You can also pass series to PeriodIndex
, so this works:
df['date'] = pd.PeriodIndex(
year=df['year'],
month=df['month'],
freq='M',
)
If you specifically want a Timestamp
instead of a Period
, you can pass a dict
to to_datetime()
and set the day to 1
:
df['date'] = pd.to_datetime({
'year': df['year'],
'month': df['month'],
'day': 1,
})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |