'How to preserve timezone when parsing date/time strings with strptime()?
I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where EST
is an Australian time-zone):
Tue Jun 22 07:46:22 EST 2010
I need to be able to parse this date in Python. At first, I tried to use the strptime()
function from datettime.
>>> datetime.datetime.strptime('Tue Jun 22 12:10:20 2010 EST', '%a %b %d %H:%M:%S %Y %Z')
However, for some reason, the datetime
object that comes back doesn't seem to have any tzinfo
associated with it.
I did read on this page that apparently datetime.strptime
silently discards tzinfo
, however, I checked the documentation, and I can't find anything to that effect documented here.
Is there any way to get strptime()
to play nicely with timezones?
Solution 1:[1]
The datetime
module documentation says:
Return a datetime corresponding to date_string, parsed according to format. This is equivalent to
datetime(*(time.strptime(date_string, format)[0:6]))
.
See that [0:6]
? That gets you (year, month, day, hour, minute, second)
. Nothing else. No mention of timezones.
Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime
doesn't work but if you strip off the " %Z" and the " EST" it does work. Also using "UTC" or "GMT" instead of "EST" works. "PST" and "MEZ" don't work. Puzzling.
It's worth noting this has been updated as of version 3.2 and the same documentation now also states the following:
When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.
Note that this doesn't work with %Z, so the case is important. See the following example:
In [1]: from datetime import datetime
In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')
In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None
In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')
In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
Solution 2:[2]
I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
and so on. No dealing with strptime()
format nonsense... just throw a date at it and it Does The Right Thing.
Solution 3:[3]
Since strptime
returns a datetime object which has tzinfo
attribute, We can simply replace it with desired timezone.
>>> import datetime
>>> date_time_str = '2018-06-29 08:15:27.243860'
>>> date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S.%f').replace(tzinfo=datetime.timezone.utc)
>>> date_time_obj.tzname()
'UTC'
Solution 4:[4]
Your time string is similar to the time format in rfc 2822 (date format in email, http headers). You could parse it using only stdlib:
>>> from email.utils import parsedate_tz
>>> parsedate_tz('Tue Jun 22 07:46:22 EST 2010')
(2010, 6, 22, 7, 46, 22, 0, 1, -1, -18000)
See solutions that yield timezone-aware datetime objects for various Python versions: parsing date with timezone from an email.
In this format, EST
is semantically equivalent to -0500
. Though, in general, a timezone abbreviation is not enough, to identify a timezone uniquely.
Solution 5:[5]
Ran into this exact problem.
What I ended up doing:
# starting with date string
sdt = "20190901"
std_format = '%Y%m%d'
# create naive datetime object
from datetime import datetime
dt = datetime.strptime(sdt, sdt_format)
# extract the relevant date time items
dt_formatters = ['%Y','%m','%d']
dt_vals = tuple(map(lambda formatter: int(datetime.strftime(dt,formatter)), dt_formatters))
# set timezone
import pendulum
tz = pendulum.timezone('utc')
dt_tz = datetime(*dt_vals,tzinfo=tz)
Solution 6:[6]
As an extension to Joe Shaw's answer, dateutil's parser offers the possibility to supply a mapping of time zone name abbreviations to timezone objects derived from IANA time zone names.
import dateutil
tzdict = {'EST': dateutil.tz.gettz('America/New_York'),
'EDT': dateutil.tz.gettz('America/New_York')}
dt = dateutil.parser.parse("Tue Jun 22 07:46:22 EST 2010", tzinfos=tzdict)
print(dt)
# 2010-06-22 07:46:22-04:00
print(repr(dt))
# datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzfile('US/Eastern'))
The advantage over a fixed UTC offset is that time zone rules (e.g. DST transitions) will be taken into account if you perform any timedelta arithmetic with the obtained datetime object.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mikey |
Solution 2 | mkrieger1 |
Solution 3 | Surya Teja |
Solution 4 | Community |
Solution 5 | |
Solution 6 | FObersteiner |