How to fix ParserError: year 0 is out of range: 0000-00-00 with Python Pandas to_datetime method
I am trying to convert a column "travel_start" to a datetime object.
Dashboard["travel_start"] = pd.to_datetime(Dashboard["travel_start"])
But I get the following error:
ParserError: year 0 is out of range: 0000-00-00
When I filter the "travel_start" column of the dataframe for these values, I see the dates below:
4922 0000-00-00
5592 0000-00-00
6647 0000-00-00
6796 0000-00-00
6941 0000-00-00
8223 0000-00-00
8391 0000-00-00
10137 0000-00-00
10197 0000-00-00
10744 0000-00-00
11128 0000-00-00
12304 0000-00-00
12511 0000-00-00
13307 0000-00-00
13681 0000-00-00
14381 0000-00-00
15160 0000-00-00
16330 0000-00-00
17734 0000-00-00
18148 0000-00-00
19389 0000-00-00
19643 0000-00-00
20372 0000-00-00
21412 0000-00-00
21757 0000-00-00
21879 0000-00-00
21978 0000-00-00
23216 0000-00-00
24375 0000-00-00
25660 0000-00-00
A count shows that there are 56 occurrences of this value, and I don't think it is smart to use the errors argument to cast them to NaT. What do you think I could change them to, or what else could I do?
Your input is highly appreciated. Thanks.
Solution 1:[1]
Pandas uses the pandas.Timestamp type to store dates with times, instead of Python's datetime.datetime.
The min/max values for Timestamp are:
pd.Timestamp.min # returns Timestamp('1677-09-21 00:12:43.145224193')
pd.Timestamp.max # returns Timestamp('2262-04-11 23:47:16.854775807')
In your case, the dates for those rows are clearly just missing or unknown.
As @jezrael suggested, use pd.to_datetime(Dashboard["travel_start"], errors='coerce')
and treat all NaT values as unknown.
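A minimal sketch of that suggestion, assuming the column holds strings and that "0000-00-00" is the only placeholder for a missing date. The sample data and the travel_start_missing flag column are made up for illustration and are not from the question. The flag records which rows were placeholders before the conversion, so that information is not lost when they become NaT.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
Dashboard = pd.DataFrame(
    {"travel_start": ["2019-03-01", "0000-00-00", "2020-11-15", "0000-00-00"]}
)

# Flag the placeholder rows first (illustrative column name, an assumption).
Dashboard["travel_start_missing"] = Dashboard["travel_start"] == "0000-00-00"
n_placeholders = int(Dashboard["travel_start_missing"].sum())

# errors="coerce" turns every value that cannot be parsed into NaT
# instead of raising ParserError.
Dashboard["travel_start"] = pd.to_datetime(Dashboard["travel_start"], errors="coerce")

# Sanity check: the NaT count should equal the placeholder count,
# otherwise some other malformed value was coerced as well.
n_missing = int(Dashboard["travel_start"].isna().sum())
print(n_placeholders, n_missing)
```

If the two counts differ on the real data, the column likely contains other unparseable values that are worth inspecting before they are treated as unknown.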
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Waldemar Walo |