How to fix "ParserError: year 0 is out of range: 0000-00-00" with the pandas to_datetime method

I am trying to convert a column "travel_start" to a datetime object.

Dashboard["travel_start"] = pd.to_datetime(Dashboard["travel_start"])

But I get the following error:

ParserError: year 0 is out of range: 0000-00-00

When I filter the "travel_start" column of the dataframe for the offending values, I see the dates below:

4922     0000-00-00
5592     0000-00-00
6647     0000-00-00
6796     0000-00-00
6941     0000-00-00
8223     0000-00-00
8391     0000-00-00
10137    0000-00-00
10197    0000-00-00
10744    0000-00-00
11128    0000-00-00
12304    0000-00-00
12511    0000-00-00
13307    0000-00-00
13681    0000-00-00
14381    0000-00-00
15160    0000-00-00
16330    0000-00-00
17734    0000-00-00
18148    0000-00-00
19389    0000-00-00
19643    0000-00-00
20372    0000-00-00
21412    0000-00-00
21757    0000-00-00
21879    0000-00-00
21978    0000-00-00
23216    0000-00-00
24375    0000-00-00
25660    0000-00-00

A count shows that there are 56 occurrences of this value, and I don't think it is smart to use the errors argument to cast them to NaT. What do you think I could change them to, or what else could I do?

Your input is highly appreciated. Thanks.



Solution 1:[1]

Pandas uses the pandas.Timestamp type to store dates with times, instead of Python's datetime.datetime.

The min/max values for Timestamp are:

  • pd.Timestamp.min # returns Timestamp('1677-09-21 00:12:43.145224193')
  • pd.Timestamp.max # returns Timestamp('2262-04-11 23:47:16.854775807')
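As a quick check of those bounds, the sketch below prints them and compares year 0 against them. It also prints datetime.MINYEAR from the standard library, on the assumption (not stated in the question) that the "year 0 is out of range" message comes from Python's own datetime limits during string parsing.

```python
import datetime

import pandas as pd

# Bounds of the nanosecond-resolution Timestamp type
lower = pd.Timestamp.min
upper = pd.Timestamp.max
print(lower, upper)

# Year 0 lies far below the lower bound, so "0000-00-00" can never be stored
year_zero_in_range = lower.year <= 0 <= upper.year
print(year_zero_in_range)

# Python's datetime does not allow year 0 either: the smallest valid year is 1
print(datetime.MINYEAR)
```

Month 0 and day 0 are invalid as well, so "0000-00-00" is not a real calendar date under any bounds.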

In your case, "0000-00-00" is not a real date, so the dates for those rows are simply missing or unknown.

As @jezrael suggested, use pd.to_datetime(Dashboard["travel_start"], errors='coerce') and treat all NaT values as unknown.
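A minimal sketch of that approach, using a small made-up frame in place of the real Dashboard data. The travel_start_missing column is a hypothetical addition for illustration, meant to record which rows had no usable date before the conversion replaced them with NaT.

```python
import pandas as pd

# Made-up sample standing in for the real Dashboard dataframe
Dashboard = pd.DataFrame(
    {"travel_start": ["2021-03-01", "0000-00-00", "2021-04-15", "0000-00-00"]}
)

# Hypothetical flag column: remember which rows held the placeholder value
Dashboard["travel_start_missing"] = Dashboard["travel_start"] == "0000-00-00"

# Unparseable values such as "0000-00-00" become NaT instead of raising
Dashboard["travel_start"] = pd.to_datetime(Dashboard["travel_start"], errors="coerce")

# Count the rows that ended up as NaT
n_missing = Dashboard["travel_start"].isna().sum()
print(n_missing)
print(Dashboard)
```

NaT behaves like NaN for dates: isna(), dropna() and fillna() all recognise it, so the unknown rows can be excluded or handled separately later without inventing a fake date for them.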

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Waldemar Walo