'Parsing one column with 5 didgits integers as date format while reading excel file

For an excel file (download from here):

enter image description here

df <- openxlsx::read.xlsx('sample_data.xlsx', sheet='Sheet1', colNames=TRUE)
df

Output:

   date  value
1 43861   5.70
2 43890 -13.89
3 43921 -49.68
4 43951 -62.81

I try to convert date column to a normal date format:

> df %>% 
+   mutate(date=as.Date(date, origin = "1970-01-01"))
        date  value
1 2090-02-01   5.70
2 2090-03-02 -13.89
3 2090-04-02 -49.68
4 2090-05-02 -62.81

> df %>% 
+   mutate(date=as.Date(date, origin = "1910-01-01"))
        date  value
1 2030-02-01   5.70
2 2030-03-02 -13.89
3 2030-04-02 -49.68
4 2030-05-02 -62.81

I tested with 1970-01-01 and 1910-01-01 as value for origin parameter, the dates in the output seems incorrect (43861 has been convert to 2090-02-01 and 2030-02-01, which should be 2020-01-31).



Solution 1:[1]

origin has to be inside the as.Date call.

df %>%
  mutate(date = as.Date(date, origin = "1899-12-30"))
#>         date  value
#> 1 2020-01-31   5.70
#> 2 2020-02-29 -13.89
#> 3 2020-03-31 -49.68
#> 4 2020-04-30 -62.81

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Allan Cameron