'How to detect TIME when reading from an excel sheet using R
The issue is that when I read from an excel sheet into R using read.xlsx
from openxlsx
package, the TIME
column is converted into a fraction.
Here is an example,
dfin <-
DATE TIME
15/02/2015 8:00 AM
22/01/2014 10:00 PM
library(openxlsx)
test <- read.xlsx("dfin.xlsx", sheet = 1,
detectDates=TRUE, skipEmptyRows = TRUE,
skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE,
namedRegion = NULL, na.strings = "NA", fillMergedCells = FALSE)
Output:
DATE TIME
2015-02-15 0.3333333
2014-01-22 0.9166667
I am not sure why it does that and whether there is a way to fix that as I need to use both DATE and TIME to do some calculations.
Solution 1:[1]
R doesn't have a time format really, so I suggest reading it in using read_excel
, which automatically detects the column type. This will turn it into a date-time format with a random date, which you can then remove, before converting it into a proper timestamp.
library(readxl)
library(lubridate)
test <- read_excel('dfin.xlsx',trim_ws = TRUE) %>%
#return the TIME column to the way it is written in Excel
mutate(TIME = as.character(gsub(".* ","",TIME)),
#format the date column
DATE = dmy(DATE),
#turn it into a timestamp
TIMESTAMP = as.POSIXct(paste(DATE,TIME)))
Solution 2:[2]
My first guess was that read.xlsx()
is trying to guess the date-looking columns in .xlsx while it reads the file, and weirdly converts the time from %I:%M %p
format into fractions of 24 hours (because e.g. 0.3333333 * 24 = 7.999999
which is exactly 8.0). But latter I noticed that if I change the parameter detectDates
into FALSE
nothing really changes - it outputs the same data frame. So it guess nothing, it just reads the TIME
as it is.
If you try to edit 10:00 PM
within Excel workbook, you'll see that it is really stored as 22:00:00
. So why at the end it is represented as a fraction of 24
?! I don't know, and I hope someone can explain that.
@Randall approach is really good alternative comparing to openxlsx::read.xlsx()
. Note that read_xlsx()
recognizes TIME
as %H:%M:%S
, and converts it into the dummy POSIXct/POSIXt
object, i.e. 1899-12-31 08:00:00
and 1899-12-31 22:00:00
.
Surprisingly, read_xlsx()
doesn't recognize that DATE
has %d-%m-%Y
format, and interpret it as a character
. Meaning that we need to convert both variables into appropriate format in order to obtain desired output.
I don't think we need to use gsub
to get the 12-hour clock time from POSIXct
object, it is much easire to use format
for this purpose. And conversion of DATE
from %d-%m-%Y
into %Y-%m-%d
format is even an easier task:
library(dplyr)
library(readxl)
read_xlsx("myfile.xlsx") |>
mutate(
DATE = as.Date(DATE, "%d/%m/%Y"),
TIME = format(TIME, "%I:%M %p") # “That’s what I do: I drink and I know things.”
)
Which produces:
# A tibble: 2 x 2
DATE TIME
<date> <chr>
1 2015-02-15 08:00 AM
2 2014-01-22 10:00 PM
Solution 3:[3]
I faced the same problem and solved it as follows - quick and dirty:
Read the data using
readxl:read_excel()
.Without loss of generality, we just look at obtaining
17:20
from"0.72222222222222"
instead of a column containing the time data. Be aware of reading values out of an excel file could have unwanted types, but for calculations we need numerics.
x <- as.numeric("0.72222222222222")*24
minutes <- round((x %% 1)*60, digits = 0)
hours <- round(x - minutes/60, digits = 0)
if (minutes < 10){ #if minutes is a single digit need to insert a preceding 0
minutes= paste0("0",minutes)
}
paste0(hours, ":", minutes)
#17:20
Solution 4:[4]
Read the time values as "date":
test <- read_excel('dfin.xlsx', col_types = c("date"), ...)
Then trim it with
substr()
to get the time:test <- substr(test,12,16)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Randall Helms |
Solution 2 | |
Solution 3 | Community |
Solution 4 | Suraj Rao |