Replace NAs only within a specific range of a time series object
I have the following data stored as a zoo object:
A B C
2017-05-31 NA NA 3.1
2017-06-30 2.5 2.4 3.2
2017-07-31 2.5 NA 3.3
2017-08-31 2.6 NA 3.4
2017-09-30 2.8 2.5 3.6
2017-10-31 2.3 NA 3.4
2017-11-30 2.5 NA 3.2
2017-12-31 2.7 2.6 2.9
2018-01-31 2.8 NA 3.0
2018-02-28 2.6 NA NA
2018-03-31 NA NA NA
You can reproduce this zoo object as follows:
library(zoo)
df <- data.frame(A=c(NA, 2.5, 2.5, 2.6, 2.8, 2.3, 2.5, 2.7, 2.8, 2.6, NA), B=c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA), C=c(3.1, 3.2, 3.3, 3.4, 3.6, 3.4, 3.2, 2.9, 3.0, NA, NA), dates=c('2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31', '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31'))
# Numeric columns A:C as the data, the dates column as the index
zoo <- zoo(as.matrix(df[, c("A", "B", "C")]), order.by=as.Date(df$dates, format="%Y-%m-%d"))
Columns A & C have monthly observations, column B has quarterly observations. Columns A & B have some NAs in the beginning, all have some NAs in the end. I would like to only fill intermittent NAs (using the previous value), but not those in the beginning or in the end. So, in my example, only the NAs in column B between 2017-06-30 and 2017-12-31 should be filled using the previous value. The result should look like this:
A B C
2017-05-31 NA NA 3.1
2017-06-30 2.5 2.4 3.2
2017-07-31 2.5 2.4 3.3
2017-08-31 2.6 2.4 3.4
2017-09-30 2.8 2.5 3.6
2017-10-31 2.3 2.5 3.4
2017-11-30 2.5 2.5 3.2
2017-12-31 2.7 2.6 2.9
2018-01-31 2.8 NA 3.0
2018-02-28 2.6 NA NA
2018-03-31 NA NA NA
Please note that my data may look different each time: besides intermittent NAs, there may be runs of NAs of varying length at the beginning and at the end. I therefore need a generic solution.
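The rule can be stated per column: only positions between the first and the last non-NA value are filled, each with the most recent observed value. A minimal base-R sketch of that rule for a single vector, with no packages needed; the helper name fill_interior() is hypothetical and not part of zoo or any other package:

```r
# Hypothetical helper: forward-fill only the interior NAs of a vector,
# i.e. positions between the first and the last non-NA observation.
# Leading and trailing NAs are left untouched.
fill_interior <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)        # all-NA vector: nothing to fill
  inner <- obs[1]:obs[length(obs)]       # first to last observation
  # For each inner position, the count of observations seen so far,
  # which is the index of the most recent observed value
  last_seen <- cumsum(!is.na(x[inner]))
  x[inner] <- x[inner][!is.na(x[inner])][last_seen]
  x
}

B <- c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA)
fill_interior(B)
# returns NA 2.4 2.4 2.4 2.5 2.5 2.5 2.6 NA NA NA
```

Because the range starts at the first observation, last_seen is never 0, so every interior NA has a value to copy.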
I have achieved the desired result with the following code, but it is very cumbersome and I'm sure there is a more elegant solution.
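# Row position of the first and last non-NA value in each column
first_obs <- sapply(zoo, function(col) min(which(!is.na(col))))
last_obs <- sapply(zoo, function(col) max(which(!is.na(col))))
k <- ncol(zoo)
n <- nrow(zoo)
for (i in 1:k){
  orig <- colnames(zoo)[i]
  # Fill NAs with the previous value, but only from the first to the last observation
  temp_repl <- na.locf(zoo[first_obs[i]:last_obs[i], i])
  # Re-attach the leading and trailing NAs unchanged
  temp_zoo <- rbind(zoo[seq_len(first_obs[i] - 1), i],
                    temp_repl,
                    zoo[last_obs[i] + seq_len(n - last_obs[i]), i])
  zoo <- cbind(zoo, temp_zoo)
  colnames(zoo)[i] <- paste(orig, ", orig", sep="")
  colnames(zoo)[k+i] <- orig
}
# Keep only the newly built (filled) columns
zoo <- zoo[,(k+1):ncol(zoo)]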
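A shorter alternative, sketched under the assumption that zoo's na.locf() works column by column on a multi-column zoo object: fill forward and backward once each, then keep a forward-filled value only where the backward fill is also non-NA, i.e. where a later observation exists. Leading NAs stay NA in the forward fill, and trailing NAs are restored by the mask. The helper name interior_locf() is illustrative only:

```r
library(zoo)

# Example series from the question: numeric columns, month-end dates as index
dates <- seq(as.Date("2017-06-01"), by = "month", length.out = 11) - 1
z <- zoo(cbind(A = c(NA, 2.5, 2.5, 2.6, 2.8, 2.3, 2.5, 2.7, 2.8, 2.6, NA),
               B = c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA),
               C = c(3.1, 3.2, 3.3, 3.4, 3.6, 3.4, 3.2, 2.9, 3.0, NA, NA)),
         order.by = dates)

# Hypothetical helper: last observation carried forward, interior NAs only
interior_locf <- function(z) {
  fwd  <- na.locf(z, na.rm = FALSE)                   # leading NAs stay NA
  back <- na.locf(z, fromLast = TRUE, na.rm = FALSE)  # NA only after the last observation
  # Keep the forward-filled value only where a later observation exists;
  # ifelse() keeps the matrix dimensions and column names of the test
  zoo(ifelse(is.na(coredata(back)), NA, coredata(fwd)), index(z))
}

interior_locf(z)
```

No loop or column bookkeeping is needed, and the index of the result is the same as that of the input.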
Solution 1
This isn't that elegant either, but using dplyr::bind_rows() and tidyr::fill() might make it a little easier to follow:
df <- data.frame(A=c(NA, 2.5, 2.5, 2.6, 2.8, 2.3, 2.5, 2.7, 2.8, 2.6, NA), B=c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA), C=c(3.1, 3.2, 3.3, 3.4, 3.6, 3.4, 3.2, 2.9, 3.0, NA, NA), dates=c('2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31', '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31'))
df$dates <- as.Date(df$dates)
# Range in which NAs should be filled (hard-coded for this example)
min_date <- as.Date('2017-06-30')
max_date <- as.Date('2017-12-31')
# Split into before / inside / after the range, fill downwards only inside, then re-stack
df_new <- dplyr::bind_rows(
  df[df$dates < min_date,],
  tidyr::fill(df[df$dates >= min_date & df$dates <= max_date,], A:C),
  df[df$dates > max_date,]
)
df_new
A B C dates
1 NA NA 3.1 2017-05-31
2 2.5 2.4 3.2 2017-06-30
3 2.5 2.4 3.3 2017-07-31
4 2.6 2.4 3.4 2017-08-31
5 2.8 2.5 3.6 2017-09-30
6 2.3 2.5 3.4 2017-10-31
7 2.5 2.5 3.2 2017-11-30
8 2.7 2.6 2.9 2017-12-31
9 2.8 NA 3.0 2018-01-31
10 2.6 NA NA 2018-02-28
11 NA NA NA 2018-03-31
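The range above is hard-coded. As a sketch, assuming column B is the column that defines the range, the bounds can instead be taken from B's first and last observation; for this data they come out as the same two dates and can be passed to the bind_rows()/fill() call unchanged:

```r
# Column B and the dates from the question (month ends, 2017-05-31 to 2018-03-31)
dates <- seq(as.Date("2017-06-01"), by = "month", length.out = 11) - 1
B <- c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA)

# First and last date on which B is observed; only NAs in between get filled
obs_dates <- dates[!is.na(B)]
min_date <- min(obs_dates)
max_date <- max(obs_dates)

min_date  # 2017-06-30
max_date  # 2017-12-31
```

This still applies one range to all columns; if each column has its own first and last observation, the fill has to be done column by column.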
zoo also has the na.locf() function, though it converts all the columns to characters when applied to the entire data frame.
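One way to sidestep that coercion, sketched here rather than checked against every zoo version, is to apply na.locf() only to the numeric columns, one plain vector at a time, so the dates column keeps its Date class. Combining a forward and a backward fill restricts the filling to interior NAs, which also makes the ranges per column instead of hard-coded:

```r
library(zoo)

df <- data.frame(
  A = c(NA, 2.5, 2.5, 2.6, 2.8, 2.3, 2.5, 2.7, 2.8, 2.6, NA),
  B = c(NA, 2.4, NA, NA, 2.5, NA, NA, 2.6, NA, NA, NA),
  C = c(3.1, 3.2, 3.3, 3.4, 3.6, 3.4, 3.2, 2.9, 3.0, NA, NA),
  dates = as.Date(c("2017-05-31", "2017-06-30", "2017-07-31", "2017-08-31",
                    "2017-09-30", "2017-10-31", "2017-11-30", "2017-12-31",
                    "2018-01-31", "2018-02-28", "2018-03-31"))
)

# Only the numeric columns are touched; is.numeric() is FALSE for Date
num_cols <- vapply(df, is.numeric, logical(1))

df[num_cols] <- lapply(df[num_cols], function(x) {
  fwd  <- na.locf(x, na.rm = FALSE)                   # previous value carried forward
  back <- na.locf(x, fromLast = TRUE, na.rm = FALSE)  # NA only after the last observation
  replace(fwd, is.na(back), NA)                       # restore trailing NAs
})

df
```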
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | sbha |