RStudio: Selecting the column with the latest available data from a dataframe

I am trying to extract data from the World Bank and import it into RStudio for a regression analysis. The data can be found here, and as you can see, the online table shows the latest available data.

However, the downloadable CSV/Excel files have one column for every year between 1960 and 2021.

What I want to do now is to create a new data frame where I have the latest data for each of the countries – basically, I want to recreate what can be found on World Bank's site online already.

I have googled around for quite a while but I could not find anything that fits my specific problem.

What I did find were some ways to select the newest data from a data frame, but all of these solutions required a date column, which my dataset does not have – instead, I have one column per year (1960:2021).

Example: Take this data frame:

df <- data.frame(c(1:10),c(1:4,NA))
df
   c.1.10. c.1.4..NA.
1        1          1
2        2          2
3        3          3
4        4          4
5        5         NA
6        6          1
7        7          2
8        8          3
9        9          4
10      10         NA

What I want to do is replace the NA-values with the values from the previous column, so with the 5 and the 10. How can I do that?

I'm thankful for any help!



Solution 1:[1]

You can use the World Bank API in R:

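library(wbstats)
library(tidyverse)
# Download the indicator in long format: one row per country and year
dat <-  wb_data( indicator = "SP.DYN.TFRT.IN", start_date = 1973, end_date = 2020)

dat <- dat %>% 
  select(1:5) %>%              # keep iso2c, iso3c, country, date and the indicator
  na.omit() %>%                # drop country-years without a value
  arrange(country, date) %>%   # sort so the newest year comes last within a country
  group_by(country) %>% 
  slice_tail(n=1)              # keep the last (most recent) row per country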
head(dat)  
#> # A tibble: 6 × 5
#> # Groups:   country [6]
#>   iso2c iso3c country              date SP.DYN.TFRT.IN
#>   <chr> <chr> <chr>               <dbl>          <dbl>
#> 1 AF    AFG   Afghanistan          2020           4.18
#> 2 AL    ALB   Albania              2020           1.58
#> 3 DZ    DZA   Algeria              2020           2.94
#> 4 AD    AND   Andorra              2010           1.27
#> 5 AO    AGO   Angola               2020           5.37
#> 6 AG    ATG   Antigua and Barbuda  2020           1.98
table(dat$date)
#> 
#> 2002 2003 2010 2011 2012 2015 2020 
#>    2    1    1    2    1    1  200

Created on 2022-05-12 by the reprex package (v2.0.1)
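
If you would rather work from the downloaded CSV than from the API, the same idea can probably be applied after reshaping the year columns into long format. The following is only a minimal sketch: the `wide` data frame is a made-up stand-in for the downloaded file (the country names and values are hypothetical), and it assumes the year columns are named by their year. It also shows one way to fill the NAs in the two-column toy example from the question, using `dplyr::coalesce()`.

```r
library(dplyr)
library(tidyr)

# Toy example from the question: fill NAs in the second column
# with the value from the first column of the same row.
df <- data.frame(x = 1:10, y = c(1:4, NA))
df$y <- coalesce(df$y, df$x)

# Hypothetical stand-in for the downloaded file: one row per country,
# one column per year, NA where no value was reported.
wide <- tibble::tribble(
  ~country, ~`2019`, ~`2020`, ~`2021`,
  "A",          1.5,     1.6,      NA,
  "B",          2.0,      NA,      NA,
  "C",          3.1,     3.0,     2.9
)

latest <- wide %>%
  # one row per country and year; rows with a missing value are dropped
  pivot_longer(-country, names_to = "year", values_to = "value",
               names_transform = list(year = as.integer),
               values_drop_na = TRUE) %>%
  group_by(country) %>%
  slice_max(year, n = 1) %>%   # most recent year that has a value
  ungroup()

latest
```

With a real World Bank download you would likely need to adapt the column selection in `pivot_longer()`, since those files carry extra metadata columns besides the country name.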

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DaveArmstrong