RStudio: Selecting the column with the latest available data from a data frame
I am trying to extract data from the World Bank and import it into RStudio for a regression analysis. The data can be found here, and as you can see, the online table shows the latest available value for each country.
However, the downloadable CSV/Excel files have one column for every year from 1960 to 2021.
What I want to do now is create a new data frame that holds the latest available value for each country. Basically, I want to recreate the table that the World Bank already shows online.
I have googled around for quite a while, but I could not find anything that fits my specific problem.
The solutions I did find for selecting the newest data from a data frame all require a date column, which my dataset does not have: instead, it has one column per year (1960 to 2021).
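Without a date column, the year can still be recovered from the column names. A minimal base-R sketch, assuming the year columns are named with four digits (which holds if the file is imported with `check.names = FALSE`; by default `read.csv()` prefixes such names with `X`), picks the rightmost non-missing year in each row. The data frame `wide` and its values are hypothetical.

```r
# Hypothetical data in the World Bank's wide layout: one row per country,
# one column per year, oldest year first.
wide <- data.frame(
  country = c("A", "B", "C"),
  `2019`  = c(1.1, 2.1, NA),
  `2020`  = c(1.2, NA, NA),
  `2021`  = rep(NA_real_, 3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Treat every column whose name is a four-digit year as a data column.
year_cols <- grep("^[0-9]{4}$", names(wide), value = TRUE)
values <- as.matrix(wide[year_cols])

# For each row, the index of the last (most recent) non-NA year, or NA if none.
last_idx <- apply(values, 1, function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0) NA_integer_ else max(idx)
})

# Look up the year name and the value at that (row, column) position;
# an NA index yields NA for countries with no data at all.
latest <- data.frame(
  country = wide$country,
  year    = as.integer(year_cols[last_idx]),
  value   = values[cbind(seq_len(nrow(values)), last_idx)],
  stringsAsFactors = FALSE
)
print(latest)
```

Because the columns are assumed to be ordered oldest to newest, "rightmost non-missing" and "most recent" coincide. Sort the year columns first if your file is not ordered that way.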
Example: Take this data frame:
```r
df <- data.frame(c(1:10), c(1:4, NA))
df
   c.1.10. c.1.4..NA.
1        1          1
2        2          2
3        3          3
4        4          4
5        5         NA
6        6          1
7        7          2
8        8          3
9        9          4
10      10         NA
```
What I want to do is replace the NA values with the values from the previous column in the same row, i.e. with 5 and 10. How can I do that?
I'm thankful for any help!
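For the toy data frame above, the replacement can be done in base R with `ifelse()`. This is a minimal sketch that handles only the two-column case from the question:

```r
# The toy data frame from the question (auto-generated column names).
df <- data.frame(c(1:10), c(1:4, NA))

# Where the second column is NA, take the value from the first column
# in the same row; otherwise keep the second column's own value.
df[[2]] <- ifelse(is.na(df[[2]]), df[[1]], df[[2]])

print(df[[2]])  # now 1 2 3 4 5 1 2 3 4 10
```

If dplyr is loaded, `dplyr::coalesce(df[[2]], df[[1]])` returns the same vector.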
Solution 1
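Instead of using the CSV download, you can query the World Bank API from R with the wbstats package. `wb_data()` returns one row per country and year, with the year in a `date` column, so you can drop missing values and keep the most recent year for each country: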
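```r
library(wbstats)
library(tidyverse)

# Download the total fertility rate for 1973-2020
# (one row per country and year, year stored in `date`).
dat <- wb_data(indicator = "SP.DYN.TFRT.IN", start_date = 1973, end_date = 2020)

dat <- dat %>%
  select(iso2c, iso3c, country, date, SP.DYN.TFRT.IN) %>%  # ID, year and value columns
  na.omit() %>%                   # drop country-years without a value
  arrange(country, date) %>%      # oldest to newest within each country
  group_by(country) %>%
  slice_tail(n = 1)               # keep the most recent year per country

head(dat)
#> # A tibble: 6 × 5
#> # Groups:   country [6]
#>   iso2c iso3c country              date SP.DYN.TFRT.IN
#>   <chr> <chr> <chr>               <dbl>          <dbl>
#> 1 AF    AFG   Afghanistan          2020           4.18
#> 2 AL    ALB   Albania              2020           1.58
#> 3 DZ    DZA   Algeria              2020           2.94
#> 4 AD    AND   Andorra              2010           1.27
#> 5 AO    AGO   Angola               2020           5.37
#> 6 AG    ATG   Antigua and Barbuda  2020           1.98

# How many countries have their latest value in each year
table(dat$date)
#> 
#> 2002 2003 2010 2011 2012 2015 2020 
#>    2    1    1    2    1    1  200
```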
Created on 2022-05-12 by the reprex package (v2.0.1)
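If you would rather work from the downloaded CSV than call the API, one option is to reshape the wide year columns into the same long layout that `wb_data()` returns and then keep the latest year per country. This minimal sketch assumes dplyr and tidyr are installed and that the file was read with `check.names = FALSE`. The data frame `wide` is a hypothetical stand-in for the real file.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the downloaded CSV: one row per country and one
# column per year, with year names kept as "2019", "2020", ...
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(1.1, 2.1),
  `2020`  = c(1.2, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

latest <- wide %>%
  # Reshape to long format (one row per country and year), dropping missing values.
  pivot_longer(cols = matches("^[0-9]{4}$"), names_to = "year",
               values_to = "value", values_drop_na = TRUE) %>%
  mutate(year = as.integer(year)) %>%
  # Same idea as the API answer: keep the most recent year per country.
  group_by(country) %>%
  slice_max(year, n = 1) %>%
  ungroup()

print(latest)
```

`slice_max(year, n = 1)` is used here in place of `arrange()` plus `slice_tail(n = 1)`; both keep the latest year. Countries with no data in any year drop out because of `values_drop_na = TRUE`.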
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DaveArmstrong |