'How to add rows with condition and binary variable in panel dataset on R

I am constructing a panel dataset from an original data which contains for each row, a company (name) and its sales across 10 years.

In concrete, it loos like this: original dataset

The panel dataset I am building has to look like this:

enter image description here

So far, I have the panel dataset with all companies but only with the years when they have sales.

For each of the company that has stopped sales after showing positive sales (there is a "-" in year y after sales in years x, x+1) I need to add a row copying the info about the company (the whole row: name, sales, year) and add a 1 in the column "country exit". In the example above, I would have to do what has been done for company D in the last row in the second picture.

How can I avoid doing that manually on R studio, as there are approximately 250 companies with this case in the dataset?

Thanks

I've tried some functions on r but unable to perform it in a simple way and make it easy to do for each data.



Solution 1:[1]

This is an example using tidyverse. Let's say this is your dataframe:

df <- structure(list(company_name = c("Company A", "Company B", "Company C", 
                                "Company D"), `2004` = c(NA, NA, NA, NA), `2005` = c(NA, NA, 
                                                                                     NA, NA), `2006` = c(NA, NA, NA, NA), `2007` = c(NA, NA, NA, NA
                                                                                     ), `2008` = c(NA, NA, NA, NA), `2009` = c(NA, NA, NA, NA), `2010` = c(NA, 
                                                                                                                                                           NA, NA, NA), `2011` = c(NA, NA, NA, NA), `2012` = c(0, NA, 0.2, 
                                                                                                                                                                                                               0.1), `2013` = c(0, 0.1, 0.3, NA), `2014` = c(0, 0.1, 0.5, NA
                                                                                                                                                                                                               )), class = "data.frame", row.names = c(NA, -4L))

> df
  company_name 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
1    Company A   NA   NA   NA   NA   NA   NA   NA   NA  0.0  0.0  0.0
2    Company B   NA   NA   NA   NA   NA   NA   NA   NA   NA  0.1  0.1
3    Company C   NA   NA   NA   NA   NA   NA   NA   NA  0.2  0.3  0.5
4    Company D   NA   NA   NA   NA   NA   NA   NA   NA  0.1   NA   NA

First thing to do is applying pivot_longer on all columns but company name to create the year column:

library(tidyverse)

df2 <- df %>% pivot_longer(-company_name, names_to = 'year', values_to = 'sales')
> df2
# A tibble: 44 x 3
   company_name year  sales
   <chr>        <chr> <dbl>
 1 Company A    2004     NA
 2 Company A    2005     NA
 3 Company A    2006     NA
 4 Company A    2007     NA
 5 Company A    2008     NA
 6 Company A    2009     NA
 7 Company A    2010     NA
 8 Company A    2011     NA
 9 Company A    2012      0
10 Company A    2013      0
# ... with 34 more rows

Make sure the dataframe is arranged correctly (by company_name and year), group_by company name, and for each company check: if sales in a row equals NA, but in the previous row it's larger than 0, put 1 in the new column Country Exit:

df2 <- df2 %>%
  arrange(company_name, year) %>%
  group_by(company_name) %>%
  mutate(`Country Exit` = ifelse(is.na(sales)&lag(sales) != 0, 1, 0))

> df2
# A tibble: 44 x 4
# Groups:   company_name [4]
   company_name year  sales `Country Exit`
   <chr>        <chr> <dbl>          <dbl>
 1 Company A    2004     NA             NA
 2 Company A    2005     NA             NA
 3 Company A    2006     NA             NA
 4 Company A    2007     NA             NA
 5 Company A    2008     NA             NA
 6 Company A    2009     NA             NA
 7 Company A    2010     NA             NA
 8 Company A    2011     NA             NA
 9 Company A    2012      0              0
10 Company A    2013      0              0
# ... with 34 more rows

And to get the cleaner output, like the one you are mentioning, just remove NAs from Country Exit:

df2 %>% filter(!is.na(`Country Exit`))
# A tibble: 10 x 4
# Groups:   company_name [4]
   company_name year  sales `Country Exit`
   <chr>        <chr> <dbl>          <dbl>
 1 Company A    2012    0                0
 2 Company A    2013    0                0
 3 Company A    2014    0                0
 4 Company B    2013    0.1              0
 5 Company B    2014    0.1              0
 6 Company C    2012    0.2              0
 7 Company C    2013    0.3              0
 8 Company C    2014    0.5              0
 9 Company D    2012    0.1              0
10 Company D    2013   NA                1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 denisafonin