'Remove underscore and number at the end of string

I am working with a dataset that has column with some underscores. There is a patter to it but they are different patterns, as shown below

  ID     Col1
  1029   ap_analog
  2334   critical_1_mm_1
  2334   transpose_2_mm_2
  9877   public_1_yes_0_no_1
  9877   public_1_yes_0_no_2
  1333   Lateral_mm
  1333   Lateral_mm_1
  1333   Lateral_mm_2
  1333   Lateral_mm_3
  1333   ap_mm_axial
  1333   ap_mm_axial_1
  1333   ap_mm_axial_2
  1333   ap_mm_axial_3
  9876   central_star_six_mm
  9876   central_star_six_mm_1
  9876   central_star_six_mm_2
  9876   central_star_six_mm_3

I just like to separate the numbers from the string with a final dataset like this

  ID     Col1                     Index
  1029   ap_analog                0
  2334   critical_1_mm            1
  2334   transpose_2_mm           2
  9877   public_1_yes_0_no        1
  9877   public_1_yes_0_no        2
  1333   Lateral_mm               0
  1333   Lateral_mm               1
  1333   Lateral_mm               2  
  1333   Lateral_mm               3
  1333   ap_mm_axial              0
  1333   ap_mm_axial              1
  1333   ap_mm_axial              2
  1333   ap_mm_axial              3
  9876   central_star_six_mm      0
  9876   central_star_six_mm      1
  9876   central_star_six_mm      2
  9876   central_star_six_mm      3

Right now I am doing this very inefficiently. Something like this

df1$index <- df1$Col1

for(i in 1:3) {
  df1$index <-  regmatches(df1$index,gregexpr("(?<=_).*",df1$index,perl=TRUE))
}

df1$index[ which(df1$index == "character(0)")] <- 0

I would appreciate any suggestions to improve on this.



Solution 1:[1]

One way using dplyr and stringr :

We can extract the Index value which is the number at the end of Col1, replace the NA values with 0. We can remove the last digit from Col1.

library(dplyr)
library(stringr)

library(dplyr)
df %>%
  mutate(Index = str_extract(Col1, '\\d+$'), 
         Index = replace(Index, is.na(Index), 0), 
         Col1 = sub('_\\d+$', '', Col1))

#    ID                Col1 Index
#1  1029           ap_analog     0
#2  2334       critical_1_mm     1
#3  2334      transpose_2_mm     2
#4  9877   public_1_yes_0_no     1
#5  9877   public_1_yes_0_no     2
#6  1333          Lateral_mm     0
#7  1333          Lateral_mm     1
#8  1333          Lateral_mm     2
#9  1333          Lateral_mm     3
#10 1333         ap_mm_axial     0
#11 1333         ap_mm_axial     1
#12 1333         ap_mm_axial     2
#13 1333         ap_mm_axial     3
#14 9876 central_star_six_mm     0
#15 9876 central_star_six_mm     1
#16 9876 central_star_six_mm     2
#17 9876 central_star_six_mm     3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1