'Remove underscore and number at the end of string
I am working with a dataset that has column with some underscores. There is a patter to it but they are different patterns, as shown below
ID Col1
1029 ap_analog
2334 critical_1_mm_1
2334 transpose_2_mm_2
9877 public_1_yes_0_no_1
9877 public_1_yes_0_no_2
1333 Lateral_mm
1333 Lateral_mm_1
1333 Lateral_mm_2
1333 Lateral_mm_3
1333 ap_mm_axial
1333 ap_mm_axial_1
1333 ap_mm_axial_2
1333 ap_mm_axial_3
9876 central_star_six_mm
9876 central_star_six_mm_1
9876 central_star_six_mm_2
9876 central_star_six_mm_3
I just like to separate the numbers from the string with a final dataset like this
ID Col1 Index
1029 ap_analog 0
2334 critical_1_mm 1
2334 transpose_2_mm 2
9877 public_1_yes_0_no 1
9877 public_1_yes_0_no 2
1333 Lateral_mm 0
1333 Lateral_mm 1
1333 Lateral_mm 2
1333 Lateral_mm 3
1333 ap_mm_axial 0
1333 ap_mm_axial 1
1333 ap_mm_axial 2
1333 ap_mm_axial 3
9876 central_star_six_mm 0
9876 central_star_six_mm 1
9876 central_star_six_mm 2
9876 central_star_six_mm 3
Right now I am doing this very inefficiently. Something like this
df1$index <- df1$Col1
for(i in 1:3) {
df1$index <- regmatches(df1$index,gregexpr("(?<=_).*",df1$index,perl=TRUE))
}
df1$index[ which(df1$index == "character(0)")] <- 0
I would appreciate any suggestions to improve on this.
Solution 1:[1]
One way using dplyr
and stringr
:
We can extract the Index
value which is the number at the end of Col1
, replace the NA
values with 0. We can remove the last digit from Col1
.
library(dplyr)
library(stringr)
library(dplyr)
df %>%
mutate(Index = str_extract(Col1, '\\d+$'),
Index = replace(Index, is.na(Index), 0),
Col1 = sub('_\\d+$', '', Col1))
# ID Col1 Index
#1 1029 ap_analog 0
#2 2334 critical_1_mm 1
#3 2334 transpose_2_mm 2
#4 9877 public_1_yes_0_no 1
#5 9877 public_1_yes_0_no 2
#6 1333 Lateral_mm 0
#7 1333 Lateral_mm 1
#8 1333 Lateral_mm 2
#9 1333 Lateral_mm 3
#10 1333 ap_mm_axial 0
#11 1333 ap_mm_axial 1
#12 1333 ap_mm_axial 2
#13 1333 ap_mm_axial 3
#14 9876 central_star_six_mm 0
#15 9876 central_star_six_mm 1
#16 9876 central_star_six_mm 2
#17 9876 central_star_six_mm 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |