R: replace a string in a data frame with a partial match from a list

I have a data frame (df) in R and I want to create a new column (city1_n) that contains the entry stored in the list key whenever there is a partial match between City1 and that entry.

Below I have created a little example that should help to visualize my problem.

> dput(df)
structure(list(Country = c("USA", "France", "Italy", "Spain", 
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid", 
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona", 
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA, 
-5L))

> dput(key)
list("Los angeles California", "Paris Île-de-France", "Rome Lazio", 
    "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")


I am looking to solve this in R or Unix.



Solution 1:[1]

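Use fuzzyjoin::fuzzy_left_join with stringr::str_detect as the matching function. Note that key is a character vector here (see the data below), and that the \(x, y) lambda syntax requires R 4.1 or later:

fuzzyjoin::fuzzy_left_join(df, data.frame(key), by = c("City1" = "key"), match_fun = \(x, y) stringr::str_detect(y, x))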

  Country       City1                      City2                         key
1     USA Los angeles                   New York      Los angeles California
2  France       Paris                       Lyon         Paris Île-de-France
3   Italy        Rome                       Pisa                  Rome Lazio
4   Spain      Madrid                  Barcelona Madrid Comunidad de Madrid 
5  Mexico      Cancun San Cristobal de las Casas         Cancun Quintana Roo
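
If adding a dependency is not an option, roughly the same result can be sketched in base R. This is only an illustration, not the answer's method: match_key is a hypothetical helper name, the match is case-sensitive and literal (fixed = TRUE), only the first matching entry is kept, and trimws() is an assumption that the trailing space in the Madrid entry is unwanted.

```r
# Sample data from the question; `key` is kept as a list, as in the original dput().
df <- data.frame(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1   = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2   = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
)
key <- list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
            "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Flatten the list to a character vector and drop leading/trailing whitespace
# (assumption: the trailing space in the Madrid entry is not wanted).
key_vec <- trimws(unlist(key))

# Hypothetical helper: return the first entry of `keys` that contains `city`,
# or NA when nothing matches. fixed = TRUE treats `city` as literal text.
match_key <- function(city, keys) {
  hits <- keys[grepl(city, keys, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Build the new column requested in the question.
df$city1_n <- vapply(df$City1, match_key, character(1),
                     keys = key_vec, USE.NAMES = FALSE)
```

Cities without any matching entry get NA in city1_n, which mirrors what a left join would produce.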

data

df <- structure(list(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1 = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2 = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
), class = "data.frame", row.names = c(NA, -5L))

key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio", 
     "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
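
The question stores key as a list, whereas the data above use a character vector. The difference matters for data.frame(key): a list of five strings becomes one row with five columns, so there is no key column to join on. A minimal sketch of the conversion, where key_list is a hypothetical name for the question's object:

```r
# `key_list` is a hypothetical name for the list given in the question.
key_list <- list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
                 "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Passing the list directly spreads the entries across columns.
wide <- data.frame(key_list)
dim(wide)  # 1 row, 5 columns

# Flattening first gives one column named `key` with one row per entry,
# which is the shape the join expects.
key <- unlist(key_list)
long <- data.frame(key)
dim(long)  # 5 rows, 1 column
```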

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Maël