Fuzzy Matching in Snowflake like EDIT_DISTANCE_SIMILARITY
Does Snowflake have a function for fuzzy name matching, like UTL_MATCH.EDIT_DISTANCE_SIMILARITY in Oracle? I need to measure the difference at the row level.
Solution 1:[1]
Snowflake has EDITDISTANCE and SOUNDEX functions:
select editdistance('Duningham', 'Cunningham');
-- Result 2
select soundex('McArthur') = soundex('MacArthur');
-- Result TRUE
For EDITDISTANCE, unlike EDIT_DISTANCE_SIMILARITY, lower scores are closer matches: it returns the raw number of single-character edits rather than a normalized score between 0 and 100. There are many open source JavaScript implementations of fuzzy matching that you could plug into a Snowflake JavaScript UDF.
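If a 0 to 100 score is needed at the row level, one common approach is to scale the edit distance by the length of the longer string. The query below is a minimal sketch of that idea, not a confirmed reimplementation of Oracle's function, so results may differ from Oracle's in edge cases. The inline sample rows and the column names `name_a` and `name_b` are assumptions made for illustration.

```sql
-- Hypothetical sample rows; replace the VALUES clause with your own table.
select
    name_a,
    name_b,
    -- Raw edit distance: lower means closer.
    editdistance(name_a, name_b) as distance,
    -- Scaled to 0-100 so that higher means closer.
    -- nullif guards against division by zero when both strings are empty,
    -- in which case the similarity comes back as NULL.
    round(
        100 * (1 - editdistance(name_a, name_b)
                   / nullif(greatest(length(name_a), length(name_b)), 0))
    ) as similarity
from (values
        ('Duningham', 'Cunningham'),
        ('McArthur',  'MacArthur')
     ) as t (name_a, name_b);
```

Dividing by the longer length keeps the score in the 0 to 100 range, because the edit distance can never exceed the length of the longer string.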
Solution 2:[2]
Interzoid (disclaimer: I work there) offers matching capabilities with native Snowflake connectivity. It combines knowledge bases for different data types (name, company, address, etc.), heuristics, Soundex, spelling analysis, derivatives, and contextual ML to generate similarity keys that can be used with one or more tables. It calls an underlying API for each record in a table to generate the similarity keys (which can be appended to the table if desired), and the fuzzy matching is based on those keys: https://connect.interzoid.com/matching-data-database. It would work for the scenario above.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Silver Bullet |