Python fuzzy matches of elements from two lists of song titles
I have two lists of titles for jazz standards. Each list has a couple of thousand entries, and I would like to find the songs in the second list that are not in the first, and vice versa. The difficulty is that the same song is often titled quite differently in the two lists. Here are a few examples to illustrate the challenges:
AcCentTchuAteThePositive <=> Ac-cent-tchu-ate The Positive
CertainSmile <=> A Certain Smile
AintMisbehavin <=> Ain’t Misbehavin’
TenderTrap <=> (Love Is) The Tender Trap
AfricanFlower <=> African Flower (Petite Fleur Africaine)
Recapping the issues:
- One of the lists does not use any spaces or punctuation and the other does
- One of the lists may have titles in multiple languages
- One of the lists does not use articles at the beginning of the title
Any advice would be appreciated!
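One possible starting point, offered as a rough sketch rather than a definitive solution: reduce every title to a comparison key, then fall back to fuzzy matching on those keys with the standard library's `difflib.get_close_matches`. The function names `normalize` and `unmatched`, the 0.85 cutoff, and the sample titles "GiantSteps" and "Blue in Green" are assumptions made for illustration. The sketch also assumes that parenthesised text is always optional (a subtitle or a translation) and that leading articles only appear in the list that uses spaces.

```python
import re
import unicodedata
from difflib import get_close_matches

_PARENS = re.compile(r"\([^)]*\)")                       # "(Love Is)", "(Petite Fleur Africaine)"
_ARTICLE = re.compile(r"^(a|an|the)\s+", re.IGNORECASE)  # leading article followed by a space
_NON_ALNUM = re.compile(r"[^a-z0-9]")                    # spaces, hyphens, apostrophes, ...


def normalize(title):
    """Reduce a title to a lowercase alphanumeric key (hypothetical helper)."""
    text = _PARENS.sub(" ", title).strip()      # assumption: parentheticals are optional
    text = _ARTICLE.sub("", text)               # "A Certain Smile" -> "Certain Smile"
    text = unicodedata.normalize("NFKD", text)  # split accented letters into base + mark
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    return _NON_ALNUM.sub("", text)


def unmatched(source, target, cutoff=0.85):
    """Return titles from `source` with no exact or close key match in `target`."""
    target_keys = {normalize(t) for t in target}
    candidates = sorted(target_keys)
    missing = []
    for title in source:
        key = normalize(title)
        if key in target_keys:
            continue  # exact match after normalization
        if get_close_matches(key, candidates, n=1, cutoff=cutoff):
            continue  # near match, e.g. a small spelling difference
        missing.append(title)
    return missing


first = ["AcCentTchuAteThePositive", "CertainSmile", "AintMisbehavin",
         "TenderTrap", "AfricanFlower", "GiantSteps"]
second = ["Ac-cent-tchu-ate The Positive", "A Certain Smile", "Ain’t Misbehavin’",
          "(Love Is) The Tender Trap", "African Flower (Petite Fleur Africaine)",
          "Blue in Green"]

print(unmatched(first, second))  # titles only in the first list
print(unmatched(second, first))  # titles only in the second list
```

The cutoff is a judgment call: a lower value merges more spelling variants but risks pairing different songs, so the near matches are worth reviewing by hand. This sketch does not handle a title that appears only in another language with no English form alongside it.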
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow