Python fuzzy matches of elements from two lists of song titles

I have two lists of titles for jazz standards. Each list has a couple thousand entries, and I would like to find the songs in the second list that are not in the first, and vice versa. The difficulty is that the same song is often titled quite differently in the two lists. Here are a few examples to illustrate the challenges:

AcCentTchuAteThePositive <=> Ac-cent-tchu-ate The Positive

CertainSmile <=> A Certain Smile

AintMisbehavin <=> Ain’t Misbehavin’

TenderTrap <=> (Love Is) The Tender Trap

AfricanFlower <=> African Flower (Petite Fleur Africaine)

Recapping the issues:

  • One of the lists does not use any spaces or punctuation and the other does
  • One of the lists may have titles in multiple languages
  • One of the lists does not use articles at the beginning of the title
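
One possible way to attack the three issues above, sketched with only the standard library: reduce every title to a normalized key, compare keys exactly, and fall back to fuzzy matching for what is left. The names `normalize` and `unmatched`, the `LEADING_ARTICLES` tuple, and the `0.8` cutoff are illustrative assumptions, as is the decision to treat parenthesised text as optional; none of this comes from the lists themselves, so the thresholds would need tuning against the real data.

```python
import difflib
import re
import unicodedata

# Assumption: only English leading articles are dropped; extend as needed.
LEADING_ARTICLES = ("the", "a", "an")


def normalize(title):
    """Reduce a title to a lowercase alphanumeric key with no spaces."""
    # Assumption: parenthesised parts such as "(Love Is)" are optional extras.
    title = re.sub(r"\([^)]*\)", " ", title)
    # Decompose accented characters and drop the combining marks.
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Delete straight and curly apostrophes so "Ain't" becomes "aint".
    title = re.sub("['\u2018\u2019]", "", title.lower())
    # Everything else that is not a letter or digit acts as a word separator.
    words = re.findall(r"[a-z0-9]+", title)
    # Drop a leading article, but never reduce a one-word title to nothing.
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return "".join(words)


def unmatched(titles, others, cutoff=0.8):
    """Return the entries of `titles` with no exact or close match in `others`."""
    other_keys = sorted({normalize(t) for t in others})
    exact = set(other_keys)
    missing = []
    for title in titles:
        key = normalize(title)
        if key in exact:
            continue  # identical after normalization
        # Fuzzy fallback: similarity ratio of the two keys must reach `cutoff`.
        if difflib.get_close_matches(key, other_keys, n=1, cutoff=cutoff):
            continue
        missing.append(title)
    return missing
```

Calling `unmatched(first_list, second_list)` and then `unmatched(second_list, first_list)` gives the two differences. With this sketch all five example pairs above reduce to the same key, so they are treated as matches without needing the fuzzy step. The fuzzy fallback compares every leftover key against the whole other list, which should be tolerable for a couple thousand entries but is quadratic in the worst case.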

Any advice would be appreciated!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
