Python: Extract emails from filename strings

I want to extract the email address from each filename. Here are some example filenames:

string1 = "[email protected]_2022-05-11T11_59_58+00_00.pdf"
string2 = "[email protected]_test.pdf"
string3 = "[email protected]"

I would like to split each filename into two parts: the first containing the email address and the second containing the rest. For string2, the result should be:

['[email protected]', '_test.pdf']

I tried the following regex, but it does not work for the second and third strings:

email = re.search(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", string)

Thank you for your help



Solution 1:[1]

Given the samples you provided, you can do something like this:

import re

strings = ["[email protected]_2022-05-11T11_59_58+00_00.pdf",
           "[email protected]_test.pdf",
           "[email protected]"]

pattern = r'([^@]+@[\.A-Za-z]+)(.*)'

[re.findall(pattern, string)[0] for string in strings]

Output:

[('[email protected]', '_2022-05-11T11_59_58+00_00.pdf'),
 ('[email protected]', '_test.pdf'),
 ('[email protected]', '-fdsdfsd-saf.pdf')]
    

Mail pattern explanation ([^@]+@[\.A-Za-z]+):

  • [^@]+: one or more characters other than @
  • @: a literal @ sign
  • [\.A-Za-z]+: one or more letters or dots

Rest pattern explanation (.*):

  • (.*): zero or more of any character except a newline, i.e. the rest of the filename
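
If you want the two-element list from the question rather than a tuple, the same pattern can be wrapped in a small helper. This is only a sketch: the function name split_filename and the sample addresses are hypothetical placeholders, not taken from the original filenames, and it assumes the domain contains only letters and dots and that the rest of the filename starts with some other character (such as _ or -).

```python
import re

# Same pattern as above: group 1 is the email, group 2 is the rest.
PATTERN = re.compile(r'([^@]+@[\.A-Za-z]+)(.*)')


def split_filename(filename):
    """Return [email, rest], or None if the pattern does not match.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.match(filename)
    if match is None:
        return None
    return [match.group(1), match.group(2)]


# Placeholder filenames (assumed, not the real ones from the question).
samples = [
    "john.doe@example.com_2022-05-11T11_59_58+00_00.pdf",
    "john.doe@example.com_test.pdf",
    "john.doe@example.com-fdsdfsd-saf.pdf",
    "no_email_here.pdf",
]

for name in samples:
    print(split_filename(name))
```

Keep in mind that this pattern is tailored to the samples. A name such as john.doe@example.com.pdf would be returned entirely as the email with an empty rest, because the dots and letters of .pdf also match [\.A-Za-z]+, and a domain containing digits or hyphens would be cut short.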

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   lemon