Simple - using String#scan to extract an email address
I've got a string that contains:
@from = "John Doe <[email protected]>"
When I do:
@from.scan('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i')
I get no results. I'm trying to extract the email address on its own.
I tried removing the \b's but this did not work either.
Any help would be much appreciated.
Solution 1:[1]
Your expression works fine: rubular
The problem is that the quotes around your regular expression make Ruby treat it as a plain text string rather than a regular expression, so scan searches for that literal text. Removing the quotes solves the problem: ideone
@from = "John Doe <[email protected]>"
@from.scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |x| puts x }
Output:
[email protected]
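To make the difference concrete, here is a small sketch that runs both forms against the same input. The address john@example.com and the constant names are placeholders of mine, not taken from the question.

```ruby
# Placeholder input; the address is illustrative only.
FROM = "John Doe <john@example.com>"

# A String argument is searched for literally, slashes and all.
AS_STRING = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i'
# A Regexp literal is compiled and matched as a pattern.
AS_REGEXP = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

p FROM.scan(AS_STRING)  # => []
p FROM.scan(AS_REGEXP)  # => ["john@example.com"]
```

The quoted version returns an empty array because the input never contains the literal characters of the pattern.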
Solution 2:[2]
Sorry, I don't have enough rep to comment, so I'll make this an answer:
For any future use, everyone should make one modification: don't restrict the TLD length to 4. New TLDs are being introduced very rapidly, so you should now use a regex like this:
@from.scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i)
All I did was remove the 4 at the end of the regex, which placed a maximum length of 4 characters on the TLD. TLDs used to almost all be 2, 3, or 4 characters long (.com, .org, .info, etc.), but tons of longer ones are now being introduced (.auction, .software, .business, etc.).
So nobody should restrict TLD length anymore (although leaving a minimum of 2 chars is still good).
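As a quick illustration of the effect, this sketch compares the capped and uncapped patterns on an address with a longer TLD. The address bids@example.auction and the constant names are made up for the example.

```ruby
# The original pattern, with the TLD capped at 4 characters.
CAPPED_TLD   = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
# The modified pattern, with only a minimum of 2 characters.
UNCAPPED_TLD = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

# Hypothetical address on a longer TLD.
LONG_TLD_TEXT = "Bids <bids@example.auction>"

p LONG_TLD_TEXT.scan(CAPPED_TLD)    # => []
p LONG_TLD_TEXT.scan(UNCAPPED_TLD)  # => ["bids@example.auction"]
```

The capped pattern finds nothing here, because the trailing \b cannot match in the middle of "auction".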
Solution 3:[3]
For those who only need to deal with addresses like "John Doe <[email protected]>", which may contain a display name, use the Mail::Address class from the Ruby mail gem.
require 'mail'
#=> true
a = Mail::Address.new("John Doe <[email protected]>")
#=> #<Mail::Address:70264542184500 Address: |John Doe <[email protected]>| >
a.address
#=> "[email protected]"
a.display_name
#=> "John Doe"
Solution 4:[4]
A current (2022) version of this, updated with the regexp from URI::MailTo::EMAIL_REGEXP, would be:
@from.scan(/\b[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\b/) # => ["[email protected]"]
This will also work fine for multiple emails in a string, e.g.:
"some text with [email protected] and [email protected].".scan(/\b[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\b/) # => ["[email protected]", "[email protected]"]
I couldn't see an easy way to convert URI::MailTo::EMAIL_REGEXP from being anchored with \A and \z to using \b at both ends. That would have been preferable and more future-proof.
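One possible way to do that conversion, offered as a sketch rather than a vetted approach: take the constant's source string, strip the anchors, and rebuild the pattern with \b on both sides. This assumes the source begins with \A and ends with \z, which can differ between Ruby versions, so check it on yours. EMAIL_BODY, EMAIL_IN_TEXT and the sample addresses are names I made up.

```ruby
require 'uri'

# Assumption: the source starts with \A and ends with \z.
# Single-quoted '\A' and '\z' are a literal backslash plus a letter.
EMAIL_BODY = URI::MailTo::EMAIL_REGEXP.source
                                      .delete_prefix('\A')
                                      .delete_suffix('\z')

# Rebuild the pattern with word boundaries in place of the anchors.
EMAIL_IN_TEXT = Regexp.new("\\b#{EMAIL_BODY}\\b")

# Hypothetical sample text.
SAMPLE_TEXT = "Write to john@example.com or jane@example.org."

p SAMPLE_TEXT.scan(EMAIL_IN_TEXT)
# => ["john@example.com", "jane@example.org"]
```

Because the pattern is derived from the constant at load time, it picks up whatever definition the installed Ruby ships. The flip side is that a future change to the constant's structure could silently break the anchor stripping.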
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Kyletns |
Solution 3 | lulalala |
Solution 4 | Johan |