IDN (Domain) validation with regular expression
I want to check some domains with regular expression (regex). My old code was:
/^([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?\.){0,}([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?){1,63}(\.[a-z0-9]{2,7})+$/i
It works, but it doesn't validate IDNs (internationalized domain names) such as öü.com or öü.öü, and it doesn't reject the "www" and "http://" prefixes.
My domain format is:
- example.com
Besides, I don't want:
- www.example.com
- http://example.com
- http://www.example.com
Important note: users can also add domains:
- with two extensions, like example.co.uk
Solution 1:[1]
There are several reasons why one may need to validate a domain or an Internationalized Domain Name (IDN):
- To accept only functional domains, which resolve when probed through a DNS query.
- To accept strings that can potentially act as a domain name (get registered and subsequently resolved, or be kept only for the sake of information).
Depending on the nature of the need, the ways in which the domain name can be validated differ a great deal.
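For the first kind of need (accepting only names that resolve), a minimal sketch could look like the following. It assumes Python and its standard `socket` module. The function name `resolves` and its `resolver` parameter are hypothetical, introduced here only so the lookup can be swapped out for testing.

```python
import socket


def resolves(domain: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the resolver finds at least one address for the domain.

    `resolver` is a hypothetical injection point; by default the system
    resolver is used through socket.getaddrinfo.
    """
    try:
        return len(resolver(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be
        # converted to its ASCII form before the lookup.
        return False
```

A check like this only tells you whether the name resolves right now. It says nothing about whether an unregistered string is a well-formed domain name, which is the second kind of need.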
Validating domain names purely from a technical specification point of view, regardless of their resolvability in the DNS, is a slightly more complex problem than merely writing a regex with a certain number of Unicode classes.
A host of RFCs (5891, 5892, 5893, 5894 and 5895) together define the structure of a valid domain name (IDNs in particular, domains in general). This involves not only various Unicode character classes but also some context-specific rules that need a full-fledged algorithm of their own. Typically, the leading programming languages and frameworks provide a way to validate domain names as per the latest IDNA protocol, i.e. IDNA 2008.
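As an illustration of leaning on a library instead of a hand-written Unicode regex, here is a minimal Python sketch. It rests on some assumptions. It uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490) rather than IDNA 2008. For IDNA 2008 the third-party `idna` package on PyPI would be the closer fit. The function name `is_acceptable_domain` and the rejection of "www." and "http://" prefixes are taken from the question's requirements, not from any standard.

```python
import re

# LDH rule for an ASCII (or Punycode-encoded) label: letters, digits and
# hyphens, no leading or trailing hyphen, 1 to 63 characters.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_acceptable_domain(name: str) -> bool:
    """Accept names like example.com, example.co.uk or öü.com (a sketch)."""
    name = name.strip().lower()
    # Reject URLs and a leading "www." label, as the question requires.
    if "://" in name or name.startswith("www."):
        return False
    try:
        # The stdlib codec (IDNA 2003) converts each label to ASCII (xn--...).
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    if len(ascii_name) > 253:
        return False
    labels = ascii_name.split(".")
    # Require at least a name plus one extension.
    if len(labels) < 2:
        return False
    return all(LABEL_RE.fullmatch(label) for label in labels)
```

With this sketch, `example.com`, `example.co.uk`, `öü.com` and `öü.öü` are accepted, while `www.example.com`, `http://example.com` and `http://www.example.com` are rejected. The regex is applied only after conversion to ASCII, so it does not need any Unicode classes.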
To validate domain names, refer to the thoroughly researched documents produced by the Universal Acceptance Steering Group (https://uasg.tech/): "UASG 018A UA Compliance of Some Programming Language Libraries and Frameworks" (https://uasg.tech/download/uasg-018a-ua-compliance-of-some-programming-language-libraries-and-frameworks-en/) and "UASG 037 UA-Readiness of Some Programming Language Libraries and Frameworks EN" (https://uasg.tech/download/uasg-037-ua-readiness-of-some-programming-language-libraries-and-frameworks-en/). Both documents list various programming language libraries that can be used to validate domain names.
In addition, those who want to understand the overall process, challenges and issues of implementing an internationalized email solution can go through the following RFCs: RFC 6530 (Overview and Framework for Internationalized Email), RFC 6531 (SMTP Extension for Internationalized Email), RFC 6532 (Internationalized Email Headers), RFC 6533 (Internationalized Delivery Status and Disposition Notifications), RFC 6855 (IMAP Support for UTF-8), RFC 6856 (Post Office Protocol Version 3 (POP3) Support for UTF-8), RFC 6857 (Post-Delivery Message Downgrading for Internationalized Email Messages), RFC 6858 (Simplified POP and IMAP Downgrading for Internationalized Email).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ThinkTrans |