Validation code for IDN (Domain) with regex (regular expression)
I want to validate domain names with a regex. My old code was:
/^([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?\.){0,}([a-z0-9]+([-a-z0-9]*[a-z0-9]+)?){1,63}(\.[a-z0-9]{2,7})+$/i
It works, but it doesn't validate IDNs (internationalized domain names) such as öü.com or öü.öü.
The format I want to accept is:
- example.com
I don't want to accept:
- www.example.com
- http://example.com
- http://www.example.com
Important note: users can also add domains
- with a two-part extension, like example.co.uk
Solution 1:[1]
IDNs such as भारत.icom.museum are converted to Punycode, as defined in RFC 3492, before they are submitted for DNS resolution.
It seems that you're using PHP. If so, you can use the idn_to_ascii() function to convert IDNs, for example:
echo idn_to_ascii("भारत.icom.museum");
// xn--h2brj9c.icom.museum
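For readers not on PHP, the same convert-then-validate idea can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `ASCII_DOMAIN`, `to_ascii` and `is_valid_domain` are made up for illustration, the explicit `www.` check reflects the asker's stated requirement, and Python's built-in `idna` codec follows the older IDNA 2003 rules rather than IDNA 2008.

```python
import re

# One or more dot-separated LDH labels (letters, digits, inner hyphens),
# each 1 to 63 characters long. Punycode labels ("xn--...") fit this shape.
ASCII_DOMAIN = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"
    r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?",
    re.IGNORECASE,
)


def to_ascii(domain):
    """Convert a (possibly internationalized) name to its ASCII form."""
    # Assumption: the built-in codec (IDNA 2003) is good enough here.
    return domain.encode("idna").decode("ascii")


def is_valid_domain(domain):
    """Hypothetical helper: accept example.com, reject www. and URLs."""
    if domain.lower().startswith("www."):
        return False  # the question explicitly excludes a www. prefix
    try:
        ascii_name = to_ascii(domain)
    except UnicodeError:
        return False  # empty or over-long label, or disallowed characters
    # fullmatch rejects schemes such as "http://" because of ":" and "/".
    return len(ascii_name) <= 253 and ASCII_DOMAIN.fullmatch(ascii_name) is not None


print(to_ascii("bücher.example"))   # xn--bcher-kva.example
print(is_valid_domain("öü.öü"))     # True
```

Converting first means the regex only ever has to deal with ASCII, so the original pattern's structure can be kept largely as it was.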
Solution 2:[2]
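You can add support for IDNs by replacing a-z with \pL, which matches any Unicode letter. In PCRE-based engines such as PHP's preg functions, the pattern also needs the u modifier so that it is matched as UTF-8.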
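As a rough illustration of the Unicode-class approach, here is a Python sketch. Python's standard `re` module does not support `\pL`, so the sketch swaps in `[^\W\d_]` (any Unicode letter) and `[^\W_]` (any Unicode letter or digit) as approximations. The names `LABEL`, `UNICODE_DOMAIN` and `looks_like_domain` are assumptions made for this example, and the pattern checks shape only: it applies none of the IDNA rules.

```python
import re

ALNUM = r"[^\W_]"      # any Unicode letter or digit (approximates \pL plus 0-9)
LETTER = r"[^\W\d_]"   # any Unicode letter (approximates \pL)

# A label starts and ends with a letter or digit and may contain inner hyphens.
LABEL = ALNUM + r"(?:(?:" + ALNUM + r"|-){0,61}" + ALNUM + r")?"

# One or more labels followed by a letters-only final label of 2+ characters.
UNICODE_DOMAIN = re.compile(r"(?:" + LABEL + r"\.)+" + LETTER + r"{2,}")


def looks_like_domain(domain):
    """Hypothetical helper: shape check only, no IDNA validation."""
    return UNICODE_DOMAIN.fullmatch(domain) is not None


for name in ["example.com", "öü.com", "öü.öü", "example.co.uk",
             "http://example.com", "-bad.com"]:
    print(name, looks_like_domain(name))
```

On its own this pattern still accepts www.example.com, so the `www.` exclusion from the question would need a separate check.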
Solution 3:[3]
There are several reasons why one may need to validate a domain or an Internationalized Domain Name.
- To accept only functional domains, which resolve when probed through a DNS query
- To accept strings which can potentially act as a domain name (get registered and subsequently resolved, or kept only for the sake of information)
Depending on the nature of the need, the ways in which a domain name can be validated differ a great deal.
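For the first need, checking that a name actually resolves, a minimal Python sketch could look like the following. The function name `resolves` and its injectable `resolver` parameter are assumptions made for illustration (the parameter lets the check be exercised without network access). A name that fails to resolve today may still be a perfectly valid, registrable domain.

```python
import socket


def resolves(domain, resolver=socket.getaddrinfo):
    """Hypothetical helper: True if the name resolves to at least one address."""
    try:
        # DNS works on the ASCII (Punycode) form of an IDN.
        ascii_name = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    try:
        return len(resolver(ascii_name, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Calling `resolves("example.com")` performs a live lookup, so the result depends on the network and on the current state of the DNS.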
Validating domain names purely against the technical specification, regardless of whether they resolve in the DNS, is a slightly more complex problem than writing a regex with a certain number of Unicode classes.
A host of RFCs (5891, 5892, 5893, 5894 and 5895) together define the structure of a valid domain name (IDNs in particular, domains in general). It involves not only various Unicode character classes, but also some context-specific rules which need a full-fledged algorithm of their own. Typically, all the leading programming languages and frameworks provide a way to validate domain names as per the latest IDNA protocol, i.e. IDNA 2008.
For validating domain names, refer to the thoroughly researched documents produced by the Universal Acceptance Steering Group (https://uasg.tech/), titled
- "UASG 018A UA Compliance of Some Programming Language Libraries and Frameworks" (https://uasg.tech/download/uasg-018a-ua-compliance-of-some-programming-language-libraries-and-frameworks-en/), as well as
- "UASG 037 UA-Readiness of Some Programming Language Libraries and Frameworks EN" (https://uasg.tech/download/uasg-037-ua-readiness-of-some-programming-language-libraries-and-frameworks-en/).
Both documents list various programming language libraries that can be used to validate domain names.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Pedro Lobito |
Solution 2 | |
Solution 3 | ThinkTrans |