One Web, Many Languages: An Introduction To Internationalized Domain Names (IDNs)
By admin on May 15, 2007 in Domain Name News
Many of us in the English-speaking world take our alphabet for granted. The Latin alphabet as used in English is relatively straightforward: 26 letters, a through z. Conveniently, these 26 letters, the hyphen "-" and the Arabic numerals 0 through 9 constitute the acceptable characters for a domain name.
For much of the world, this is not nearly as intuitive. While the Latin alphabet is the most widely used, three other alphabets are used in large portions of the world. The Cyrillic alphabet is spread through Russia, parts of Eastern Europe and the former republics of the Soviet Union. The Arabic alphabet spans from Northern Africa through the Middle East, and the Brahmic-derived alphabets are used across Southeast Asia. Throw in the accents, diacritics and ligatures seen in various languages using the Latin alphabet, and the possible combinations become staggering.
So how could this problem be addressed? The simplest solution would be to dictate that all domain names consist of only the 26 letters, the ten numerals and the hyphen. However, that narrow view limits and complicates the accessibility of the Net for large swaths of the world's population.
Enter Internationalized Domain Names. Proposed by Martin Duerst in 1996 and first implemented in 1998, the system was eventually adopted (with additions and revisions) as the Internationalized Domain Names in Applications (IDNA) system. Within the IDNA system, an internationalized domain name is a name made up of labels (the dot-separated parts of the name), each of which can be successfully converted by the ToASCII algorithm.
Internationalizing a domain name works like this. The ToASCII algorithm is applied individually to each label within a domain name. If a label contains at least one non-ASCII character, further steps are taken. The label is first "normalized" using the Nameprep algorithm. The normalized label is then converted to ASCII via the Punycode algorithm. Finally, the four-character ASCII Compatible Encoding (ACE) prefix "xn--" is added. If, for any reason, the ToASCII algorithm fails (e.g. the resulting string exceeds 63 characters), the name cannot be internationalized.
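The steps above can be sketched in a few lines of Python. This is only an illustration, not a full implementation of the standard: it leans on the `nameprep` helper and the `punycode` codec that ship with Python's standard library, and the function names `to_ascii_label` and `to_ascii_domain` are made up for this example.

```python
from encodings.idna import nameprep  # Nameprep as shipped with Python's stdlib

ACE_PREFIX = b"xn--"   # the four-character ACE prefix
MAX_LABEL_LENGTH = 63  # a label may not exceed 63 characters


def to_ascii_label(label: str) -> bytes:
    """Simplified ToASCII for one label (illustrative, not the full algorithm)."""
    if label.isascii():
        # Pure ASCII labels need no further steps.
        encoded = label.encode("ascii")
    else:
        # Step 1: normalize the label with Nameprep.
        normalized = nameprep(label)
        if normalized.isascii():
            # Normalization alone can turn a label into plain ASCII.
            encoded = normalized.encode("ascii")
        else:
            # Step 2: Punycode, then step 3: prepend the ACE prefix.
            encoded = ACE_PREFIX + normalized.encode("punycode")
    if not 0 < len(encoded) <= MAX_LABEL_LENGTH:
        raise ValueError("label is empty or longer than 63 characters")
    return encoded


def to_ascii_domain(name: str) -> bytes:
    """Apply the per-label conversion to every dot-separated label."""
    return b".".join(to_ascii_label(label) for label in name.split("."))


print(to_ascii_domain("Bücher.example"))  # b'xn--bcher-kva.example'
```

In practice Python's built-in `idna` codec does the same job in one call: `"Bücher.example".encode("idna")`.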
To "de-internationalize" a domain name, the ToUnicode algorithm is applied, resulting in the originally entered domain name, except that any "normalization" will not be undone. The ToUnicode algorithm will always succeed on a properly internationalized domain name because it is simply "undoing" the work of the ToASCII algorithm.
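A quick way to see that normalization is not undone is to round-trip a name through Python's built-in `idna` codec. The helper name `round_trip` is just for this sketch, and the codec is used here only as a convenient stand-in for ToASCII and ToUnicode.

```python
def round_trip(name: str):
    """Encode a name to its ASCII form, then decode it again."""
    ace = name.encode("idna")      # ToASCII applied to each label
    decoded = ace.decode("idna")   # ToUnicode applied to each label
    return ace, decoded


original = "Bücher.example"
ace, decoded = round_trip(original)

print(ace)                  # b'xn--bcher-kva.example'
print(decoded)              # bücher.example
print(decoded == original)  # False: the capital "B" was folded to lower case
```

The decoded name comes back in lower case because the case folding done during normalization is a one-way step.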
In theory, the shift into and out of internationalized domain names can occur seamlessly and invisibly to the user. This is a useful feature for users, but it can also expose them to a dangerous spoof. In essence, the idea behind the IDN spoof is to register a domain name that is visually very similar to a trademarked name, for example Paypal. Due to the visual similarity of the Latin "a" and the Cyrillic "a", a domain name consisting of mixed alphabets can be registered and, when presented as a link (like this: http://www.pаypal.com/, where the first "a" is actually a Cyrillic "a"), can easily fool users into thinking they are at the genuine Paypal website. This, of course, would be a great opportunity for phishing scams - or bogus domain auctions.
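To make the spoof concrete, here is a rough sketch of how a mixed-alphabet label could be spotted. Python's standard library has no direct "which script is this character" lookup, so this example assumes that the first word of each character's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough stand-in. That is a heuristic for illustration, not how any particular browser or registry does it, and the function names are made up.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used by the letters in a label.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. 'LATIN' or 'CYRILLIC'.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():  # skip digits and the hyphen
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the letters in the label come from more than one script."""
    return len(scripts_in_label(label)) > 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # U+0430 is CYRILLIC SMALL LETTER A

print(genuine == spoofed)        # False, although they look alike on screen
print(is_mixed_script(genuine))  # False
print(is_mixed_script(spoofed))  # True
```

The two strings are different names as far as the DNS is concerned, which is exactly what makes the spoof registrable.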
This was foreseen, and guidelines were issued to registries prior to implementing IDNs to address concerns about this spoof. Of course, not all registries fully embraced these guidelines and, as the link above shows, the spoof can still be run today. The problem is now being addressed by browsers. Internet Explorer 7 allows users to decode only selected languages for display in the address bar. Mozilla and Opera have chosen to display the Punycode version of the IDN unless the registry is on a "whitelist" of registries effectively implementing IDN anti-fraud guidelines (such as prohibiting the use of mixed character sets within a name). Safari displays the Punycode translation of the domain name unless the setting in Preferences is altered to allow display of the decoded name.
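The whitelist approach can be sketched roughly as follows. Everything here is hypothetical: the `TRUSTED_TLDS` entries are placeholders rather than any browser's actual list, and `display_form` is a made-up name. The point is only to show the decision being made: show the decoded name when the registry is trusted, and fall back to the Punycode form otherwise.

```python
# Placeholder whitelist for illustration only; not any browser's real list.
TRUSTED_TLDS = frozenset({"de", "jp"})


def display_form(host: str, trusted_tlds=TRUSTED_TLDS) -> str:
    """Pick the form of a host name to show in the address bar."""
    # Always start from the ASCII (Punycode) form of the name.
    ace = host.encode("idna").decode("ascii")
    tld = ace.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        # Registry is on the whitelist: show the decoded name.
        return ace.encode("ascii").decode("idna")
    # Otherwise show the raw Punycode so a look-alike name stands out.
    return ace


print(display_form("bücher.de"))   # bücher.de
print(display_form("bücher.com"))  # xn--bcher-kva.com
```

Under this kind of policy, a spoofed name registered with a registry that is not on the list shows up as an obviously odd "xn--" string instead of a convincing look-alike.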
So what will the impact of Internationalized Domain Names be on the Web as a whole? More importantly for domainers, how does this affect opportunities in domain name investing, and is it already too late to get in on it?
We'll answer these questions, and more, in a future write-up.
Original post by domains
















