HtmlAcceptLanguage: Detecting the user’s preferred language - ΩJr. Software Articles and Products

This information lives on a web page hosted at the following web address: 'https://www.omegajunior.net/code/'.

Stop guessing the visitor's language by IP address. Stop insulting your visitors. Read the Accept-Language HTTP header (exposed to server-side scripts as HTTP_ACCEPT_LANGUAGE) instead. Here's why and how.

A.E.Veltstra
March 8, 2009



I live in the Netherlands. I happen to know several languages: Dutch (my native language), German, French, and English. I can kind-of read several more. When I visit a web site, I prefer articles in English. Especially when those articles concern technical subjects like computer or web site programming.

Some web sites have taken to offering their readers content in multiple languages, and have done it the wrong way. They thought it a good idea to detect the reader's IP address, look up in what country that IP address might reside, and offer the reader a translation based on that country.

That method has a huge drawback. This article will describe that drawback, and present a method of detecting a preferred translation that should fit better.


The IP Address Method's Drawback


Suppose an IP address is found to reside in Belgium. What language are you going to offer the reader? Something you might not know: Belgium has 3 official languages: Dutch, French, and German. So what language will you offer?

Ah, you might say, we can simply offer them all, and show a language selector so the reader can choose.

But why should we force the reader to choose, when a preference is readily available in their browser?

Suppose an IP address is found to reside in the Netherlands. What language are you going to offer the reader? The IP address does not tell you whether the computer is located on, for instance, an international army base, being used by American or Canadian soldiers who just might prefer to read English or French instead of Dutch. The IP address does not tell you whether the computer is being used by an immigrant who has yet to learn Dutch and prefers to read the article in their own language.

Who are you to tell your readers what language they speak, if a preference is readily available in their browser?


A better method


Browsers send an HTTP request header named Accept-Language, which most web servers expose to scripts as the HTTP_ACCEPT_LANGUAGE server variable. This header has been part of the HTTP protocol for years and was intended to request articles in a preferred language. Most browsers offer easily accessible preference panels, which can be used to set which languages are preferred while browsing. This preference can be set independently of the language of the browser itself or the operating system.

One can set multiple languages and order them, with the most preferred one at the top. The browser then sends the list of preferred languages to the web site with every request, and the site can read it by querying the HTTP_ACCEPT_LANGUAGE server variable. The browser expresses the sort order through quality values: a value such as "nl,en;q=0.8,de;q=0.5" ranks Dutch first (an omitted q counts as 1), then English, then German. Detecting this order and the subsequent language preference is trivial in most web programming languages.
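To give an idea of how little work that is, here is a minimal sketch in Python. The function names parse_accept_language and pick_language are made up for this illustration, as is the set of translations passed in; the sketch handles the common "tag;q=weight" form and does not try to cover every corner of the HTTP specification.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: a malformed weight means "not wanted"
        if quality > 0:  # q=0 explicitly means "do not send me this language"
            # Sort by descending quality, then by the order the browser sent.
            entries.append((-quality, position, tag))
    entries.sort()
    return [tag for _, _, tag in entries]


def pick_language(header, available, default="en"):
    """Pick the first preferred language the site has a translation for."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "nl-be" also matches a plain "nl" translation
        for candidate in (tag, primary):
            if candidate in available:
                return candidate
    return default
```

A site with hypothetical English and Dutch translations could then call pick_language(header, {"en", "nl"}), where header is whatever the server reports in HTTP_ACCEPT_LANGUAGE. A visitor whose browser sends French and English gets English, not whatever their IP address suggests.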


Augmenting the automated detection


Regardless of the chosen method, some web sites offer their readers a choice of languages. Personally I find it ridiculous to rebuild a pre-existing browser preference into a web site. It can have its advantages, though, especially if you want to brag about the number of languages your site sports. Such methods usually employ a drop-down list of available languages, combined with browser-stored cookies to remember a visitor's choice.

Offering a method like this without also detecting the preferred language (using either method discussed) is a mistake: first you completely ignore the pre-existing preference, and then you ask the visitor to use your own method. You disregard both your visitor and the existing protocols. How arrogant are you?

Combining either of the previously discussed methods with a cookie-based preference currently seems the most user-friendly method, since it first detects the reader's preference, and then offers them the option to set an additional preference.
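The combination can be sketched as follows, in Python, for the browser-preference variant. Everything named here is an assumption made for the illustration: the cookie name "lang", the list of available translations, the default language, and the idea that the server hands the script a CGI-style dictionary containing HTTP_COOKIE and HTTP_ACCEPT_LANGUAGE. The header parsing is deliberately simple.

```python
from http.cookies import SimpleCookie

AVAILABLE = ("en", "nl", "fr", "de")  # hypothetical translations this site offers
DEFAULT = "en"                        # hypothetical last-resort language
COOKIE_NAME = "lang"                  # hypothetical cookie set by the drop-down list


def header_languages(accept_language):
    """Return the tags of an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        name, _, value = params.strip().partition("=")
        try:
            quality = float(value) if name.strip().lower() == "q" else 1.0
        except ValueError:
            quality = 0.0  # assumption: ignore entries with a malformed weight
        if tag.strip() and quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(environ):
    """An explicit cookie choice wins; otherwise follow the browser preference."""
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    if COOKIE_NAME in cookie and cookie[COOKIE_NAME].value in AVAILABLE:
        return cookie[COOKIE_NAME].value
    for tag in header_languages(environ.get("HTTP_ACCEPT_LANGUAGE", "")):
        for candidate in (tag, tag.split("-")[0]):  # "de-at" may fall back to "de"
            if candidate in AVAILABLE:
                return candidate
    return DEFAULT
```

The order of the checks is the point: the visitor's explicit choice on this site overrides the browser setting, the browser setting overrides the site default, and nobody has to touch the drop-down list unless they want something different from what their browser already asked for.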


Arguments of managers to prefer the IP Address Method anyway



All the major players do it this way
No they don't. Google (http://www.google.com/), for instance, detects the Accept-Language preference (if set; otherwise, they fall back on the IP Address Method).

It's easier
No it isn't. The two methods are equally easy / equally hard. The IP Address Method has an added dependency on a provider who connects IP addresses to physical locations. Besides, you aren't doing the programming. Your technicians are. They should know what they're doing.

We don't know it
This is why technical solutions should be created by technicians. This is why you pay your systems architect the big bucks: they should know what they're talking about. This is why this article was written.

Our IT department doesn't let us change this preference
Shouldn't the IT department also follow the manager's orders? Who's the boss in your company?

We've never had any complaints
I have. And I do some complaining myself. I happen to know people who are bothered by the IP Address Method. People who immigrated or are visiting or on holiday, or temporarily working in a foreign country. I've been one of those people.


Page Language Preferences in Various Browsers



(Screenshots of the page language preference panels in Opera 9.6, Mozilla Firefox 3, MS Internet Explorer 7, and Google Chrome.)



Conclusion


Knowing that the Accept-Language header (read on the server as HTTP_ACCEPT_LANGUAGE) has been a part of the HTTP protocol and web browsers for well over a decade, and knowing that readers can set their preferred language quite easily in all modern web browsers, ignoring the reader's preference strikes me as ignorant, arrogant and an outrage. Parsing the reader's preference is easy in most web programming languages, and should be the primary method of detecting which translation to offer.
