Using flags to identify spoken language

In the web app I'm building, I need to show which language people speak.
I wanted to use flags for that, but I've run into some problems.
For example, if you speak French, you can show the French flag. But if you speak English, you could show either the US or the UK flag, or a mix of both.
Which flag should I choose for Arabic? Saudi Arabia's? Algeria's? Morocco's?

I think it's common to show each language's name written in that language (text instead of flags), for example:
English
français
русский язык
العربية
中文
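Here is a minimal Python sketch of the text-instead-of-flags approach: a language switcher built from endonyms. Each link carries `lang` and `hreflang` attributes, plus `dir="rtl"` for Arabic, so browsers and screen readers handle each label correctly. The `url_for` callback and the URL scheme in the usage note are hypothetical placeholders.

```python
from html import escape
from typing import Callable

# Each language is labeled with its own name (its endonym), not a flag.
ENDONYMS = {
    "en": "English",
    "fr": "français",
    "ru": "русский язык",
    "ar": "العربية",
    "zh": "中文",
}
RTL_LANGUAGES = {"ar"}  # languages written right-to-left


def language_switcher(url_for: Callable[[str], str]) -> str:
    """Render one <a> per language; url_for maps a language tag to a URL."""
    links = []
    for tag, name in ENDONYMS.items():
        direction = ' dir="rtl"' if tag in RTL_LANGUAGES else ""
        links.append(
            f'<a href="{escape(url_for(tag))}" hreflang="{tag}" '
            f'lang="{tag}"{direction}>{escape(name)}</a>'
        )
    return "\n".join(links)
```

For example, `language_switcher(lambda tag: f"/{tag}/")` produces five links such as `<a href="/fr/" hreflang="fr" lang="fr">français</a>`.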

The answer is to not use flags to identify languages. Not only is there no one-to-one mapping between flags and languages, and you won't be able to cover every language that way (Kurdish?), but some flags may also be controversial (consider the Taiwanese flag for Traditional Chinese).

As many other answers stated, it's clearly a bad idea to use flags for languages.
See arguments here: Flag as a symbol of language - stupidity or insult?

Language and nationality are different things. If your English translation is American English, you should use the American flag; for British English, use the UK flag; and so on. There are many dialects of Arabic, so which flag you should use depends on which language or dialect your translation is in.

Did you know that the browser sends a list of the locales the user prefers (the Accept-Language header)? Your web server can pick from that list the one the user likes most.
You can see here how the Debian project has solved this issue: http://www.debian.org/intro/cn
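Here is a minimal Python sketch of that server-side choice. It assumes the site's languages are identified by primary subtags such as `en` or `da`. `parse_accept_language` and `best_match` are hypothetical names, and many web frameworks already ship an equivalent parser. The sketch orders entries by their `q` weight (default 1), skips entries with `q=0`, and falls back from a full tag like `fr-CH` to its primary subtag `fr`.

```python
from typing import List, Sequence, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language header into (tag, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        tag = fields[0].lower()
        if not tag:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        items.append((tag, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(items, key=lambda item: -item[1])


def best_match(header: str, supported: Sequence[str], default: str) -> str:
    """Return the first supported language the user accepts, else default."""
    by_lower = {language.lower(): language for language in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in by_lower:
                return by_lower[candidate]
    return default
```

For example, a browser sending `fr-CH, fr;q=0.9, en;q=0.8` to a site offering only `en` and `de` gets English.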

Related

Taiwanese language and country codes

I'm unsure which of these two codes to use:
zh-cht or zh-tw. The site is in Traditional Chinese, mostly for Taiwan, but it also has a presence in Macau and Hong Kong.
So zh-cht and zh-tw seem to represent the same language.
Perhaps there are vernacular differences?
But zh-cht seems to be an umbrella for the various vernacular differences?
Comparing with Spanish is difficult, because Spanish seems to have had fewer recent geopolitical upheavals.
For example, es-CO is Spanish in Colombia, and no one has to worry about whether we mean "Gran Colombia", which would include Ecuador and Venezuela. That geopolitical issue is far behind us: they have officially been separate countries for a long time, so everyone knows es-CO refers to the country of Colombia and the fairly distinctive dialect spoken there. Right? There is also (after more googling) es-419, which describes the Spanish of Latin America and the Caribbean.
So how does this apply to zh-tw and zh-cht?
Is zh-cht the es-419 of Traditional Chinese?
In case it's useful:
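zh-Hant (Chinese written in the Traditional script) is the correct code; zh-CHT is a legacy Microsoft/.NET culture name rather than a standard language tag. If the region matters, add a region subtag, e.g. zh-Hant-TW, zh-Hant-HK or zh-Hant-MO.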
https://www.w3.org/International/articles/language-tags/
(Thank you andrewJames)
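To see how tags such as zh-Hant-TW, zh-TW and es-419 break down, here is a simplified Python parser for the `language[-script][-region]` shape of BCP 47 tags. It deliberately ignores extended language subtags, variants and extensions, so it is a sketch rather than a full RFC 5646 parser. `split_tag` is a hypothetical name. Under this simplified shape, zh-CHT is rejected because `CHT` is neither a four-letter script nor a region.

```python
import re
from typing import Optional, Tuple

# Simplified shape: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits such as 419).
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?$"
)


def split_tag(tag: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (language, script, region) with conventional casing."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"not a language[-script][-region] tag: {tag!r}")
    language, script, region = match.group("language", "script", "region")
    return (
        language.lower(),
        script.title() if script else None,   # scripts are title case: Hant
        region.upper() if region else None,   # regions are upper case: TW
    )
```

For example, `split_tag("zh-Hant-TW")` gives `("zh", "Hant", "TW")`. That makes it visible that zh-TW names a region while zh-Hant names a script, which is why zh-Hant is the region-independent choice.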

Localization in .rc files

My application (written in C++ with VS2005) is localized for multiple languages.
What happens if the application runs in a language for which no localized resources exist? For instance, I haven't localized it for Dutch: what happens when it runs on a Dutch PC?
The load order is:
Primary language/sublanguage
Primary language
Language-neutral
English (skipped if primary language is English)
Any
(Taken from an MSDN blog post.)
So in your case you will get the English resources if you have them (step 4); otherwise you may end up with any of the languages in your resources. If you want to influence which language is used, you can set the thread's locale before loading a resource. That's what I did in my programs: if the locale is German, keep it; otherwise switch it to English, so that international users always see the English GUI.
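The five-step order above can be modeled as a small Python function to see what a Dutch PC would get. This is a toy model, not the Win32 implementation. It assumes step 2 accepts any sublanguage of the user's primary language and step 4 accepts any English resource. It also uses made-up `(primary, sublanguage)` tuples instead of real LANGIDs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Toy model: a resource language is (primary, sublanguage), e.g. ("de", "DE").
# ("neutral", None) stands in for a language-neutral resource.
LangId = Tuple[str, Optional[str]]


def pick_resource_language(available: Sequence[LangId], user: LangId) -> Optional[LangId]:
    """Return the resource language the modeled search order would pick."""
    primary, sub = user
    steps: List[Callable[[LangId], bool]] = [
        lambda r: r == (primary, sub),               # 1. primary language/sublanguage
        lambda r: r[0] == primary,                   # 2. primary language (assumed: any sublanguage)
        lambda r: r == ("neutral", None),            # 3. language-neutral
        lambda r: primary != "en" and r[0] == "en",  # 4. English, skipped for English users
    ]
    for matches in steps:
        for resource in available:
            if matches(resource):
                return resource
    # 5. Any: fall back to the first resource listed, if there is one.
    return available[0] if available else None
```

With English and German resources, `pick_resource_language([("en", "US"), ("de", "DE")], ("nl", "NL"))` lands on `("en", "US")` at step 4. Without English, it drops through to "any".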

Match browsers set to Scandinavian languages based on "Accept-Language"

Question
I am trying to match browsers set to Scandinavian languages based on HTTP header "Accept-Language".
My regex is:
^(nb|nn|no|sv|se|da|dk).*
My question is whether this is sufficient, and whether anyone knows of other odd Scandinavian (but "valid") language codes, or of obscure browser bugs that could cause false positives?
Used for
The regex is used to display an English link at the top of the Norwegian web pages (Norwegian is the primary language and lives at the root of the domain and sub-domains). The link takes you to the English web pages (the secondary language, in a folder under the root) and is shown when the browser language is not Scandinavian. The user can close the link ("opt out") if they don't want to see it again; this is remembered with a hash stored in JavaScript localStorage. We decided not to use IP geolocation because we had limited time to implement it.
Depending on the language you're working in, there may be existing code you can use to parse this header easily, e.g. this post: Parse Accept-Language header in Java (which also provides a good code example).
Further, are you sure you want to anchor your regex to the start of the string? Several languages can be listed (the header is meant to say "I prefer x but also accept the following"): http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
Otherwise your regex should work fine for what you asked, and here is a list of all browser language codes: http://www.metamodpro.com/browser-language-codes
In your shoes, I would also make the "switch to X language" link easy to find for all users until they opt out of seeing it again. I would expect many people to have a default preference set in their browser and to be surprised when a site actually uses it, i.e. a user experience like:
I prefer English but don't know enough to change this setting, and have never had a reason to before, since so few sites make use of it.
That regular expression is enough if you test each item in Accept-Language individually.
If you test the whole header at once, there are 2 problems:
One of the expected languages might appear later in the header rather than at the beginning.
Some of the expected abbreviations could appear as a subtag of a completely different language (for example, dk inside en-DK when matching case-insensitively).
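Here is a minimal Python sketch of testing each Accept-Language item individually, as suggested above. Entries with `q=0` are skipped because they mean "not acceptable". The function name is hypothetical. The sketch also drops two codes from the original regex. `se` is the ISO 639-1 code for Northern Sami, not Swedish. `dk` is not a language code at all (DK is Denmark's region subtag). Add `se` back if Sami speakers should count as Scandinavian.

```python
import re

# Primary subtags for Norwegian (nb, nn, no), Swedish (sv) and Danish (da),
# followed by either a hyphen (e.g. nb-NO) or the end of the tag.
SCANDINAVIAN = re.compile(r"^(?:nb|nn|no|sv|da)(?:-|$)", re.IGNORECASE)
Q_VALUE = re.compile(r";\s*q\s*=\s*([01](?:\.\d*)?)", re.IGNORECASE)


def accepts_scandinavian(header: str) -> bool:
    """True if any acceptable item in the header is a Scandinavian language."""
    for item in header.split(","):
        item = item.strip()
        tag = item.split(";", 1)[0].strip()
        q_match = Q_VALUE.search(item)
        q = float(q_match.group(1)) if q_match else 1.0
        if q > 0 and SCANDINAVIAN.match(tag):
            return True
    return False
```

Because the anchor applies to each tag rather than to the whole header, `en-DK` is not a false positive, while `en-US, sv;q=0.5` still counts.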

Gettext/Django for german translations: formal/informal salutations

I maintain a pluggable Django app that contains translations. All strings in Python and HTML code are written in English. When translating the strings to German, I'm always fighting with the problem that German differentiates between formal and informal speech (see T–V distinction). Because the app is used on different sites, ranging from a social network to a banking website, I can't just support either the formal or informal version. And since the translations can differ quite a bit, there's no way I can parameterize it. E.g. the sentence "Do you want to log out?" would have these two translations:
Wollen Sie sich abmelden? (formal)
Willst du dich abmelden? (informal)
Is there anything in Gettext that could help me with this?
You can use contextual markers to give your translations additional context.
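from django.utils.translation import pgettext
# Same English source string, two contexts: each gets its own msgctxt entry in the .po file
logout = pgettext('casual', 'Do you want to log out?')
...
logout = pgettext('formal', 'Do you want to log out?')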
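Django's `pgettext` relies on gettext's context mechanism: in a compiled catalog, each entry's key is `context` + `\x04` + `msgid`. Here is a sketch using only Python's standard `gettext` module (Python 3.8+ for `pgettext`) that makes this visible with a tiny German catalog built in memory. `build_mo` is a hypothetical helper written for this illustration. In a real project the `.mo` file would be compiled from a `.po` file with `msgfmt` or Django's `compilemessages`.

```python
import gettext
import io
import struct


def build_mo(catalog: dict) -> bytes:
    """Hypothetical helper: serialize {msgid: msgstr} into a minimal little-endian .mo file.
    ASCII only, because there is no header entry declaring a charset."""
    ids, strs, entries = b"", b"", []
    for key in sorted(catalog):
        key_bytes, value_bytes = key.encode("ascii"), catalog[key].encode("ascii")
        entries.append((len(key_bytes), len(ids), len(value_bytes), len(strs)))
        ids += key_bytes + b"\0"
        strs += value_bytes + b"\0"
    count = len(entries)
    ids_start = 28 + 16 * count  # 7-word header plus two (length, offset) tables
    strs_start = ids_start + len(ids)
    originals, translations = [], []
    for key_len, key_off, value_len, value_off in entries:
        originals += [key_len, ids_start + key_off]
        translations += [value_len, strs_start + value_off]
    header = struct.pack("<7I", 0x950412DE, 0, count, 28, 28 + 8 * count, 0, 0)
    return (header + struct.pack(f"<{2 * count}I", *originals)
            + struct.pack(f"<{2 * count}I", *translations) + ids + strs)


# One English msgid, two German translations, told apart only by context.
german = gettext.GNUTranslations(io.BytesIO(build_mo({
    "casual\x04Do you want to log out?": "Willst du dich abmelden?",
    "formal\x04Do you want to log out?": "Wollen Sie sich abmelden?",
})))
```

`german.pgettext("formal", "Do you want to log out?")` returns the formal sentence. A plain `german.gettext(...)` without a context finds no entry and returns the English string.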
The best approach, used by gettext and UNIX in similar situations, is to use locale variants. For example, sr_RS is (or was, because Serbian is considered a metalanguage these days...) the code used for Serbian written in Cyrillic. But Serbian is sometimes written in Latin script too, so sr_RS@latin is used as the language name and, of course, as the MO filename or directory as well.
Here, have a look at some translations I have on my system:
$ find /usr/local/share/locale | grep /sr
/usr/local/share/locale/sr
/usr/local/share/locale/sr/LC_MESSAGES
/usr/local/share/locale/sr/LC_MESSAGES/bash.mo
/usr/local/share/locale/sr/LC_MESSAGES/bfd.mo
/usr/local/share/locale/sr/LC_MESSAGES/binutils.mo
/usr/local/share/locale/sr/LC_MESSAGES/gettext-runtime.mo
/usr/local/share/locale/sr/LC_MESSAGES/gettext-tools.mo
/usr/local/share/locale/sr/LC_MESSAGES/glib20.mo
/usr/local/share/locale/sr/LC_MESSAGES/wget.mo
/usr/local/share/locale/sr@ije
/usr/local/share/locale/sr@ije/LC_MESSAGES
/usr/local/share/locale/sr@ije/LC_MESSAGES/glib20.mo
/usr/local/share/locale/sr@latin
/usr/local/share/locale/sr@latin/LC_MESSAGES
/usr/local/share/locale/sr@latin/LC_MESSAGES/glib20.mo
/usr/local/share/locale/sr_RS
/usr/local/share/locale/sr_RS/LC_MESSAGES
/usr/local/share/locale/sr_RS/LC_MESSAGES/mkvtoolnix.mo
/usr/local/share/locale/sr_RS@latin
/usr/local/share/locale/sr_RS@latin/LC_MESSAGES
/usr/local/share/locale/sr_RS@latin/LC_MESSAGES/mkvtoolnix.mo
$
So the best way to handle German variants is the same: use de (or de_DE) for the base informal variant and have a separate translation file, de_DE@formal, with the formal variant of the translation.
This is basically what WordPress does too. Of course, being WordPress, they have their own special flavour and don’t use the variant syntax, but instead add a third component to the filename: de_DE.mo is the informal (and also fallback, because it lacks any further specification) variant and de_DE_formal.mo contains the formal variant.

Country and language code detecting

I need to detect the user's language and country code in Qt. The codes must match the standard at http://standards.freedesktop.org/desktop-entry-spec/latest/ar01s04.html.
I've tried QLocale, but countryToString() and languageToString() return the full country and language names. (I need the short code, like "en" instead of "English".)
One way would be to create a map from QLocale::Language to QString. But is there a faster and simpler way?
See QLocale::name()
Returns the language and country of this locale as a string of the
form "language_country", where language is a lowercase, two-letter ISO 639
language code, and country is an uppercase, two- or three-letter
ISO 3166 country code.
In addition to Paul's answer, there are also QLocale::uiLanguages() and QLocale::bcp47Name(), which return variations of these codes.
To correctly detect the country actually set in the user's preferences (Control Panel/Location on Windows, Preferences/Region on OS X), use https://github.com/crystalidea/qt-detect-user-country
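Once you have a `language_country` string such as the one `QLocale::name()` returns, the Desktop Entry spec linked in the question defines how localized keys like `Name[sr_RS@latin]` are matched. The spec drops any `.ENCODING` and tries `lang_COUNTRY@MODIFIER`, `lang_COUNTRY`, `lang@MODIFIER`, `lang`, then the unlocalized key. Here is a Python sketch of that lookup (not Qt code). The function names and the sample entry are hypothetical.

```python
import re
from typing import Dict, List, Optional

# lang[_COUNTRY][.ENCODING][@MODIFIER], e.g. "sr_RS.UTF-8@latin"
LOCALE = re.compile(
    r"^(?P<lang>[^_.@]+)(?:_(?P<country>[^.@]+))?(?:\.[^@]*)?(?:@(?P<modifier>.+))?$"
)


def desktop_key_candidates(lc_messages: str) -> List[str]:
    """Locale suffixes to try, most specific first; the encoding is ignored."""
    match = LOCALE.match(lc_messages)
    if match is None:
        return []
    lang, country, modifier = match.group("lang", "country", "modifier")
    candidates = []
    if country and modifier:
        candidates.append(f"{lang}_{country}@{modifier}")
    if country:
        candidates.append(f"{lang}_{country}")
    if modifier:
        candidates.append(f"{lang}@{modifier}")
    candidates.append(lang)
    return candidates


def localized_value(entry: Dict[str, str], key: str, lc_messages: str) -> Optional[str]:
    """Return Key[locale] for the best matching locale, else the plain Key."""
    for candidate in desktop_key_candidates(lc_messages):
        localized_key = f"{key}[{candidate}]"
        if localized_key in entry:
            return entry[localized_key]
    return entry.get(key)
```

For example, `desktop_key_candidates("sr_RS.UTF-8@latin")` yields `["sr_RS@latin", "sr_RS", "sr@latin", "sr"]`.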