Good way to provide Rails-way i18n support in Django - django

There's one thing in (new) Rails I envy: internationalization support (Django has one too, but I prefer Rails' flavour).
The key difference between Rails' and Django's approaches is what kind of string serves as the key in the key-value translation mapping, i.e.
Django version (keys are strings in the "main" language, for example English):
msgid "Save and quit"
msgstr "Zapisz i wyjdź"
Rails version equivalent (keys are abstract strings, unusable on their own, so one needs to provide at least one "translation"). Rails actually uses the YAML format, but the following example presents the idea:
// english translation file
msgid "SAVE_QUIT_MESSAGE"
msgstr "Save and quit"
and
// polish translation file
msgid "SAVE_QUIT_MESSAGE"
msgstr "Zapisz i wyjdź"
Rails' way of supporting i18n is IMHO much better (think of key immutability - resistant to grammar/spelling corrections; language agnosticism etc).
One way to use this scheme in Django would be to use some abstract language for the sole purpose of being translated (strings in that language would make immutable keys), but Django supports only a fixed set of languages. Another solution is to sacrifice one of the supported (unused) languages to play this role, but this is just bad :P
Any ideas/third-party apps/techniques to solve this issue?
Sidenote: extending i18n support to arbitrary languages would give funny opportunities:
// slang translation file
msgid "SAVE_QUIT_MESSAGE"
msgstr "Save shit 'n' quit, bro"

Step back for a minute or two. You're doing triple work here. First you have to come up with a UNIQUE_ID, and then you force people to look up either the context from the code or another language file to figure out what the proper message for AMBIGUOUS_ARGUMENT_PROVIDED would be, and only then do you get down to providing the actual translation. And who ever said that creating IDs that can meaningfully convey the context and provide good message hints was ever easy?
What you're trying to do is some preposterous shit bro! Jokes aside, the reason gettext is the most prevalent and widely used i18n and l10n API is that each message gets a unique message catalog ID derived from its contents, and it's proven that you'll have a far better time translating messages than providing translations for IDs. This is reminiscent of when everyone tried making their own key->value i18n framework because it was the most straightforward to design.
You'll eventually conclude that it was a bad idea to use gettext in a way it wasn't meant to be used, and you can save yourself right now by forgetting about the whole idea.

If you insist on doing it this way, then it can be done by generating a .po file that will contain the English translations of the source strings.
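A minimal sketch of that idea, using Python's standard gettext module rather than Django itself. The in-memory dictionaries stand in for compiled .po catalogs, and DictTranslations, CATALOGS and translate are hypothetical names introduced only for illustration. The point is that the abstract key is the msgid in every language, English included:

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory stand-in for a compiled .po catalog (illustration only)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Fall back to the key itself when no translation exists.
        return self._messages.get(message, message)


# Hypothetical catalogs: every language, English included, maps the same key.
CATALOGS = {
    "en": DictTranslations({"SAVE_QUIT_MESSAGE": "Save and quit"}),
    "pl": DictTranslations({"SAVE_QUIT_MESSAGE": "Zapisz i wyjdź"}),
}


def translate(key, language):
    """Look up an abstract key in the catalog for the given language."""
    return CATALOGS[language].gettext(key)


print(translate("SAVE_QUIT_MESSAGE", "en"))
```

In a Django project the same effect would presumably come from a locale/en/LC_MESSAGES/django.po file whose msgstr entries hold the English text, with the English catalog always loaded so the raw key never reaches the user.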

Related

l10n/i18n: how to handle phrases with dynamic list of items?

What's the sanest way to handle translation and localization of dynamic lists?
Let's say I've queried the database, and got a list ["Foos", "Bars", "Bazes"]. Let's also assume the list always contains at least two items - I'll be sure to use a different translation for the single-item case.
What should I do if I need a phrase like "We have a wide choice of Foos, Bars and Bazes in our code"? (assuming that list items are dynamic so I can't just pre-translate all the possible permutations, and need to do things at runtime.)
I see at least the following issues:
I need to inflect all the items to the correct form (are there languages where different forms are required depending on the position in the list?)
Different locales may have drastically different rules how to join items.
E.g. CJK locales need "、" instead of ",".
And AFAIK in Chinese there will be "及" or "和" - depending on the full phrase - before the last item, so I guess there's some ambiguity with translating "and".
And, as I've read, some languages may avoid punctuation as it's used in English and have other conventions instead, e.g. an Arabic translator may prefer to use "و" before every item (although Arabic also has commas, "،"). Not sure if this is true or not - I don't know Arabic, I just saw it mentioned.
My problem is, I don't even know what tooling may help me here. I don't have any particular programming language requirements, although Python or JavaScript would be the best. But I guess I can run just about anything, as I can probably build a l10n microservice and query it from my project.
I used GNU gettext before I encountered this, but I haven't found anything that would help me in its APIs and data formats. The best I can imagine is _("We have a wide choice of %s in our code", list_text) and generating list_text using some DIY hacks. I'm not sure the XLIFF format has anything like this either. I've found i18n-list-generator on npm but it's way too simplistic.
Has anyone dealt with something like this? What did you do? Is there any library out there that handles this - so I can take a look at its API and learn how it does things?
Here's how I would approach it:
No concatenation. All string joining needs to be done via format strings with placeholders.
Only use format strings that support named/numbered placeholders. E.g. {FOO} or $1 instead of %s (this is to allow for parameter reordering). Named placeholders are also better since they give more context to translators. Let's assume we're using {FOO}-style placeholders.
To render a list, I would use a couple of format strings, e.g.: joinItem = "{LIST}, {ITEM}" to append items to the list and joinLastItem = "{LIST} and {ITEM}" to append the last item. This will allow one to render strings like Foos, Bars and Bazes, change punctuation and even reverse the ordering of the list, if necessary.
Finally, you can use the final format string, e.g. weHaveTheseItems = "We have a wide choice of {ITEMS} in our code", assuming the {ITEMS} gets replaced with the previously rendered string.
Shameless self-promotion: you may want to have a look at the Plurr library that supports such {FOO}-style placeholders, as well as plurals (something you will likely need for such messages). It supports JavaScript among other languages.
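The format-string steps above can be sketched in Python as follows. This is only an illustration under assumptions: render_list and we_have_these_items are hypothetical helper names, and the three patterns are hard-coded here although in practice each would be looked up in the translation catalog for the current locale.

```python
def render_list(items, join_item, join_last_item):
    """Fold a list into one string using translatable format strings."""
    if not items:
        return ""
    rendered = items[0]
    # Append every middle item with the "join" pattern.
    for item in items[1:-1]:
        rendered = join_item.format(LIST=rendered, ITEM=item)
    # Append the final item with the "last item" pattern.
    if len(items) > 1:
        rendered = join_last_item.format(LIST=rendered, ITEM=items[-1])
    return rendered


def we_have_these_items(items):
    """Render the full phrase; the patterns are the English ones from the text."""
    join_item = "{LIST}, {ITEM}"
    join_last_item = "{LIST} and {ITEM}"
    we_have = "We have a wide choice of {ITEMS} in our code"
    return we_have.format(ITEMS=render_list(items, join_item, join_last_item))


print(we_have_these_items(["Foos", "Bars", "Bazes"]))
```

Because the placeholders are named, a translator can swap their order inside a pattern (for example "{ITEM}, {LIST}") without any change to the code.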
This is a pain; as you point out, not all locales can be expected to support the ",,,,and" form.
Inspired by @GSerg and @Igor Afanasyev, I came up with a GNU Gettext based solution like the following (pseudo gettext invocation):
GettextPlural(
// TRANSLATORS: For multiple "choices", each will be prefixed with a new-line (\n)
"We have a wide choice of {choices} in our code",
"In our code we have a wide choice of{choices}", choices.Count)
should print like:
"We have a wide choice of FOOs in our code"
"In our code we have a wide choice of
FOOs
BARs
BAZs"
Remember to stick the --add-comments=TRANSLATORS to your xgettext invocation.
For Web purposes you could use <ul><li>...</li>... </ul> or whatever instead of \n.
The benefit is that layout is at least as universal as UI layout, but you are still allowing non-English'ish locale plural forms.
Some languages have only one plural form so their translation must work with both a single choice and multiple choices, so in particular, they cannot have a conditional new-line.
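For reference, the pseudo invocation above might look like this with Python's standard gettext module. It is a sketch under assumptions: describe_choices is a hypothetical helper, and NullTranslations is used only so the example runs without a compiled catalog (it applies the English rule, singular when n == 1); a real catalog would apply the target locale's plural rule.

```python
import gettext

# No catalog loaded: ngettext returns the singular for n == 1, else the plural.
translations = gettext.NullTranslations()


def describe_choices(choices):
    """Pick the singular or plural template, then fill in the choices."""
    template = translations.ngettext(
        "We have a wide choice of {choices} in our code",
        "In our code we have a wide choice of{choices}",
        len(choices),
    )
    if len(choices) == 1:
        rendered = choices[0]
    else:
        # Each entry is prefixed with a new-line, as the translator comment says.
        rendered = "".join("\n" + choice for choice in choices)
    return template.format(choices=rendered)


print(describe_choices(["FOOs"]))
print(describe_choices(["FOOs", "BARs", "BAZs"]))
```

The decision about the new-line prefix lives in the code, not in the translated string, which is what lets a locale with a single plural form reuse one template for both cases.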

REST/HATEOAS Microformat/FOAF/Schema Domain Specific Confusion

In a RESTful service, you can define links to resources as so:
<next xmlns="http://example.org/state-machine"
rel="http://mydomain.example.org/user"
url="http://mydomain.example.com/user/1234"
type="application/xml"/>
Or a JSON-LD Example:
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}
Or a vCard example:
<address id="hcard" class="vcard">
<p class="fn n">
<span class="given-name">First</span>
<span class="additional-name">M</span>.
<span class="family-name">Last</span>
With the rel attribute pointing to a schema/microformat/microdata/RDFa description of that free-offer object within the domain. Say I have a user object with fields name and homepage,
In the examples above, would rel="http://mydomain.example.org/user" be more appropriate since it is a domain-specific user, or should I use something like this from foaf?
"name": "http://xmlns.com/foaf/0.1/name", "@id": "http://xmlns.com/foaf/0.1/homepage"
I am very confused with these RDFa, microformat, microdata, schema, vcards, hcards, foaf, http://www.productontology.org/id/, www.schema.org/name, http://rdf.data-vocabulary.org/#name, json schema, etc. When do I create my own microformat/schema, and when do I use the public ones defined in those different areas (vCards, hcards, foaf, productontology, schema.org)? I understand that RDFa, microformats, etc., are really public metadata about data, but where can I find a full list of them to use?
If I were to create my own rel like rel="http://mydomain.example.org/user", defining the user object, how should I document it? Is there a standard I can follow? Some places suggest human readable discovery documents, or maybe a JSON/XML schema at that location describing the contract and the version?
One suggestion is to expose a versioned URL such as http://mydomain.example.com/v1/user/1234, with each version tied to its own version of the service specification, so existing clients won't break on a version change.
Please help me map out this confusion, or the terms I should use, with regards to Microformat, RDF, Microdata, Schema, etc.
When I first started learning RDF/ontologies, I had the same confusion.
There are a lot of vocabularies from many different organizations. The key is to use the ones that are well established and adopted by others; but more importantly, make sure to use the ‘terms’ that carry the ‘semantic meanings’ you intend.
To answer your specific questions:
1) You ‘can’ create your own terms, but when other applications parse your terms, they need to know what your terms really mean. If you use FOAF:homepage, all applications know its semantic meaning and what it refers to, and from that it also follows that whoever is the subject (subject in the RDF triple) of FOAF:homepage is a FOAF:Person. This is where open linked data comes in.
http://linkeddata.org/
2) I don't have a lot of experience with RDFa/Microformats; my understanding is that these technologies aim to provide machine-understandable semantic terms at the HTML layer. I think you may not want to create your own if there are existing ones, but you can define your own terms in addition to what is out there.
3) You can define terms using OWL or RDFS
Just saw this, it may be useful..
https://code.google.com/p/tdwg-rdf/wiki/Beginners7OWL
But in my lightweight project, we just create terms by specifying the namespace, the name of the term and what it means. You can follow this example to define your terms:
http://dublincore.org/documents/dcmi-terms/
JSON/XML/N3… are just different serialization formats and the same term can be serialized using many different formats. The important thing to remember is that you define the semantic meaning of the terms and they will be serialized to different formats. It is not that the format is not important but all formats point to the same term and same meaning.
4) I version my vocabs myself, but I don't have much insight into the best practices.
Hope this helps!
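To illustrate the point in 3) that the term stays the same whatever the serialization, here is a small hand-rolled sketch that writes one statement in two formats. The subject IRI is a made-up example and the helper names are hypothetical; only the FOAF "name" IRI comes from the question. A real project would more likely use an RDF library than string formatting.

```python
import json

# One statement: (subject, predicate, object).
SUBJECT = "http://example.org/people/manu"      # hypothetical IRI
PREDICATE = "http://xmlns.com/foaf/0.1/name"    # FOAF term from the question
OBJECT = "Manu Sporny"


def to_ntriples(subject, predicate, literal):
    """N-Triples: IRIs in angle brackets, literal in double quotes.

    No escaping is done, so the literal must not contain quotes or newlines.
    """
    return '<{}> <{}> "{}" .'.format(subject, predicate, literal)


def to_jsonld(subject, predicate, literal):
    """JSON-LD without a context: the full IRI is used directly as the key."""
    return json.dumps({"@id": subject, predicate: literal}, sort_keys=True)


print(to_ntriples(SUBJECT, PREDICATE, OBJECT))
print(to_jsonld(SUBJECT, PREDICATE, OBJECT))
```

Both outputs carry the identical predicate IRI, so a consumer that understands foaf:name reads the same meaning from either one.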
There is a well known expression of this in RDFa:
<html><body vocab="http://purl.org/dc/terms/">
<div typeof="foaf:Person">
<span property="creator">manu sporny</span>
<span property="foaf:age" content="66"/>
<span property="foaf:homepage" resource="http://x.com"/>
</div>
</body></html>
The "foaf" prefix is defaulted by RDFa. No context to define.
The "dc" prefix is made the default, no prefix required for "creator".
This is the standard, forget everything else.
The list of directly usable prefixes is provided by
http://www.w3.org/2011/rdfa-context/rdfa-1.1
You may find here a lot of usable concepts, in fact all the public schemas.
Note the "schema" prefix, for a vocabulary defined here:
http://schema.org/
It is especially rich and widely used.
Note also that it tried to define a microformat to compete with RDFa notation. Drop that and use the vocabulary with RDFa.

is this a good use for django internationalisation

I am working on a Django website/project that has already been internationalised/localised to US English, GB English and Mandarin.
It is deployed with the same codebase except for the settings config, which states what language to use. Some deployments are Mandarin only, others are US English.
The client now has a requirement to change some of the language used within a GB English version for a specific deployment. My main goal is not to duplicate things, and I think I can get what I need out of Django i18n.
Basically, I am looking to find out if I can or should use Django i18n to handle:
'Welcome' on deployA
'Oh hai' on deployB,
even though they're still both gb-english based sites, I feel I should be able to say that deployA will use 'en_GB' and deployB would use 'en_GB_special'.
I suppose it's the fact that I want to use a non-standard i18n name/code that is making me wonder if I should do this, or if I am approaching this in the wrong manner.
I would only create a new language if you're intending to maintain two translations. If the new site will need to stay in sync with en_GB and/or you intend to use the customization in another language then I think you'd be better off creating new messages, adding a string for them to en_GB and add a flag to your application to switch the feature for your feline client.
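A rough sketch of the flag approach, kept free of Django imports so it runs on its own. Everything named here is an assumption: welcome_message is a hypothetical helper, the flag would typically be read from the deployment's settings (for instance via getattr(settings, "USE_CASUAL_GREETING", False), a made-up setting name), and translate would be django.utils.translation.gettext in a real project.

```python
def welcome_message(use_casual_greeting, translate=lambda message: message):
    """Pick the message by deployment flag, then translate it as usual."""
    if use_casual_greeting:
        return translate("Oh hai")
    return translate("Welcome")


# deployA: flag off; deployB: flag on. Both stay on the same en_GB catalog.
print(welcome_message(False))
print(welcome_message(True))
```

Both strings then live in the one en_GB catalog, so there is no second translation to keep in sync. In real code the calls should probably use the usual _() alias so that message extraction picks the strings up.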

How to set language for django-pagination?

As you can see, django-pagination has Polish (pl) translations - https://github.com/ericflo/django-pagination/tree/master/pagination/locale but I don't know how to set the Polish language for django-pagination (the default is English).
This should happen automatically.
Check in your Django settings that USE_I18N is set to True and that LANGUAGE_CODE is set to pl.
For further information take a look at the django localization page. You can find a more detailed documentation of how the translation in django works here.
There's also a list of language codes, I guess pl should be correct.
You can either change the language setting of your browser, which will send the appropriate headers with each request and trigger the translation to be used, or you can provide a language setting selection so the user can choose their language.
You can roll your own code to provide this interface or use django-user-accounts.
You also might want to check that you have the appropriate middleware installed as described in this documentation.
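Put together, the settings mentioned in these answers might look like the fragment below. It is a sketch, not a complete settings file: the middleware list is cut down to the entries relevant to ordering, and older Django versions (the era of django-pagination) name the setting MIDDLEWARE_CLASSES instead of MIDDLEWARE.

```python
# settings.py (fragment)

USE_I18N = True        # enable Django's translation machinery
LANGUAGE_CODE = "pl"   # default language when nothing else is negotiated

# LocaleMiddleware selects the language per request (for example from the
# browser's Accept-Language header). It should come after SessionMiddleware
# and before CommonMiddleware.
MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
]
```

With LocaleMiddleware enabled the browser's language preference can override LANGUAGE_CODE; without it, every request is served in LANGUAGE_CODE.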

Can i use an other language instead of english for default django translation

Can I use another language instead of English (say, French) for the default Django translation?
For example, instead for doing this:
messages.error(request, _('My message in english'))
I do this:
messages.error(request, _('Mon message en francais'))
Yeah, you could do that, and it would mostly work, but better would be to write them in English and then provide French translations (via the standard i18n approach), and to set the project's LANGUAGE_CODE to 'fr' as well.
That way, your code will be more easily reusable in other languages, and (perhaps more usefully to you if you're not worried about that) you'll be able to cleanly use French/other-language translations already available in any third-party apps you want to add to your site. Otherwise you'll be mixing what Django thinks is default English (but is French) with what it thinks is French (and is French).
Allez! ;o)