How Do Search Engines See A Localized Django Site?

I have a Django site that uses the localization middleware in combination with gettext and the trans/blocktrans template tags to show visitors different pages depending on the preferred language sent in their browser's Accept-Language header (which seems to be the standard way of doing things in Django).
This works great for supported languages (currently only Spanish, English, and German with more coming). If I set the preferred language in my browser to a different language, I get the pages for that translation. However, I have no idea how it appears for search engines.
When a search engine crawls a site, does it typically send a preferred language in its request headers? Will German spiders get the German site and Spanish ones the Spanish site, or will they just get the default English site that's displayed when a user has no language set? Does this vary by search engine, and is there a "standard way" of doing things that individual crawlers may or may not stick to?

Bots typically do not send an Accept-Language header, which means that Django will serve your default language (LANGUAGE_CODE).
Regional search engines may run bots with Accept-Language set to whatever they prefer, but you cannot rely on that.
It is best to have a separate URL for each language, such as http://your.website.com/english/,
and then set up a redirect in your middleware to the right language page if a specific Accept-Language header is present.
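The redirect approach above could be sketched in plain Python as follows. This is a minimal sketch, not Django's LocaleMiddleware: SUPPORTED, parse_accept_language and redirect_target are hypothetical names, and the prefixes assume short codes like /en/ rather than /english/. It also shows why a crawler that sends no Accept-Language header simply gets the default language.

```python
# Hypothetical configuration: the languages the site supports.
SUPPORTED = ("en", "es", "de")


def parse_accept_language(header):
    """Return primary language subtags from an Accept-Language header, best first."""
    entries = []
    for index, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort key: higher q first, then original order; keep "de" from "de-AT".
            entries.append((-q, index, tag.split("-")[0]))
    ordered = []
    for _, _, tag in sorted(entries):
        if tag not in ordered:
            ordered.append(tag)
    return ordered


def redirect_target(path, accept_language):
    """Return a language-prefixed path to redirect to, or None to serve as-is."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in SUPPORTED:
        return None  # Already a language-specific URL.
    if not accept_language:
        return None  # Typical crawler: no header, so the default language is served.
    for lang in parse_accept_language(accept_language):
        if lang in SUPPORTED:
            return "/" + lang + path
    return None  # No supported language requested; serve the default.
```

In a real Django project, i18n_patterns together with LocaleMiddleware covers similar ground, so it is worth checking those before rolling your own.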

Don't rely on what the search engine may do in this regard. You want all versions to be crawled. To achieve that:
Have different URLs for the different language versions.
Make sure the search engines can find the different versions.
Overall, I believe that the way I did it on my homepage is close to ideal in regard to both search engines and regular users:
When a user arrives at, e.g., brazzy.de/index.php, the site tries to determine the language from a cookie (if present) or the browser settings (Accept-Language header), defaults to English, and does not redirect.
Every page has links to the different language versions of that page (IMO the most important factor for user convenience, and also makes sure search engines can easily find the different versions).
These links lead to e.g. brazzy.de/en/index.php, which is in my case rewritten to brazzy.de/index.php?lang=en - this ensures that search engines see distinct URLs for the different language versions.
Visiting such a subdirectory sets the language cookie to that language.
The pages without a language-specific URL (i.e. where the language depends on client data) use e.g. <link rel="canonical" href="/en/"> to tell the search engine at which language-specific URL that page can be found.
Use XML sitemaps to further make sure search engines can find all pages and all different language versions.
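With the brazzy.de-style scheme, a sitemap lists every language-prefixed URL as a separate entry. A minimal standard-library sketch; build_sitemap and example.com are placeholders, and a Django project might use the django.contrib.sitemaps framework instead.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths, languages):
    """Return sitemap XML listing every page once per language prefix."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        for lang in languages:
            url = ET.SubElement(urlset, "url")
            # One distinct URL per language version, e.g. /en/index.php, /de/index.php.
            ET.SubElement(url, "loc").text = f"{base_url}/{lang}{path}"
    return ET.tostring(urlset, encoding="unicode")
```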

Use hreflang annotations (<link rel="alternate" hreflang="..." href="...">), but make sure you use different URLs for different languages. Even better, use different country-code domains (example.de, example.es) in conjunction with the Django sites framework.
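The hreflang tags go in the head of every language version, and each version lists all the versions, including itself. A minimal sketch for generating them; hreflang_links and the example.de/example.es/example.com URLs are placeholders.

```python
from html import escape


def hreflang_links(urls_by_lang, default_lang=None):
    """Build one <link rel="alternate"> tag per language version of a page."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(urls_by_lang.items())
    ]
    if default_lang is not None:
        # x-default names the version to show when no listed language matches.
        href = escape(urls_by_lang[default_lang])
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{href}">')
    return "\n".join(tags)
```

The same output would be rendered on the German, Spanish and English pages alike, which is what lets the search engine tie the versions together.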

Related

LocaleMiddleware not selecting requested available language

I am having trouble making django.middleware.locale.LocaleMiddleware set the Chinese language according to the cookie/header I specify in the request.
After some debugging, I have narrowed it down to the following function, which rejects it:
django.utils.translation.trans_real.check_for_language
all_locale_paths returns only Django's locales, which do not contain 'cn'. My apps are packaged and installed separately from the project itself, and they provide their own 'cn' language files, which get discovered successfully, but since their locale directories are not specified in LOCALE_PATHS, the middleware does not check them.
What is the best approach to avoid this problem? I am not adding LOCALE_PATHS, as the app locations differ between the environments the project is deployed to. I could import the app and find the paths from it, but that seems like overkill.
Django uses ISO 639 language codes (not country codes), which is also what browsers use in their Accept-Language headers.
The language code for Chinese is "zh"; Django supports both the "zh_Hans" and "zh_Hant" variants (language codes "zh-hans" and "zh-hant"). "cn" isn't a language code for Chinese ("CN" is the ISO 3166 country code for China).
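The mismatch is easier to see through the mapping from language codes to locale directory names. The sketch below is a simplified re-implementation for illustration only (Django ships its own django.utils.translation.to_locale); AVAILABLE_LOCALES is a hypothetical stand-in for the locale/<name>/LC_MESSAGES directories your apps provide, and only codes with at most one subtag are handled.

```python
def to_locale(language_code):
    """Convert a code like 'zh-hans' to a locale name like 'zh_Hans' (simplified)."""
    lang, _, region = language_code.lower().partition("-")
    if not region:
        return lang
    # Script subtags (more than 2 letters) are title-cased, region subtags upper-cased.
    region = region.title() if len(region) > 2 else region.upper()
    return f"{lang}_{region}"


# Hypothetical set of locale directory names shipped by your apps.
AVAILABLE_LOCALES = {"de", "en", "es", "zh_Hans", "zh_Hant"}


def is_available(language_code):
    """True if a locale directory exists for the requested language code."""
    return to_locale(language_code) in AVAILABLE_LOCALES
```

With this mapping, a request for "zh-hans" finds the zh_Hans directory, while "cn" maps to a "cn" directory that no correctly named translation would live in.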

How to rename Liferay's default cookies?

I have a JSP project that uses the Liferay framework. Liferay sets default cookies named COOKIE_SUPPORT and GUEST_LANGUAGE_ID. I don't want hackers to be able to see any information about my technology stack. How can I rename these cookies?
If you want to hide the framework you're using, the cookie names are the least of your worries. Worry about server identification, elements of the DOM, the structure and mechanics of URLs, a secure and hardened server setup, common translations, default content, standard error messages, etc.
In other words: if you don't want to give away which standard framework you're using (and this is not limited to Liferay), you'll have to roll your own. Good luck getting that as powerful and as well tested as any standard framework.
Instead, worry about keeping your systems updated at all times and protecting them from well-known vulnerabilities in older versions. For hardening Liferay specifically, you might want to start with my blog series on securing Liferay (linking chapter 1, which refers to the other chapters).
Promoting a comment into this answer: one way to find out how to change them is to search for their names in the source code and identify the kind of plugin you need to provide different values; most likely this will be an ext-plugin. After all, Liferay's source is available. I don't see any simpler way.

what is Haystack for Django?

I have been reading about Haystack, Whoosh, Xapian, etc., but I didn't really understand what they are used for or how they relate to each other.
For example, it is said that
Enable searching on third-party apps without touching that app’s code.
Can someone explain what these are used for, perhaps with a link that is simple enough for a beginner to understand?
Thanks.
Haystack is a different beast from Whoosh/Xapian/etc.:
Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Whoosh, Xapian, etc.) without having to modify your code.
From the FAQ (emphasis added):
What is Haystack?
Haystack is meant to be a portable interface to a search engine of your choice. Some might call it a search framework, an abstraction layer or what have you. The idea is that you write your search code once and should be able to freely switch between backends as your situation necessitates.
The "search backends" mentioned are search engines/libraries, each with its own API. Haystack provides a unified API on top of (and independent of) any one specific search library.
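The idea of "one API, swappable backends" can be illustrated with a toy example. This is not Haystack's actual API (Haystack has its own classes such as SearchIndex and SearchQuerySet); SearchBackend, the two toy backends and SearchFacade are hypothetical names chosen only to show the pattern.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Interface every (hypothetical) backend must implement."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def search(self, query): ...


class SubstringBackend(SearchBackend):
    """Toy backend: scans every stored document for the query string."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text.lower()

    def search(self, query):
        q = query.lower()
        return sorted(d for d, t in self.docs.items() if q in t)


class InvertedIndexBackend(SearchBackend):
    """Toy backend: maps each word to the set of document ids containing it."""

    def __init__(self):
        self.words = {}

    def index(self, doc_id, text):
        for word in text.lower().split():
            self.words.setdefault(word, set()).add(doc_id)

    def search(self, query):
        return sorted(self.words.get(query.lower(), set()))


class SearchFacade:
    """Application code talks only to this, never to a specific backend."""

    def __init__(self, backend):
        self.backend = backend

    def add(self, doc_id, text):
        self.backend.index(doc_id, text)

    def find(self, query):
        return self.backend.search(query)
```

Swapping SubstringBackend for InvertedIndexBackend changes nothing in the code that calls add and find, which is the property the Haystack FAQ describes.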

how to handle multiple languages on website

I have a website that I am translating into different languages. I have the content translated and stored in a database. I also wrote mechanisms into the PHP files that display the language based on a global define I set near the top of the code. I am happy with all of this. My question is: how do I control this global define?
I currently have a JavaScript toggle that sets a cookie and then reloads the current page, and every subsequent page reads that cookie to set the global define. It works very well; however, I am running into two big problems. (1) I can't have a URL to send to somebody that has the language in it (I could do something like domain.com/forwarder.php?lan=spanish&gotopage=page.php that would set a cookie and then forward, but that's ugly). (2) Search engines can't see the multiple languages, since they don't really use cookies and JavaScript.
So how do I solve this? Does anybody have experience in this? Can you share your experiences?
I'm leaning towards just using the URL and dropping the cookie; that seems popular among various international sites I've seen. So I'm guessing the URLs would be:
domain.com/page (for english, equivalent to domain.com/en/page)
domain.com/es/page (for spanish)
domain.com/fr/page (for french)
etc.
Is this a good idea? I will have to go through my code and prepend the language code to all my hrefs, which might be a pain.
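The proposed scheme can be handled by two small helpers: one that splits the language off an incoming path and one that builds links, so hrefs are generated in one place instead of edited by hand. It is sketched in Python to match the rest of this page, though the logic ports directly to PHP; LANGUAGES, DEFAULT_LANGUAGE, split_language and localized_url are hypothetical names.

```python
# Hypothetical configuration mirroring the URL scheme above.
LANGUAGES = ("en", "es", "fr")
DEFAULT_LANGUAGE = "en"


def split_language(path):
    """Split '/es/page' into ('es', '/page'); unprefixed paths use the default."""
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in LANGUAGES:
        rest = "/" + parts[1] if len(parts) > 1 else "/"
        return parts[0], rest
    return DEFAULT_LANGUAGE, path


def localized_url(path, language):
    """Prefix a path with its language code, leaving the default unprefixed."""
    if language == DEFAULT_LANGUAGE:
        return path
    return "/" + language + path
```

Since both /page and /en/page resolve to English here, it is probably worth picking one of them as the canonical URL.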
So does anybody have any comments on this? Is this a good plan? Am I neglecting to realize something?
It's been a long time, but can't you use $_SERVER["HTTP_ACCEPT_LANGUAGE"] and set it automatically? Before writing the cookie for the first time, show a message on the screen (in English or another language from your array) asking whether this is the correct language, with a drop-down of the available languages. Once one is selected, store it as the default website language.
You can use string constants in global resource files. Have only one website that calls those string constants based on the current language.
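A minimal sketch of the resource-file idea, assuming one dictionary of string constants per language with a fallback to the default language; RESOURCES, DEFAULT_LANGUAGE and t are hypothetical names. gettext (which Django uses) is the established, more complete form of the same idea.

```python
# Hypothetical resource tables; in practice each might live in its own file.
RESOURCES = {
    "en": {"GREETING": "Welcome", "SEARCH": "Search"},
    "es": {"GREETING": "Bienvenido"},  # "SEARCH" not translated yet.
}
DEFAULT_LANGUAGE = "en"


def t(key, language):
    """Look up a string constant, falling back to the default language."""
    table = RESOURCES.get(language, {})
    if key in table:
        return table[key]
    return RESOURCES[DEFAULT_LANGUAGE][key]
```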

Django i18n and SEO

How do you prepare i18n in your websites? I mean, what do you do to avoid the situation where you search for internationalized websites in Polish and get an English description because English is the default?
Thanks in advance,
Etam.
I give every language version its own URL. So the English version of an article would be available under http://example.com/en/my-article, and the Polish version under http://example.com/pl/my-article (or, if you really care about SEO, even under http://example.com/pl/moj-artykul).
Had I given all versions the same URL (and switched content dynamically), Google would have indexed only one version, and users couldn't find the article using keywords from any other language. I also think having distinct URLs that people can link to for every version is more user-friendly.