how to handle multiple languages on website - cookies

I have a website that I am translating into different languages. I have the content translated and stored in a database. I also wrote, into the php files, different mechanisms that will display the language based on a global define I set high in the code. I am happy with all of this. My question is how do I control this global define?
I currently have a javascript toggle that sets a cookie and then reloads the current page. And every subsequent page just reads that cookie to set the global define. It works very well, however I am running into two big problems. (1) I can't just can't have a url to send to somebody that has the language in it (I could do something like domain.com/forwarder.php?lan=spanish&gotopage=page.php that would set a cookie and then forward, but that's ugly). And (2), search engines can't view the multiple languages since they don't really use cookies and javascript.
So how do I solve this? Does anybody have experience in this? Can you share your experiences?
I'm leaning towards just using the url and dropping the cookie; that seems popular among various international sites I've seen. So I'm guessing the urls would be:
domain.com/page (for english, equivalent to domain.com/en/page)
domain.com/es/page (for spanish)
domain.com/fr/page (for french)
etc ......
Is this a good idea? I will have to go through my code and prepend all my href's with the language code, which might be a pain.
So does anybody have any comments on this? Is this a good plan? Am I neglecting to realize something?

It's been a long time, but can't you use the $_SERVER["HTTP_ACCEPT_LANGUAGE"] and set it automatically. And prior to writing the cookie for the first time, leave message on the screen in either english or another language in the array asking if this is the correct language, with a drop down of available languages? Once it is selected, store that as default website language.
You can use string constants in global resource files. Have only one website that calls those string constants based on the current language.

Related

Web Design - Templates vs Include

I am currently developing a website. I would like to separate content and presentation. I am currently using a Dreamweaver Template to achieve this. However, I find that Dreamweaver's edit regions are very limiting in the design view. I have found that the same goal can be achieved by including the header and footer of my website.
What are the pros and cons of using includes rather than using templates?
First, if I were to rephrase your question, it's more like asking "Should I by a wire frame of a kite or by the glue to stick together what I'm making?" And then, you ask about the pros and cons of buying the wireframe against buying the glue. There are far too many variables as you can see...
And back on your your question... At some point your template will use include files. And for a start, it's worth knowing what you're thinking... Let's look at some basics.
Web design - usually refers to making websites that aren't really interactive. They don't have server-side elements. So most of the site has 'static' contents. If this were the case, you're better off with DreamWeaver, particularly if you're not into html/css editing.
Web development/programming - starts off with something as elementary as mailing a form, to highly interactive sites like FaceBook. Here you'll need to use some server-side language, usually like PHP, ASP or JSP. The choices are many but you've got to choose your own platform or combination of them.
Now to the second option (above). If for example, you were building a site using PHP, one of the nice things you'll do is to include your header, footer and side panels that need to be repeated across all pages. This way, you'll eliminate the need to re-write those sections. But if you were using a program like DreamWeaver, it does this duplication for you. Yes, it physically copy-pastes those sections into every file that needs it. Of course the end result may not be any different. But as a developer, you will be tied down to the DreamWeaver platform or for that matter, any other specific platform.
On the other hand, if you get used to working with an editor like NotePad++ or GEdit, you may switch between editors at any time. But you have the task of hand-coding everything from scratch. But then again, since you would use include files to bring in your headers and stuff, you save development time as well.
I don't know how much of html/css or php you know, but here's one of my demos to show you how to hand-code a site. This ain't complete but you should get an idea.
Link to the video introduction
Link to the video on youtube

A tool which checks that a local version of a site is fully translated (for continuous integration)

I'm working on a project, in which we design a localized version of an existing site (written in English) for another country (which is not English-speaking). And the business requirement is "no English text for all possible and impossible cases".
Does anyone know if there is a checker software/service which could check if a site is fully translated, that is which checks that there are no English text in it.
I new that there are sites for checking broken links, html validity etc, I need something like http://validator.w3.org/checklink but for checking that on all pages of the site there is no English text.
The reasons I think this way is needed are:
1. There is a lot of code which is common (both on backend and frontend) for all countries
2. If someone commits anything to the common code I need to be sure that this will not lead to english text issues in localized version.
3. From business point of view it is preferable that site does not support some functionality, than it shows english text ( legal matters)
4. The code both on frontend and backend changes a lot
5. There are a lot of files which affect text on the client's screen. Not just one with messages, unfortunately. And some of messages comes from backend, but most of them are in frontend
6. Due to all those fact currently someone manually fills all the forms and watch with his own eyes, and that is before each deploy...
I think you're approaching the problem from the wrong direction. You're looking for an algorithm or webcrawler that can detect wether any text is English or not? I don't know, but I doubt such a thing even exists.
If you have translated the website, you have full access to the codebase and/or translation texts, right? Can't you just open both the English and non-English strings files (.resx or whatever you are using) in a comparetool like Notepad++ to check the differences to see if there are any missing strings? And check the sourcecode and verify that all parts that can output user-displayable text use the meta:resourceKey property (or whatever you are using).
If you want to go the way of crawling, I'm not aware of an existing crawler that does this, but it sounds like a combination of two simple issues:
Finding existing open-source code for a web crawler should be dead simple
Identifying a language through n-gram analysis is trivial if there's a limited number of languages the text can be in.
The only difficult part would be to ensure that the analyzer always has a decent chunk of text to work with. You could extract stuff paragraph by paragraph. For forms you'd probably have to combine the text of several form labels.

What are my options for white-listing HTML in ColdFusion?

I want to allow my users to input HTML.
Requirements
Allow a specific set of HTML tags.
Preserve characters (do not encode ã into ã, for example)
Existing options
AntiSamy. Unfortunately AntiSamy encodes special characters and breaks requirement 2.
Native ColdFusion functions (HTMLCodeFormat() etc...) don't work as they encode HTML into entities, and thus fail requirement 1.
I found this set of functions somewhere, but I have no way of telling how secure this is: http://pastie.org/2072867
So what are my options? Are there existing libraries for this?
Portcullis works well for Cold Fusion for attack-specific issues. I've used a couple of other regex solutions I found on the web over time that have worked well, though they haven't been nearly as fleshed out. In 15 years (10 as a CMS developer) nothing I've built has been hacked....knock on wood.
When developing input fields of any type, it's good to look at the problem from different angles. You've got the UI side, which includes both usability and client-side validation. Yes, it can be bypassed, but javascript-based validation is quicker, more responsive, and rates higher on the magical UI scale than backend-interruption method or simply making things "disappear" without warning. It will speed up the back-end validation because it does the initial screening. So, it's not an "instead of" but an "in-addition to" type solution that can't be ignored.
Also on the UI front, giving your users a good quality editor also can make a huge difference in the process. My personal favorite is CKeditor simply because it's the only one that can handle Microsoft Word code on the front-side, keeping it far away from my DB. It seems silly, but Word HTML is valid, so it won't setoff any red flags....but on a moderately sized document it will quickly overload a DB field insert max, believe it or not. Not only will a good editor reduce the amount of silly HTML that comes in, but it will also just make things faster for the user....win/win.
I personally encode and decode my characters...it's always just worked well so I've never changed practice.

change the language of a sitecore item(tree)?

We have a small website which was developed with the default language ('en'), without paying mind to the language or versioning capabilities of sitecore (ouch!). We simply forgot to set the correct default language at the start of the project.
Now we have an entire content tree of 'en' items, when they should be 'nl-NL' (it's a Dutch site). And I am wondering if there is an easy way of changing the language for all items in that tree (that does not involve hacking).
I found this Q&A, but it just talks about setting the default language. I'd like to do that, yes, but I would also like to set the correct language for the existing item(versions).
thoughts?
From what I remember we had a similar problem before. Not with filling a website in the wrong language, but having empty content that should be filled with default english content after creating the new language. What we did was export the language. In your case you could export the English language, create a dutch language and replace all entries in the English XML file that comes out with nl-NL values.
After you've done that you could import the language file as the Dutch language and all items are filled.
To me this sounds as the easiest and quickest approach, since you only have to search and replace some xml tags.
Good luck!
You could write a .NET program that would go through your whole content tree and update language parameter of each item accordingly. Sitecore APIs give you access to almost everything you see in the backend (including content manipulation) so it shouldn't be much of a problem to automate this task.
As an anternative you could copy your whole content from one language to another and then remove the language you don't want. Here's how to do it.
I'll caveat that my experience with it is limited, but from what I've seen, the Sitecore Rocks plugin for Visual Studio might allow you to script this.
http://visualstudiogallery.msdn.microsoft.com/44a26c88-83a7-46f6-903c-5c59bcd3d35b

Web Application Cross Site Scripting

My website http://www.imayne.com seems to have this issue, verified by MacAfee. Can someone show me how to fix this? (Title)
It says this:
General Solution:
When accepting user input ensure that you are HTML encoding potentially malicious characters if you ever display the data back to the client.
Ensure that parameters and user input are sanitized by doing the following:
Remove < input and replace with "&lt";
Remove > input and replace with "&gt";
Remove ' input and replace with "&apos";
Remove " input and replace with "&#x22";
Remove ) input and replace with "&#x29";
Remove ( input and replace with "&#x28";
I cannot seem to show the actual code. This website is showing something else.
Im not a web dev but I can do a little. Im trying to be PCI compliant.
Let me both answer your question and give you some advice. Preventing XSS properly needs to be done by defining a white-list of acceptable values at the point of user input, not a black-black of disallowed values. This needs to happen first and foremost before you even begin thinking about encoding.
Once you get to encoding, use a library from your chosen framework, don't attempt character substitution yourself. There's more information about this here in OWASP Top 10 for .NET developers part 2: Cross-Site Scripting (XSS) (don't worry about it being .NET orientated, the concepts are consistent across all frameworks).
Now for some friendly advice: get some expert support ASAP. You've got a fundamentally obvious reflective XSS flaw in an e-commerce site and based on your comments on this page, this is not something you want to tackle on your own. The obvious nature of this flaw suggests you've quite likely got more obscure problems in the site as well. By your own admission, "you're a noob here" and you're not going to gain the competence required to sufficiently secure a website such as this overnight.
The type of changes you are describing are often accomplished in several languages via an HTML Encoding function. What is the site written in. If this is an ASP.NET site this article may help:
http://weblogs.asp.net/scottgu/archive/2010/04/06/new-lt-gt-syntax-for-html-encoding-output-in-asp-net-4-and-asp-net-mvc-2.aspx
In PHP use this function to wrap all text being output:
http://ch2.php.net/manual/en/function.htmlentities.php
Anyplace you see echo(...) or print(...) you can replace it with:
echo(htmlentities( $whateverWasHereOriginally, ENT_COMPAT));
Take a look at the examples section in the middle of the page for other guidance.
Follow those steps exactly, and you're good to go. The main thing is to ensure that you don't treat anything the user submits to you as code (HTML, SQL, Javascript, or otherwise). If you fail to properly clean up the inputs, you run the risk of script injection.
If you want to see a trivial example of this problem in action, search for
<span style="color:red">red</span>
on your site, and you'll see that the echoed search term is red.