I'm developing a travel blog for a client where they can post entries and allow user comments. Everything works well except when a user posts from an iPhone: apostrophes are not recognized and are stored as a rectangle. I have tested on a MacBook, and apostrophes come through fine, as they do from any other keyboard. It seems to be just the iPhone that's the problem. I'm using UTF-8. Any idea what I can use in a ColdFusion Replace() to fix this issue before the user input is stored to the database?
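(For illustration: if the culprit is the iOS "smart punctuation" feature, the character arriving is U+2019, the curly right single quote, rather than the ASCII apostrophe; a rectangle usually means something in the chain, such as the DB connection or column charset, isn't actually handling UTF-8. Below is a minimal sketch of the normalization idea, written in Python for readability; the same character map can drive ColdFusion's Replace() or ReplaceList(). The helper name is made up.)

# Map iOS "smart" punctuation back to plain ASCII before storage.
SMART_PUNCTUATION = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote, the iOS apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}

def normalize_punctuation(text: str) -> str:
    for smart, plain in SMART_PUNCTUATION.items():
        text = text.replace(smart, plain)
    return text

print(normalize_punctuation("It\u2019s a test"))  # -> It's a test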
Related
I'm working on a project in which we design a localized version of an existing site (written in English) for another country (which is not English-speaking). The business requirement is "no English text, for all possible and impossible cases".
Does anyone know of checker software or a service that can verify a site is fully translated, i.e. that checks that there is no English text anywhere on it?
I know there are sites for checking broken links, HTML validity, etc. I need something like http://validator.w3.org/checklink, but for checking that no page of the site contains English text.
The reasons I think this approach is needed are:
1. There is a lot of code (both backend and frontend) that is common to all countries.
2. If someone commits anything to the common code, I need to be sure it will not introduce English text into the localized version.
3. From a business point of view, it is preferable that the site not support some functionality at all rather than show English text (legal matters).
4. The code, both frontend and backend, changes a lot.
5. There are a lot of files that affect the text on the user's screen, not just one messages file, unfortunately. Some of the messages come from the backend, but most are in the frontend.
6. Because of all of the above, someone currently fills in all the forms manually and checks with their own eyes, before every deploy...
I think you're approaching the problem from the wrong direction. You're looking for an algorithm or web crawler that can detect whether a given piece of text is English or not? I don't know of one, and I doubt such a thing even exists.
If you have translated the website, you have full access to the codebase and/or the translation texts, right? Can't you just open both the English and non-English string files (.resx or whatever you are using) in a compare tool like Notepad++ and check the differences to see if any strings are missing? And check the source code to verify that every part that can output user-displayable text uses the meta:resourceKey property (or whatever you are using).
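A rough sketch of that comparison, in Python for illustration; it assumes a simple key=value file format (the file names are made up), so a real .resx would need XML parsing instead:

def load_strings(path):
    strings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "=" in line:
                key, value = line.split("=", 1)
                strings[key.strip()] = value.strip()
    return strings

en = load_strings("messages_en.properties")
local = load_strings("messages_de.properties")

missing = set(en) - set(local)                 # no translation at all
same = {k for k in set(en) & set(local) if en[k] == local[k]}

print("Missing keys:", sorted(missing))
print("Possibly untranslated:", sorted(same))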
If you want to go the way of crawling, I'm not aware of an existing crawler that does this, but it sounds like a combination of two simple issues:
Finding existing open-source code for a web crawler should be dead simple
Identifying a language through n-gram analysis is trivial if there's a limited number of languages the text can be in.
The only difficult part would be to ensure that the analyzer always has a decent chunk of text to work with. You could extract stuff paragraph by paragraph. For forms you'd probably have to combine the text of several form labels.
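A sketch of what that combination might look like, assuming the third-party requests, beautifulsoup4, and langdetect packages (any crawler plus any n-gram language identifier would do):

import requests
from bs4 import BeautifulSoup
from langdetect import detect, LangDetectException

def find_english_fragments(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    hits = []
    for node in soup.find_all(["p", "li", "label", "h1", "h2", "h3"]):
        text = node.get_text(" ", strip=True)
        # Short fragments misclassify easily, so skip anything tiny.
        if len(text) < 40:
            continue
        try:
            if detect(text) == "en":
                hits.append(text)
        except LangDetectException:
            pass  # no usable features (numbers, punctuation, ...)
    return hits

for fragment in find_english_fragments("https://example.com/de/"):
    print("Possible English text:", fragment)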
I want to allow my users to input HTML.
Requirements
Allow a specific set of HTML tags.
Preserve characters (do not encode ã into &atilde;, for example)
Existing options
AntiSamy. Unfortunately AntiSamy encodes special characters and breaks requirement 2.
Native ColdFusion functions (HTMLCodeFormat(), etc.) don't work, as they encode HTML into entities and thus fail requirement 1.
I found this set of functions somewhere, but I have no way of telling how secure this is: http://pastie.org/2072867
So what are my options? Are there existing libraries for this?
Portcullis works well with ColdFusion for attack-specific issues. I've also used a couple of other regex solutions I found on the web over time that have worked well, though they haven't been nearly as fleshed out. In 15 years (10 as a CMS developer), nothing I've built has been hacked... knock on wood.
When developing input fields of any type, it's good to look at the problem from different angles. You've got the UI side, which includes both usability and client-side validation. Yes, it can be bypassed, but JavaScript-based validation is quicker, more responsive, and rates higher on the magical UI scale than the backend-interruption method or simply making things "disappear" without warning. It also speeds up the back-end validation because it does the initial screening. So it's not an "instead of" but an "in addition to" kind of solution that can't be ignored.
Also on the UI front, giving your users a good-quality editor can make a huge difference in the process. My personal favorite is CKEditor, simply because it's the only one that can handle Microsoft Word markup on the client side, keeping it far away from my DB. It seems silly, but Word HTML is valid, so it won't set off any red flags... yet on a moderately sized document it will quickly blow past a DB field's insert limit, believe it or not. A good editor will not only reduce the amount of silly HTML that comes in, it will also make things faster for the user... win/win.
I personally encode and decode my characters... it has always worked well, so I've never changed that practice.
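For reference, the whitelist-sanitizer behavior the question's two requirements describe looks roughly like this, sketched with Python's bleach library purely for illustration (on the ColdFusion side, tools like AntiSamy and Portcullis fill this role). Only the listed tags and attributes survive, and plain characters such as ã pass through unencoded.

import bleach

# Hypothetical whitelist; tailor it to your own requirements.
ALLOWED_TAGS = ["p", "a", "em", "strong", "ul", "ol", "li", "blockquote"]
ALLOWED_ATTRS = {"a": ["href", "title"]}

def sanitize(html: str) -> str:
    # strip=True drops disallowed tags instead of entity-encoding them.
    return bleach.clean(html, tags=ALLOWED_TAGS,
                        attributes=ALLOWED_ATTRS, strip=True)

print(sanitize('<p onclick="evil()">ã <script>alert(1)</script></p>'))
# -> roughly: <p>ã alert(1)</p>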
My website http://www.imayne.com seems to have this issue, as flagged by McAfee. Can someone show me how to fix it?
It says this:
General Solution:
When accepting user input, ensure that you are HTML-encoding potentially malicious characters if you ever display the data back to the client.
Ensure that parameters and user input are sanitized by doing the following:
Remove < input and replace with "&lt;";
Remove > input and replace with "&gt;";
Remove ' input and replace with "&apos;";
Remove " input and replace with "&quot;";
Remove ) input and replace with "&#41;";
Remove ( input and replace with "&#40;";
I cannot seem to show the actual code here; this website renders it as something else. I'm not a web dev, but I can do a little. I'm trying to be PCI compliant.
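For what it's worth, the substitution list in that report is ordinary HTML entity encoding plus parentheses. Here is a literal sketch in Python for illustration (though, as the answers below explain, lean on your framework's built-in encoder rather than hand-rolling this):

import html

def encode_for_html(value: str) -> str:
    # html.escape covers & < > " and ' when quote=True...
    encoded = html.escape(value, quote=True)
    # ...and the scanner's advice additionally covers parentheses.
    return encoded.replace("(", "&#40;").replace(")", "&#41;")

print(encode_for_html('<script>alert("xss")</script>'))
# -> &lt;script&gt;alert&#40;&quot;xss&quot;&#41;&lt;/script&gt;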
Let me both answer your question and give you some advice. Preventing XSS properly starts with defining a whitelist of acceptable values at the point of user input, not a blacklist of disallowed values. This needs to happen first and foremost, before you even begin thinking about encoding.
Once you get to encoding, use a library from your chosen framework, don't attempt character substitution yourself. There's more information about this here in OWASP Top 10 for .NET developers part 2: Cross-Site Scripting (XSS) (don't worry about it being .NET orientated, the concepts are consistent across all frameworks).
Now for some friendly advice: get some expert support ASAP. You've got a fundamentally obvious reflective XSS flaw in an e-commerce site and based on your comments on this page, this is not something you want to tackle on your own. The obvious nature of this flaw suggests you've quite likely got more obscure problems in the site as well. By your own admission, "you're a noob here" and you're not going to gain the competence required to sufficiently secure a website such as this overnight.
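To make the whitelist idea concrete, here is a minimal sketch (Python for illustration; the field names and patterns are hypothetical): each input gets a rule describing exactly what it may contain, and anything that doesn't match is rejected before encoding ever enters the picture.

import re

# Hypothetical per-field whitelist rules.
FIELD_RULES = {
    "username": re.compile(r"[A-Za-z0-9_]{3,20}"),
    "quantity": re.compile(r"[0-9]{1,4}"),
    "zip_code": re.compile(r"[0-9]{5}"),
}

def validate(field: str, value: str) -> bool:
    rule = FIELD_RULES.get(field)
    # fullmatch ensures the whole value matches, not just a prefix.
    return bool(rule and rule.fullmatch(value))

print(validate("username", "alice_99"))                    # True
print(validate("username", '<script>alert(1)</script>'))   # False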
The type of change you are describing is often accomplished via an HTML-encoding function, which most languages provide. What is the site written in? If it's an ASP.NET site, this article may help:
http://weblogs.asp.net/scottgu/archive/2010/04/06/new-lt-gt-syntax-for-html-encoding-output-in-asp-net-4-and-asp-net-mvc-2.aspx
In PHP use this function to wrap all text being output:
http://ch2.php.net/manual/en/function.htmlentities.php
Anyplace you see echo(...) or print(...) you can replace it with:
echo(htmlentities( $whateverWasHereOriginally, ENT_COMPAT));
Take a look at the examples section in the middle of the page for other guidance.
Follow those steps exactly and you're good to go. The main thing is to ensure that you never treat anything the user submits to you as code (HTML, SQL, JavaScript, or otherwise). If you fail to properly clean up the inputs, you run the risk of script injection.
If you want to see a trivial example of this problem in action, search for
<span style="color:red">red</span>
on your site, and you'll see that the echoed search term renders in red.
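Once the echoed term is passed through an HTML encoder, the browser shows the markup as text instead of rendering it. For example, in Python:

import html

term = '<span style="color:red">red</span>'
print(html.escape(term, quote=True))
# -> &lt;span style=&quot;color:red&quot;&gt;red&lt;/span&gt;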
I have searched, and there are some PHP- or Flash-based solutions, but none in Python.
PIL (the Python Imaging Library) could be the starting point, but if anybody knows of something done or half-done, I will gladly use it for my Django projects, finish or polish the library, and release it for everyone.
It's actually fairly easy in Django. Jacob Kaplan-Moss posted about it years ago. I used this on my site for a while.
Apparently there's a django-rendertext project based on that code, but I haven't used it.
The nicest text comes from PyCairo, but having said that, I've not seen any project that just spits out images with the desired text in that manner, for any web framework.
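For completeness, a minimal sketch of the PIL/Pillow route (Pillow is the maintained fork of PIL; the font path is an assumption, so point it at any .ttf on your system):

from PIL import Image, ImageDraw, ImageFont

def render_text(text, font_path="/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
                size=24, fg="black", bg="white", padding=10):
    font = ImageFont.truetype(font_path, size)
    # Measure the text so the canvas fits it exactly.
    left, top, right, bottom = font.getbbox(text)
    width, height = right - left, bottom - top
    img = Image.new("RGB", (width + 2 * padding, height + 2 * padding), bg)
    ImageDraw.Draw(img).text((padding - left, padding - top), text,
                             font=font, fill=fg)
    return img

render_text("Hello, Django!").save("hello.png")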
Since "in October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) in the Internet that use the IDNA standard for native language scripts",
I'm pretty sure that the standard regexes most sites currently use won't mark these as valid, or am I wrong? Has anyone actually thought about how this would play out or has anyone done anything about it?
Hope I'm not jumping the gun here.
When a user types an internationalized domain into a browser, it's translated to an ASCII form; e-mail, surely, must work the same way (however, I've never received mail from an IDNA domain, and I have reason to believe browsers are the only implementers of it).
Mail agents would have to know that when they see Unicode in an address, it must be translated to IDNA form before the MX records are looked up. I don't think, in all of my system administration, I've ever accounted for this. Accepting something in a form element that the browser will translate as IDNA is not something I know how to do either. But if the address is indeed translated to IDNA first and a regex then validates the ASCII form, it should work.
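That translation step is easy to demonstrate. Python, for example, ships an "idna" codec that performs it (a browser does the same thing before its DNS lookup):

domain = "bücher.example"
ascii_form = domain.encode("idna")   # what actually goes to DNS
print(ascii_form)                    # b'xn--bcher-kva.example'
print(ascii_form.decode("idna"))     # bücher.example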
I wouldn't be surprised if an international domain fails most e-mail regular expressions, and I think the relevance of such a failure is less than 1%. IDNA is really an "address bar" system, and an awful hack; I would really be surprised if e-mail worked on top of it.
Everyone is freaking out as if something is changing. It isn't. IDNA is just extending from the domain to the TLD, and business will go on as usual, the way it was before. Don't overthink it, OP.
Old regexes will mark IDNA names valid, provided they are correctly translated into ASCII DNS names.
So yes, we have a problem here. One cannot expect a user to simply type Unicode into a text field and have an ASCII version of the domain name arrive on the server side.
IDNA encoding is neither nice nor easy: Unicode characters are removed from the word they are in and placed after it, with a position marker.
Reimplementing it in JavaScript, for example, is slow, sad, and boring. A URL-encode-like approach would have made porting it to every language easier.
Also, people on systems that don't support IDNA have a hard time figuring out by hand what a given domain looks like in ASCII.
I feel IDNA came out pretty ugly, and that will hinder its adoption.
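(To see the "removed and placed after it, with a position marker" scheme in action: it is Punycode, and Python, for one, exposes it directly as a codec:)

print("bücher".encode("punycode"))   # -> b'bcher-kva'
# The "ü" is dropped from the word and re-encoded after the "-",
# together with its position.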