How can I perform sanitization on string attributes to prevent XSS? Right now my thoughts are to override my base model's save method and iterate over all the strings in the model and set all the string inputs to safe strings. Would this be a good way to approach this problem or is there a better way?
EDIT:
The problem occurs when saving a name attribute (<script>alert('xss')</script>) for a person in the app. It is saved to the database unsanitized. That name is then loaded in our other site, which does not sanitize its output, and that's where the script injection occurs! I'd like to sanitize it before saving it to the DB.
Handlebars automatically sanitizes strings. If you want to avoid this, you must explicitly use the triple-brace syntax:
{{{myHtmlString}}}
Rather than trying to sanitise the input, you really ought to change that other site to make sure it html-escapes the data it is presenting from the database. Even if you would "sanitise" things on the Ember side, can you guarantee there are no other vulnerabilities which allow someone to inject HTML in the database?
Always escaping anything being presented is really the only safe way to deal with XSS. If you're filtering input you are very likely to not catch every possible way of injecting unexpected input.
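To illustrate escaping at presentation time, here is a minimal sketch in Python, assuming the other site builds its HTML on the server. The function name render_name and the greeting markup are made up for this example; html.escape is from the standard library.

```python
import html

def render_name(name: str) -> str:
    """Return an HTML fragment with the untrusted name escaped on output."""
    # Escape at the point of presentation, whatever is stored in the database.
    return "<p>Hello, {}!</p>".format(html.escape(name, quote=True))

# A value that was stored unsanitized, as in the question.
stored = "<script>alert('xss')</script>"
print(render_name(stored))
```

The stored value stays untouched; only its rendering changes, so the protection holds no matter which application wrote the row.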
I'm trying to find something that will return an exception upon finding anything that even remotely looks like HTML or Javascript. I've figured out how to do it for individual views, but it's not a scalable solution, and ultimately I need to prevent code from being saved to the database no matter what view gets targeted by the injection attack.
Here is the functionality I'm looking for.
# list() yields one entry per character; str.split() with no whitespace
# in the string would return a single-element list instead.
ILLEGAL_CHARS = list('<>[]{}():;,')
# bunch of code in between
for value in [company_name, url, status, information, lt_type, company_source]:
    if any(char in value for char in ILLEGAL_CHARS):
        raise Exception(f"You passed one of several illegal characters: {ILLEGAL_CHARS}")
I'm using django rest framework so I have to handle it on the backend. Thanks.
Actually, you don't need to sanitize user input, because when you show it in the template, {{object}} makes sure that no HTML or JavaScript is executed unless you mark it as safe with {{object|safe}}. But if you don't want to save it in the database at all, this might help: Sanitizing HTML in submitted form data
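For the "reject it everywhere" behaviour the question asks for, one option is a single reusable validator instead of per-view checks. The sketch below is plain Python under that assumption; reject_illegal_chars is a hypothetical name, and it raises ValueError only to stay self-contained. In Django REST framework the same callable could raise serializers.ValidationError and be attached through a serializer field's validators argument.

```python
ILLEGAL_CHARS = frozenset("<>[]{}():;,")

def reject_illegal_chars(value: str) -> str:
    """Return value unchanged, or raise if it contains a blocked character."""
    found = sorted(ILLEGAL_CHARS.intersection(value))
    if found:
        raise ValueError("Illegal characters in input: " + " ".join(found))
    return value

# Usage: run every incoming string through the same function.
reject_illegal_chars("Acme Corp")
```

Note that a blocklist like this also rejects legitimate values such as URLs (which contain ":"), so it complements output escaping rather than replacing it.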
I'd like to ask you if you can briefly and in plain English explain to me
how the cleaned_data() function validates data?
Reason for asking is that I'm designing a web app powered by Django
and initially I thought cleaned_data() is smart enough to block user's input that contains potentially harmful characters. Such as ' ; < > and alike. Characters that can be used for SQL injection attacks.
To my surprise, when I deliberately slipped a few of those characters into a form field, the input made it to the database. I was quite shocked.
So then... what is the cleaned_data() function good for?
I read about this function in the docs, but I couldn't find the answer I needed.
cleaned_data is for validated form data. If you have a required CharField, for example, it will validate whether it is present, and whether it has enough characters. If you have an EmailField, then it will validate that it includes an email address.
Take a look at some of the built-in form fields for a better idea of what you can do.
It is not intended to prevent XSS or SQL injection. It simply confirms that your form follows basic rules that you have set for it.
You misunderstood cleaned_data. The simplest definition of cleaned_data is something like:
A dict that contains data entered by the user after various validation
(built-in or custom)
Now, that being said, to understand every step of form validation, refer to this link (re-inventing the wheel would be silly since it is explained so well there).
As for SQL injection, that is a separate problem. But again, Django has a built-in way of handling it. This is from the documentation:
By using Django’s querysets, the resulting SQL will be properly
escaped by the underlying database driver. However, Django also gives
developers power to write raw queries or execute custom sql. These
capabilities should be used sparingly and you should always be careful
to properly escape any parameters that the user can control. In
addition, you should exercise caution when using extra() and RawSQL.
I can totally see your confusion, but remember that they are two different things.
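The point that SQL injection is handled by parameter binding, not by filtering characters, can be shown without Django. This is a minimal sketch using the standard library's sqlite3 module; the person table and the hostile string are invented for the example, and Django's querysets rely on the same idea through the database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")

# Input full of "dangerous" characters, passed as a bound parameter.
hostile = "Robert'); DROP TABLE person; --"
conn.execute("INSERT INTO person (name) VALUES (?)", (hostile,))

# The value is stored verbatim as data; it is never parsed as SQL.
rows = conn.execute("SELECT name FROM person").fetchall()
print(rows)
```

This is why characters such as ' and ; reaching the database is not a sign of a vulnerability in itself.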
I'm helping develop a new API for an existing database.
I'm using Python 2.7.3, Django 1.5 and the django-rest-framework 2.2.4 with PostgreSQL 9.1
I need/want good documentation for the API, but I'm shorthanded and I hate writing/maintaining documentation (one of my many flaws).
I need to allow consumers of the API to add new "POS" (points of sale) locations. In the Postgres database, there is a foreign key from pos to pos_location_type. So, here is a simplified table structure.
pos_location_type(
id serial,
description text not null
);
pos(
id serial,
pos_name text not null,
pos_location_type_id int not null references pos_location_type(id)
);
So, to allow them to POST a new pos, they will need to give me a "pos_name" and a valid pos_location_type. So, I've been reading about this stuff all weekend. Lots of debates out there.
How are my API consumers going to know what a pos_location_type is? Or what value to pass here?
It seems like I need to tell them where to get a valid list of pos_locations. Something like:
GET /pos_location/
As a quick note, examples of pos_location_type descriptions might be: ('school', 'park', 'office').
I really like the "Browseability" of the Django REST Framework, but it doesn't seem to address this type of thing. I actually had a very nice chat on IRC with Tom Christie earlier today, and he didn't really have an answer on what to do here (or maybe I never made my question clear).
I've looked at Swagger, and that's a very cool/interesting project, but take a look at their "pet" resource on their demo here. Notice it is pretty similar to what I need to do. To add a new pet, you need to pass a category, which they define as class Category(id: long, name: string). How is the consumer supposed to know what to pass here? What's a valid id? Or name?
In Django REST framework, I can define/override what is returned in the OPTIONS call. I guess I could come up with my own little "system" here and return some information like:
pos-location-url: '/pos_location/'
in the generic form, it would be: {resource}-url: '/path/to/resource_list'
and that would sort of work for the documentation side, but I'm not sure if that's really a nice solution programmatically. What if I change the resource's location? That would mean that my consumers would need to programmatically make an OPTIONS call for the resource to figure out all of the relations. Maybe not a bad thing, but it feels a little weird.
So, how do people handle this kind of thing?
Final notes: I get the fact that I don't really want a "leaking" abstraction here, with my database peeking through the API layer, but the fact remains that there is a foreign key constraint on this existing database, and any insert that doesn't have a valid pos_location_type_id raises an error.
Also, I'm not trying to open up the URI vs. ID debate. Whether the user has to use the pos_location_type_id int value or a URI doesn't matter for this discussion. In either case, they have no idea what to send me.
I've worked with this kind of stuff in the past. I think there are two ways of approaching this problem. The first you already mentioned: provide an endpoint so users of the API can find the id-like value of the pos_location_type. Many APIs do this, because a person developing against your API is going to have to read your documentation and will know where to get the pos_location_type values from. End users should not have to worry about this, because they will have an interface that probably shows a dropdown list of text values.
On the other hand, there is the way I've also handled this, which is not very RESTful. Let's suppose you have a location in New York, and the POST could be something like:
POST /pos/new_york/
You can handle /pos/(location_name)/ by normalizing the text, then searching the database for that value or something similar; if the place does not exist, you just create a new one. That works if users can add new places. If not, the user would have to know which fixed places exist, which puts us back in the first situation.
That way you can avoid pos_location_type in the request data; you could programmatically map it to a valid ID.
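The normalize-then-look-up idea can be sketched as follows. Everything here is hypothetical: an in-memory dict stands in for the database table, and the names normalize, resolve_location and location_ids are invented for illustration.

```python
import re

# Stand-in for the lookup table: normalized name -> ID.
location_ids = {"new_york": 1}

def normalize(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")

def resolve_location(name: str, allow_create: bool = True) -> int:
    """Map a free-text location name to an ID, creating one if permitted."""
    key = normalize(name)
    if key in location_ids:
        return location_ids[key]
    if not allow_create:
        raise KeyError("Unknown location: " + name)
    new_id = max(location_ids.values(), default=0) + 1
    location_ids[key] = new_id
    return new_id
```

With allow_create=False the caller must already know the fixed set of places, which is the first situation again and still calls for a listing endpoint.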
I've found myself unsatisfied with Django's ability to render JSON data. If I use the built-in serializers, then database foreign key relationships are not included in the data (only the keys). Also, it seems to be impossible to include custom data in the JSON feed that isn't part of the model being serialized.
As a test I implemented a template that rendered some JSON for the resultset of a particular model. I was able to include/exclude whatever parts of the model I wanted and was able to include custom data as well.
The test seemed to work well and wasn't slower than the recommended serialization methods.
Are there any pitfalls to using this method of serialization?
While it's hard to say definitively whether this method has any pitfalls, it's the method we use in production, as you control everything that is serialized, even if the underlying model is changed. We've been running a high-traffic application for almost two years using this method.
Hope this helps.
One problem might be escaping metacharacters like ". Django's template system automatically escapes dangerous characters, but it's set up to do that for HTML. You should look up exactly what the template escaping does, and compare that to what's dangerous in JSON. Otherwise, you could cause XSS problems.
You could think about constructing a data structure of dicts and lists, and then running a JSON serializer on that, rather than directly on your database model.
I don't understand why you see the choice as being either 'use Django serializers' or 'write JSON in templates'. The middle way, which to my mind is much more robust and fits your use case well, is to build up your data as Python lists/dictionaries and then simply use simplejson.dumps() to convert it to a JSON string.
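A minimal sketch of that middle way, using the standard library's json module (its dumps() is called the same way as simplejson.dumps() for this purpose). The record contents and field names are invented for the example; in a Django view the rows would be built from model instances.

```python
import json

# Plain Python data: related objects are nested explicitly.
rows = [
    {
        "id": 1,
        "title": 'He said "hi"\nand left',
        "author": {"id": 7, "name": "Ann"},
    },
]

# Custom data that is not part of any model sits alongside the rows.
payload = {"count": len(rows), "results": rows, "generated_by": "example"}

# The serializer escapes quotes and newlines according to JSON rules.
text = json.dumps(payload)
print(text)
```

Because the serializer handles the escaping, the quote and newline problems that come up with hand-written JSON templates do not arise.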
We use this method to get custom JSON format consumed by datatables.net
It was the easiest method we found to accomplish this task, and it has worked fine with no problems so far.
You can find details here: http://datatables.net/development/server-side/django
So far, generating JSON from templates, we've run into the need to escape newlines. Looking at doing simplejson.dumps() next.
I'm developing a site that must be as accessible as possible. While assigning the accesskeys to my form fields with
widget=FieldWidget(attrs={'accesskey':'A'})
I found out that the W3C validator won't validate an XHTML Strict page with an accesskey in a select tag. However, I couldn't find a way to assign an accesskey to the label related to the select field (the right way to make the select accessible). Is there a way to do so?
Thanks
Interesting question. HTML 4.01 also prohibits accesskey in a select.
I believe the Short Answer is: Not in standard Django.
Much longer answer: I looked at the code in django/forms/fields.py and .../widgets.py and the label is handled strictly as a string (forced to smart_unicode()). Four possible solutions come to mind, the first three are not pretty:
1. Ignore the validation failure. I hate doing this, but sometimes it's a necessary kludge. Most browsers are much looser than the DTDs in what they allow. If you can get the accesskey to work even when it's technically in the wrong place, that might be the simplest way to go.
2. Catch the output of the template and do some sort of ugly search-and-replace. (Blech!)
3. Add new functionality to the widgets/forms code by MonkeyPatching it. MonkeyPatch django.forms.fields.Field to catch and save a new arg (label_attrs?). MonkeyPatch the label_tag() method of forms.forms.BoundField to deal with the new widget.label_attrs value.
   I'm deliberately not going to give more details on this. If you understand the code well enough to MonkeyPatch it, then you are smart enough to know the dangers inherent in doing this.
4. Make the same functional changes as #3, but do it as a submitted patch to the Django code base. This is the best long-term answer for everyone, but it's also the most work for you.
Update: Yoni Samlan's link to a custom filter (http://www.djangosnippets.org/snippets/693/) works if you are generating the <label> tag yourself. My answers are directed toward still using the full power of Forms but trying to tweak the resultant <label>.
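For the case where you generate the <label> tag yourself, the idea can be sketched in plain Python. This is not the linked snippet's code; label_with_accesskey is a hypothetical helper, and in a Django template filter the returned string would additionally have to be marked safe.

```python
from html import escape

def label_with_accesskey(field_id: str, text: str, accesskey: str) -> str:
    """Build a <label> that carries the accesskey instead of the <select>."""
    return '<label for="{}" accesskey="{}">{}</label>'.format(
        escape(field_id, quote=True),
        escape(accesskey, quote=True),
        escape(text),
    )

print(label_with_accesskey("id_country", "Country", "C"))
```

The select itself is then rendered without an accesskey attribute, which is what the validator objects to.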