Django full_clean method and data security - django

I need to take raw JSON data and put it direct into a Django model by using the Mymodel.objects.create( ... ) method. If I then run full_clean() on the instance created, is it then secure in terms of potential SQL injection or any malicious data that could potentially be injected?
The reason that I am not using a form is that the logic on the form is quite complex in that it dynamically builds a form so I need to post data in json format. I don't want to use the rest api as there is just one page where it has this complexity.

.full_clean(…) [Django-doc] does not perform any checks on SQL injection, nor does the Django ORM, since it simply escapes all parameters, so even if the data contains SQL statements, these are escaped, and therefore, normally, SQL injection is not possible.
But you need to run full_clean to validate data integrity. If you define constraints on your model, these are normally validated at the full_clean part. The ORM does not run these queries.
You thus can work with:
obj = Mymodel(**data)
obj.full_clean()
obj.save()
The reason that I am not using a form is that the logic on the form is quite complex in that it dynamically builds a form.
A form can remove a lot of boilerplate code, since it does not only performs validation, but also cleaning, it makes error messages more convenient, etc.

Django ORM protects the database from SQL-injections, but you are responsible for the output. For convenient data cleaning, I recommend using a DRF Serializers

Related

Is there an HTML sanitizer library for django to block injection attacks?

I'm trying to find something that will return an exception upon finding anything that even remotely looks like HTML or Javascript. I've figured out how to do it for individual views, but it's not a scalable solution, and ultimately I need to prevent code from being saved to the database no matter what view gets targeted by the injection attack.
Here is the functionality I'm looking for.
ILLEGAL_CHARS = '<>[]{}():;,'.split()
# bunch of code in between
for value in [company_name, url, status, information, lt_type, company_source]:
if any(char in value for char in ILLEGAL_CHARS):
raise Exception(f"You passed one of several illegal characters: {ILLEGAL_CHARS}")
I'm using django rest framework so I have to handle it on the backend. Thanks.
actually you don't nead to sanitize any user input because when you show them int the template the jinja {{object}} will make sure that no html or java script will be executed until you mark them as safe {{object|safe}} but if you want want not to save them in database that might help Sanitizing HTML in submitted form data

Django: Preventative measures for SQL Injection

I have a few text fields in my django project and I've been learning about SQL injection. Is it important to strip the text fields of potential bad characters taht might make SQL injection easier? I imagine stripping possible bad characters such as { ;, but I am not sure. These fields are short bios about a person or a contact page and so I don't imagine that they would require such characters.
To be clear, I have taken other steps to protect my website such as am using these fields things such as generating dynamic sql queries.
Short answer is: you should be fine not to worry based on django docs -
SQL injection protection
SQL injection is a type of attack where a malicious user is able to execute > arbitrary SQL code on a database. This can result in records being deleted or data leakage.
Django’s querysets are protected from SQL injection since their queries are constructed using query parameterization. A query’s SQL code is defined separately from the query’s parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver.
Django also gives developers power to write raw queries or execute custom sql. These capabilities should be used sparingly and you should always be careful to properly escape any parameters that the user can control. In addition, you should exercise caution when using extra() and RawSQL.
https://docs.djangoproject.com/en/2.0/topics/security/#sql-injection-protection

Validation with cleaned_data()

I'd like to ask you if you can briefly and in plain English explain to me
how cleaned_data() function validates data ?
Reason for asking is that I'm designing a web app powered by Django
and initially I thought cleaned_data() is smart enough to block user's input that contains potentially harmful characters. Such as ' ; < > and alike. Characters that can be used for SQL injection attacks.
To my surprise, when I deliberately slipped few of those characters into form field, the input made it to database. I was quite shocked.
So then ... what the cleaned_data() function is good for ?
I read about this function in docs, however I couldn't find necessarily answer to this.
cleaned_data is for validated form data. If you have a required CharField, for example, it will validate whether it is present, and whether it has enough characters. If you have an EmailField, then it will validate that it includes an email address.
Take a look at some of the build in form fields for a better idea of what you can do.
It is not intended to prevent XSS or SQL injection. It simply confirms that your form follows basic rules that you have set for it.
You missunderstood cleaned_data. The simplest definition of cleaned_data is something like:
A dict that contains data entered by the user after various validation
(built-in or custom)
Now, that being said, to understand every steps to form validation refer to this link (re-inventing the wheel would be silly since it is greatly explained.)
As for the SQL injection, this is another problem. But again, Django as a built-in way of handling it, this is from the documentation:
By using Django’s querysets, the resulting SQL will be properly
escaped by the underlying database driver. However, Django also gives
developers power to write raw queries or execute custom sql. These
capabilities should be used sparingly and you should always be careful
to properly escape any parameters that the user can control. In
addition, you should exercise caution when using extra() and RawSQL..
I can totally see your confusion, but remember that they are two different things.

Django: Filtering a queryset locally from cache

If I perform a prefetch_related('toppings') for a queryset, and I want to later filter(spicy=True) by fields in the related table, Django ignores the cached info and does a database query. I found that this is documented (under the Note box) and seems to happen for all forms of caching (select_related(), already evaluated querysets, etc.) when another filter() is performed.
However, is there some sort of super secret hidden time-saving shortcut to filter locally (using the cache and not hitting the database) without having to write the python code to loop the queryset (using list/dict comprehension, etc.)? Maybe something like a filter_locally(spicy=True)?
EDIT:
One of the reasons why a list/comprehension doesn't work well for me is because a list/dict does not have the queryset methods. In my case, the first level M2M field, toppings, isn't the end goal for me and I need to check a 2nd related M2M field (which I have already pre-fetched as well). While this is also possible using list comprehension, it's just much simpler to have something such as filter_locally(spicy=True, origin__country='Spain') because:
it allows accessing many levels of related fields with minimal effort
it allows chaining other queryset methods
it's easier to read because it's consistent with the familiar filter()
it's easier to modify existing code using filter() without prefetch to add this optimization in without much changes.
But from the responses, Django has no such support :(
You have to write the python code to loop through the queryset (a list/dict comprehension is ideal). All the filter() code knows how to do is add filtering language to the SQL sent to the database. Filtering locally is a totally different problem than filtering remotely, so the solutions to those two separate problems won't be able to share any logic.
A list comprehension one-liner would be pretty straightforward, though; the syntax might not be much more complex than with filter().
If you're filtering on a boolean doing the list comprehension is pretty easy. You can also swap out the topping.spicy==True for a string comparison or whatever.
I would do something like:
qs = Pizza.objects.all().prefetch_related('toppings')
res = list(qs)
def get_spicy(qs):
res = list(qs)
return [pizza for pizza in res if any(topping.spicy==True for
topping in pizza.toppings.all())]
That is if you want to return the pizza object if any of its toppings is spicy. You can also replace the any() with all() to check for all, and do a lot of pretty powerful queries with this syntax. I'm somewhat surprised that there is no easy way to do this in django. It seems like a lot of these simple queries should be easy to implement in a generic manner.
The above code assumes a many2many. It should be easy to modify to work with a simple FK relationship such as a one2one or one2many.
Hope this was helpful.

Most RESTful way of storing data that is to be served as JSON?

I have a Django server responsible for serving JSON formatted files to a controller machine. My server gets data from POST requests and serve JSON files on GET requests. I'm wondering what is the most RESTful and efficient way of creating this bridge on the server?
I'm considering creating a JSON model and storing instances on my database for each POST request and pack the instances into JSON files dynamically and serve them on GET requests. The alternative would be creating JSON files on POST requests, saving them in a folder on the server, and serving these files on GET requests.
Which way is better and why? Or am I failing to see a completely better method?
Why are you creating files? You can let a Django view return a JSON response instead of a HTML response:
import json
# in your view:
data = {}
return HttpResponse(json.dumps(data), mimetype="application/json")
Construct the JSON data dynamically from the available data and if the JSON responses are big, add caching (e.g., memcached or Varnish).
This will prevent certain problems (what to do if a GET request is done but there is no JSON file yet?). This way you can generate the JSON based on the data that you have or return JSON with an error message instead.
I looked into json serialization with natural keys and dependencies etc. to control the fields that are serialized. I also tried using the middleware wodofstuff to allow for deeper foreign key serialization. However, I decided to use templates to render the JSON response.
Some pitfalls are
I am responsible for formatting the JSON (more prone to mistakes like a missing semi-colon)
I am responsible for escaping characters
Renders slower than buit-in serialization?
Some advantages are
I have control over what is serialized (even if the underlying models is changed)
I can format many to many or foreign key relationships on the JSON file however I like
TLDL; In my case, the format of the JSON file I needed was very custom. It had dictionary in list in dictionary. Some fields are iterative so I have for loops in the templates to render them. However, the format requires some of the fields in the iterated object to be encapsulated in a list as oppose to a dictionary.
This was an obstacle I ran into while considering simplejson. By using something like
import simplejson as json
def encode_complex(obj):
if isinstance(obj, complex):
return [obj.real, obj.imag]
raise TypeError(repr(o) + " is not JSON serializable")
json.dumps(2 + 1j, default=encode_complex)
'[2.0, 1.0]'
I could manage to return iterated data; however, I needed iteration in iteration and custom object types (list or dict) to encapsulate certain iterations. Finally, (may it be lack of knowledge or lack of patience) I decided to just do it in the templates.
I feel like rendering it through templates is not the most scalable or 'smartest' way of doing it, could it possibly be done in a better way? Please feel free to prove me right or wrong.