After reading a response involving security django provides for sql injections. I am wondering what the docs mean by 'the underlying driver escapes the sql'.
Does this mean, for lack of better word, that the 'database driver' checks the view/wherever the queryset is located for characteristics of the query, and denies 'characteristics' of certain queries?
I understand that this is kind of 'low-level' discussion, but I'm not understanding how underlying mechanisms are preventing this attack, and appreciate any simplified explaination of what is occuring here.
Link to docs
To be precise we are dealing here with parameters escaping.
The django itself does not escape parameters values. It uses the API of the driver that in general looks similar to this (see for example driver for postgres or mysql):
driver.executeQuery(
'select field1 from table_a where field2 = %(field2)s', {'field2': 'some value'}
)
The important thing to note here is that the parameter value (which may be provided by the user and is subject to sql injection) is not embedded into the query itself. The query is passed to the driver with placeholders for parameters values and the list or dict of parameters is passed in addition to that.
Driver then can either construct the SQL query with proper escaped values for parameters or use the API provided by the database itself which is similar in functionality (that is it gets query with placeholders and parameters values).
Django querysets use this approach to generate SQL and that what this piece of documentation is trying to say.
Related
I have asked this question on Github, issue #641, since it seems like a deficiency or an enhancement question.
Most databases (e.g. Oracle, SQL*Server, Sybase, ... ) send back a description of the resultset, even if there are zero rows. I cannot seem to find this in pqxx, nor can I find a way to describe SQL of any kind prior to execution. The only way I have been able to get the SQL metadata is by collection upon the first row retrieved, but this is limited to an actual row being retrieved. If there are zero rows, I have no way to determine the metadata.
Looking at the v15.x code base for libpq (standard C library), I see that psql has a "workaround" for this very problem. In the file src/bin/psql/common.c there is a function called static bool DescribeQuery(const char *query, double *elapsed_msec) (on or about line #1248).
This function has a rather ingenious solution for this very problem. They create an "unnamed" prepared statement, which they do not execute, instead they use the results of that call, which provides the field info, and subsequently query against the pg_catalog for the metadata descriptions (source below).
Unfortunately, the pqxx library doesn't provide enough (that I can tell) to duplicate this functionality. Does anyone have a solution for this problem?
Credit for resolution goes to Jeroen Vermeulen, on the Github issue #641. He suggested that the predicate AND false be added to the query to force a prototype result. By doing this an empty resultset is returned, with the fields available for metadata collection.
You can query Django's JSONField, either by direct lookup, or by using annotations. Now I realize if you annotate a field, you can all sorts of complex queries, but for the very basic query, which one is actually the preferred method?
Example: Lets say I have model like so
class Document(models.Model):
data = JSONField()
And then I store an object using the following command:
>>> Document.objects.create(data={'name': 'Foo', 'age': 24})
Now, the query I want is the most basic: Find all documents where data__name is 'Foo'. I can do this 2 ways, one using annotation, and one without, like so:
>>> from django.db.models.expressions import RawSQL
>>> Document.objects.filter(data__name='Foo')
>>> Document.objects.annotate(name = RawSQL("(data->>'name')::text", [])).filter(name='Foo')
So what exactly is the difference? And if I can make basic queries, why do I need to annotate? Provided of course I am not going to make complex queries.
There is no reason whatsoever to use raw SQL for queries where you can use ORM syntax. For someone who is conversant in SQL but less experienced with Django's ORM, RawSQL might provide an easier path to a certain result than the ORM, which has its own learning curve.
There might be more complex queries where the ORM runs into problems or where it might not give you the exact SQL query that you need. It is in these cases that RawSQL comes in handy – although the ORM is getting more feature-complete with every iteration, with
Cast (since 1.10),
Window functions (since 2.0),
a constantly growing array of wrappers for database functions
the ability to define custom wrappers for database functions with Func expressions (since 1.8) etc.
They are interchangable so it's matter of taste. I think Document.objects.filter(data__name='Foo') is better because:
It's easier to read
In the future, MariaDB or MySql can support JSON fields and your code will be able to run on both PostgreSQL and MariaDB.
Don't use RawSQL as a general rule. You can create security holes in your app.
I have a few text fields in my django project and I've been learning about SQL injection. Is it important to strip the text fields of potential bad characters taht might make SQL injection easier? I imagine stripping possible bad characters such as { ;, but I am not sure. These fields are short bios about a person or a contact page and so I don't imagine that they would require such characters.
To be clear, I have taken other steps to protect my website such as am using these fields things such as generating dynamic sql queries.
Short answer is: you should be fine not to worry based on django docs -
SQL injection protection
SQL injection is a type of attack where a malicious user is able to execute > arbitrary SQL code on a database. This can result in records being deleted or data leakage.
Django’s querysets are protected from SQL injection since their queries are constructed using query parameterization. A query’s SQL code is defined separately from the query’s parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver.
Django also gives developers power to write raw queries or execute custom sql. These capabilities should be used sparingly and you should always be careful to properly escape any parameters that the user can control. In addition, you should exercise caution when using extra() and RawSQL.
https://docs.djangoproject.com/en/2.0/topics/security/#sql-injection-protection
I'd like to ask you if you can briefly and in plain English explain to me
how cleaned_data() function validates data ?
Reason for asking is that I'm designing a web app powered by Django
and initially I thought cleaned_data() is smart enough to block user's input that contains potentially harmful characters. Such as ' ; < > and alike. Characters that can be used for SQL injection attacks.
To my surprise, when I deliberately slipped few of those characters into form field, the input made it to database. I was quite shocked.
So then ... what the cleaned_data() function is good for ?
I read about this function in docs, however I couldn't find necessarily answer to this.
cleaned_data is for validated form data. If you have a required CharField, for example, it will validate whether it is present, and whether it has enough characters. If you have an EmailField, then it will validate that it includes an email address.
Take a look at some of the build in form fields for a better idea of what you can do.
It is not intended to prevent XSS or SQL injection. It simply confirms that your form follows basic rules that you have set for it.
You missunderstood cleaned_data. The simplest definition of cleaned_data is something like:
A dict that contains data entered by the user after various validation
(built-in or custom)
Now, that being said, to understand every steps to form validation refer to this link (re-inventing the wheel would be silly since it is greatly explained.)
As for the SQL injection, this is another problem. But again, Django as a built-in way of handling it, this is from the documentation:
By using Django’s querysets, the resulting SQL will be properly
escaped by the underlying database driver. However, Django also gives
developers power to write raw queries or execute custom sql. These
capabilities should be used sparingly and you should always be careful
to properly escape any parameters that the user can control. In
addition, you should exercise caution when using extra() and RawSQL..
I can totally see your confusion, but remember that they are two different things.
I love being able to write quick and dirty query strings right into the URL of the Django admin. Like: /admin/myapp/mymodel/?pub_date__year=2011
AND statements are just as easy: /admin/myapp/mymodel/?pub_date__year=2011&author=Jim
I'm wondering if it's possible to issue an 'OR' statement via the URL. Anyone heard of such functionality?
Django < 1.4 doesn't support OR queries. Sometimes it is possible to translate OR queries to __in - queries which are supported (they are equivalent to OR queries but only for single field values).
You can also upgrade to django development version: it has more versatile list_filter implementation (see https://docs.djangoproject.com/en/dev//ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_filter ) which can be used for providing advanced admin filters (including OR-queries).
The & is not a logical AND, even though it seems to be acting that way in your case. I'm pretty certain there is no way to create a logical OR in the GET query string.