I have a few text fields in my Django project and I've been learning about SQL injection. Is it important to strip the text fields of potentially bad characters that might make SQL injection easier? I imagine stripping characters such as { and ;, but I am not sure. These fields are short bios about a person or a contact page, so I don't imagine that they would require such characters.
To be clear, I have taken other steps to protect my website, and I am not using these fields for things such as generating dynamic SQL queries.
The short answer is that you should be fine without stripping anything, based on the Django docs:
SQL injection protection
SQL injection is a type of attack where a malicious user is able to execute arbitrary SQL code on a database. This can result in records being deleted or data leakage.
Django’s querysets are protected from SQL injection since their queries are constructed using query parameterization. A query’s SQL code is defined separately from the query’s parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver.
Django also gives developers power to write raw queries or execute custom sql. These capabilities should be used sparingly and you should always be careful to properly escape any parameters that the user can control. In addition, you should exercise caution when using extra() and RawSQL.
https://docs.djangoproject.com/en/2.0/topics/security/#sql-injection-protection
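To make the quoted point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django itself. The table and column names are hypothetical. It only illustrates the general idea of query parameterization: the SQL text contains placeholders, the values are passed separately, and a bio full of "dangerous" characters is stored and read back unchanged.

```python
import sqlite3

# In-memory database standing in for the real one; the table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (name TEXT, bio TEXT)")

# A bio containing characters one might be tempted to strip.
bio = "It's me; I like {braces} -- and'); DROP TABLE profile; --"

# The SQL text holds only placeholders; the values travel separately.
conn.execute("INSERT INTO profile (name, bio) VALUES (?, ?)", ("alice", bio))

stored = conn.execute(
    "SELECT bio FROM profile WHERE name = ?", ("alice",)
).fetchone()[0]

# The text comes back unchanged and the table still exists.
row_count = conn.execute("SELECT COUNT(*) FROM profile").fetchone()[0]
print(stored == bio, row_count)
```

Because the value is never spliced into the SQL string, there is nothing to gain from stripping characters before saving.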
Related
I need to take raw JSON data and put it direct into a Django model by using the Mymodel.objects.create( ... ) method. If I then run full_clean() on the instance created, is it then secure in terms of potential SQL injection or any malicious data that could potentially be injected?
The reason that I am not using a form is that the logic on the form is quite complex in that it dynamically builds a form so I need to post data in json format. I don't want to use the rest api as there is just one page where it has this complexity.
.full_clean(…) does not perform any checks for SQL injection, and neither does the Django ORM: the ORM simply passes all values as escaped parameters, so even if the data contains SQL statements, these are treated as data and, normally, SQL injection is not possible.
But you need to run full_clean to validate data integrity. If you define constraints on your model, these are normally validated at the full_clean part. The ORM does not run these queries.
You thus can work with:
obj = Mymodel(**data)
obj.full_clean()
obj.save()
The reason that I am not using a form is that the logic on the form is quite complex in that it dynamically builds a form.
A form can remove a lot of boilerplate code, since it not only performs validation but also cleaning, makes error messages more convenient, etc.
The Django ORM protects the database from SQL injection, but you are responsible for the output. For convenient data cleaning, I recommend using DRF serializers.
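As an illustration of what "responsible for the output" can mean, here is a small sketch using only the Python standard library. The sample bio is made up. Django templates autoescape variables by default, so this matters mostly when the stored value is rendered by some other means.

```python
from html import escape

# Value stored safely in the database, but still dangerous if echoed raw
# into an HTML page.
bio = "<script>alert('hi')</script> Tom & Jerry"

# Escape at output time, not at storage time.
safe = escape(bio)
print(safe)
```

The database keeps the original text; only the rendered copy is escaped.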
After reading a response about the protection Django provides against SQL injection, I am wondering what the docs mean by 'the underlying driver escapes the SQL'.
Does this mean, for lack of better word, that the 'database driver' checks the view/wherever the queryset is located for characteristics of the query, and denies 'characteristics' of certain queries?
I understand that this is a fairly 'low-level' discussion, but I'm not understanding how the underlying mechanisms prevent this attack, and I would appreciate a simplified explanation of what is occurring here.
Link to docs
To be precise, we are dealing here with parameter escaping.
Django itself does not escape parameter values. It uses the API of the driver, which in general looks similar to this (see, for example, the drivers for Postgres or MySQL):
driver.executeQuery(
'select field1 from table_a where field2 = %(field2)s', {'field2': 'some value'}
)
The important thing to note here is that the parameter value (which may be provided by the user and is therefore a potential vector for SQL injection) is not embedded into the query itself. The query is passed to the driver with placeholders for the parameter values, and the list or dict of parameters is passed alongside it.
The driver can then either construct the SQL query with properly escaped parameter values or use an API provided by the database itself that offers similar functionality (that is, it accepts a query with placeholders plus the parameter values).
Django querysets use this approach to generate SQL, and that is what this piece of documentation is trying to say.
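The difference can be seen in a short, runnable sketch. It uses Python's built-in sqlite3 driver as a stand-in (Django would normally talk to a driver such as the Postgres or MySQL one), and the table contents are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [("public", "alice"), ("secret", "bob")],
)

user_input = "alice' OR '1'='1"

# Unsafe: the value is pasted into the SQL text, so its quote characters
# change the structure of the query.
unsafe_sql = "SELECT field1 FROM table_a WHERE field2 = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the query contains a named placeholder and the value is passed
# separately, so the driver binds it as plain data.
safe_rows = conn.execute(
    "SELECT field1 FROM table_a WHERE field2 = :field2",
    {"field2": user_input},
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row, while the parameterized one returns none, because no row has a field2 equal to that literal string.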
You can query Django's JSONField either by direct lookup or by using annotations. Now, I realize that if you annotate a field you can do all sorts of complex queries, but for a very basic query, which one is actually the preferred method?
Example: Let's say I have a model like so
class Document(models.Model):
    data = JSONField()
And then I store an object using the following command:
>>> Document.objects.create(data={'name': 'Foo', 'age': 24})
Now, the query I want is the most basic: Find all documents where data__name is 'Foo'. I can do this 2 ways, one using annotation, and one without, like so:
>>> from django.db.models.expressions import RawSQL
>>> Document.objects.filter(data__name='Foo')
>>> Document.objects.annotate(name = RawSQL("(data->>'name')::text", [])).filter(name='Foo')
So what exactly is the difference? And if I can make basic queries, why do I need to annotate? Provided of course I am not going to make complex queries.
There is no reason whatsoever to use raw SQL for queries where you can use ORM syntax. For someone who is conversant in SQL but less experienced with Django's ORM, RawSQL might provide an easier path to a certain result than the ORM, which has its own learning curve.
There might be more complex queries where the ORM runs into problems or where it might not give you the exact SQL query that you need. It is in these cases that RawSQL comes in handy – although the ORM is getting more feature-complete with every iteration, with
Cast (since 1.10),
Window functions (since 2.0),
a constantly growing array of wrappers for database functions
the ability to define custom wrappers for database functions with Func expressions (since 1.8) etc.
They are interchangeable, so it's a matter of taste. I think Document.objects.filter(data__name='Foo') is better because:
It's easier to read
In the future, MariaDB or MySQL may support JSON fields, and your code will then be able to run on both PostgreSQL and MariaDB.
Don't use RawSQL as a general rule. You can create security holes in your app.
I'd like to ask if you can briefly, and in plain English, explain to me
how the cleaned_data() function validates data.
The reason for asking is that I'm designing a web app powered by Django,
and initially I thought cleaned_data() was smart enough to block user input that contains potentially harmful characters, such as ' ; < > and the like: characters that can be used for SQL injection attacks.
To my surprise, when I deliberately slipped a few of those characters into a form field, the input made it to the database. I was quite shocked.
So then... what is the cleaned_data() function good for?
I read about this function in the docs, but I couldn't find the answer I needed.
cleaned_data is for validated form data. If you have a required CharField, for example, it will validate whether it is present, and whether it has enough characters. If you have an EmailField, then it will validate that it includes an email address.
Take a look at some of the built-in form fields for a better idea of what you can do.
It is not intended to prevent XSS or SQL injection. It simply confirms that your form follows basic rules that you have set for it.
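The distinction can be sketched in plain Python. The function below is hypothetical and is not how Django implements forms; it only mimics the idea that validation checks rules (required, maximum length, looks like an email) and passes characters such as ' ; < > through untouched.

```python
def clean_contact_form(raw):
    """Hypothetical validator that mimics the idea behind cleaned_data."""
    errors = {}
    cleaned = {}

    # Required field with a maximum length, like a required CharField.
    name = raw.get("name", "").strip()
    if not name:
        errors["name"] = "This field is required."
    elif len(name) > 50:
        errors["name"] = "Ensure this value has at most 50 characters."
    else:
        cleaned["name"] = name

    # Very rough stand-in for an EmailField check.
    email = raw.get("email", "").strip()
    if "@" not in email:
        errors["email"] = "Enter a valid email address."
    else:
        cleaned["email"] = email

    return cleaned, errors


cleaned, errors = clean_contact_form(
    {"name": "O'Brien; <b>", "email": "ob@example.com"}
)
bad_cleaned, bad_errors = clean_contact_form({"name": "", "email": "nope"})
print(cleaned, errors)
print(bad_cleaned, bad_errors)
```

The quote, semicolon and angle brackets survive validation because they break no rule the form defines; protecting the database and the rendered page is handled elsewhere.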
You misunderstood cleaned_data. The simplest definition of cleaned_data is something like:
A dict that contains data entered by the user after various validation
(built-in or custom)
Now, that being said, to understand every step of form validation, refer to this link (re-inventing the wheel would be silly, since it is already explained very well).
As for SQL injection, that is a separate problem. But again, Django has a built-in way of handling it; this is from the documentation:
By using Django’s querysets, the resulting SQL will be properly
escaped by the underlying database driver. However, Django also gives
developers power to write raw queries or execute custom sql. These
capabilities should be used sparingly and you should always be careful
to properly escape any parameters that the user can control. In
addition, you should exercise caution when using extra() and RawSQL.
I can totally see your confusion, but remember that they are two different things.
The typical controls against SQL injection flaws are to use bind variables (cfqueryparam tag), validation of string data and to turn to stored procedures for the actual SQL layer. This is all fine and I agree, however what if the site is a legacy one and it features a lot of dynamic queries. Then, rewriting all the queries is a herculean task and it requires an extensive period of regression and performance testing. I was thinking of using a dynamic SQL filter and calling it prior to calling cfquery for the actual execution.
I found one filter in CFLib.org (http://www.cflib.org/udf/sqlSafe):
<cfscript>
/**
* Cleans string of potential sql injection.
*
* @param string String to modify. (Required)
* @return Returns a string.
* @author Bryan Murphy (bryan@guardianlogic.com)
* #version 1, May 26, 2005
*/
function metaguardSQLSafe(string) {
var sqlList = "-- ,'";
var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)#";
return trim(replaceList( string , sqlList , replacementList ));
}
</cfscript>
This seems to be quite a simple filter and I would like to know if there are ways to improve it or to come up with a better solution?
what if the site is a legacy one and
it features a lot of dynamic queries.
Then, rewriting all the queries is a
herculean task and it requires an
extensive period of regression and
performance testing.
Yep, but that's the case if you perform any significant changes, including using a function like the one you are proposing.
So I'd still recommend getting some tests setup, refactoring to use a sensible framework, and then fixing the queries to use cfqueryparam.
That specific function is a bunch of nonsense, which does not do what it claims to do, and has the potential to break stuff (by incorrectly exceeding max lengths).
All it does is turn -- into &#45;&#45; and ' into &#39; (the HTML entities for those characters) - this is not SQL injection protection!
So yeah, if you still do want to go down that route, find a different function, but I'd recommend proper refactoring.
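The two complaints above (longer output, no real protection) can be checked with a rough Python approximation of the filter. This is a sketch that ignores the whitespace details of the original replaceList call, and the function name is made up.

```python
def sql_safe_approx(text):
    # Rough approximation of the CFLib filter: it only swaps two character
    # sequences for HTML entities and trims the result.
    return text.replace("--", "&#45;&#45;").replace("'", "&#39;").strip()


# A legitimate value grows, which can overflow a column's maximum length.
name = "O'Brien"
filtered = sql_safe_approx(name)

# A payload for a numeric context needs neither quotes nor comments,
# so it passes through unchanged.
payload = "1 OR 1=1"
filtered_payload = sql_safe_approx(payload)

print(len(name), len(filtered), filtered_payload == payload)
```

The legitimate name is altered and lengthened, while the injection payload is left exactly as it was.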
Obviously you have a lot of work ahead of you. But as you roll up your sleeves, one small thing you might do to mitigate some of the potential damage from injection attacks is to create several datasources, and run all your select-only queries through a datasource restricted to only select statements. And for all of the datasources, make sure things like grant, revoke, create, alter, and drop are disabled.
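The idea of a select-only datasource can be illustrated with an analogy in Python and SQLite. This is not ColdFusion and not the database in question; the file name and table are invented. A connection opened read-only still answers SELECT statements but refuses anything that writes.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Read-write "datasource": used for setup and for queries that must modify data.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE users (name TEXT)")
rw.execute("INSERT INTO users VALUES ('alice')")
rw.commit()
rw.close()

# Read-only "datasource": SELECT works, anything that writes is rejected.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT name FROM users").fetchall()

try:
    ro.execute("DROP TABLE users")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

print(rows, write_blocked)
```

Even if an injected statement reaches the database through the read-only connection, it cannot drop or modify anything.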
You might try Portcullis. It is an open source CFC that you can use to scan the URL, FORM and COOKIE scopes for SQL Injection and XSS attacks. It won't be guaranteed protection but would at least provide some protection today with little effort while you work on a rewrite of the queries. The nice thing is it can be included in the Application.cfm/cfc to scan the scopes on every CF page request at the cost of about 4 lines of code.
Put this code into your Application.cfm file.
<cfif FindNoCase("DECLARE",cgi.query_string) and FindNoCase("CAST",cgi.query_string) and FindNoCase("EXEC",cgi.query_string)>
<cfabort showerror="Oops..!! It's SQL injection." >
</cfif>
http://ppshein.wordpress.com/2008/08/23/sql-injection-attacks-by-store-procedure/
http://ppshein.wordpress.com/2008/08/28/block-ip-in-coldfusion/