_ip.magic getting into raw SQL query. How to avoid it? - django

When I make a raw SQL query in Django, some queries get _ip.magic into the string, and then string formatting raises exceptions since there are not enough or too many parameters.
Sample code was reduced to the minimum set, but still produces "magics":
> ids = (1, 4)
> curr = 3
> q = User.objects.raw(u"""
SELECT
1
WHERE
a=%s and b=%s AND a.user_id = %s
""", params=(ids, ids, curr))
> print q.query.sql
... a = _ip.magic("s and b=%s AND a.user_id = %s")
(I don't mean to run this query, just want to successfully generate a SQL.)
Why is _ip.magic there? Depending on the queries, sometimes it wraps a single parameter, sometimes several of them. How to get rid of it?
edit: the solution was to turn off automagic:
>>> _ip.options['automagic'] = 0

_ip.magic is, as far as I can tell, an IPython function and has nothing to do with Django itself.
Try running this code in the plain vanilla django shell.

There must be more to it than this. Here's what I get:
In [12]: print User.objects.raw('select id from auth_user where id=%s OR id=%s', (1,2)).query
<RawQuery: 'select id from auth_user where id=1 OR id=2'>
In [13]: print User.objects.raw('select id from auth_user where id=%s OR id=%s', (1,2)).query.sql
select id from auth_user where id=%s OR id=%s
In other words, some other code is probably affecting your actions.

Related

When does the SQL execute in Django ORM

To begin with, I will give an example:
# Student is a model class, and it has attributes: name, age, gender and so on.
temp_students = Student.objects.filter(age=18)
students = temp_students.filter(gender='girl')
If I debug this code, I can get an SQL which may be "SELECT * FROM student WHERE age = 18"(called SQL-A). Then, when I reach the second line, I may get another SQL which is "SELECT * FROM student WHERE gender = 'girl' IN (SELECT * FROM student WHERE age = 18)"(called SQL-B).
So, my question is: when do SQL-A and SQL-B execute? Does Django connect to the database twice and fetch two result sets? If so, is that unnecessary work for the database? If not, why do I see SQL like this in debug mode?
It will be great if there is any related Django ORM doc or article at the end of your answer.
THANKS!
Django querysets are "lazy", which means they only hit the database once they are evaluated.
For example here:
queryset1 = Student.objects.filter(...)
queryset2 = queryset1.filter(...)
for i in queryset2:
    print(i)
In the example above the queryset is only evaluated when it reaches the for-loop, and that's when it's actually accessing the database. It will use one SQL query, that is constructed based on the prior filter statements.
More info in Django docs: https://docs.djangoproject.com/en/2.0/topics/db/queries/#querysets-are-lazy
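To make the laziness concrete without a Django project, here is a minimal toy sketch using the standard-library sqlite3 module. LazyQuery, its filter() method and the executed_sql list are hypothetical names invented for this illustration; this is not how Django implements querysets, only the same idea in miniature. Chaining filters just collects conditions, and one combined statement is sent when the object is iterated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ann", 18, "girl"), ("Bob", 18, "boy"), ("Cid", 19, "girl")],
)

executed_sql = []  # every statement that actually reaches the database


class LazyQuery:
    """Toy stand-in for a queryset: collects conditions, runs SQL on iteration."""

    def __init__(self, conditions=()):
        self.conditions = tuple(conditions)

    def filter(self, column, value):
        # Returns a new object; nothing is sent to the database here.
        # Column names are hard-coded by the caller in this toy example.
        return LazyQuery(self.conditions + ((column, value),))

    def __iter__(self):
        where = " AND ".join("%s = ?" % col for col, _ in self.conditions)
        sql = "SELECT name FROM student" + (" WHERE " + where if where else "")
        executed_sql.append(sql)
        values = [val for _, val in self.conditions]
        return iter(conn.execute(sql, values).fetchall())


temp_students = LazyQuery().filter("age", 18)      # no query yet
students = temp_students.filter("gender", "girl")  # still no query
queries_before_loop = len(executed_sql)
names = [row[0] for row in students]               # the single query runs here
```

After the list comprehension, executed_sql holds exactly one statement containing both conditions, which mirrors the "one SQL query" behaviour described above.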

Django ORM "get" translation to SQL

For a QuerySet in Django, we can read its .query attribute to get the raw SQL.
for example,
queryset = AModel.objects.all()
print queryset.query
the output could be: SELECT "id", ... FROM "amodel"
But when retrieving an object with "get", say,
item = AModel.objects.get(id = 100)
how do I get the equivalent raw SQL? Note: the item might be None.
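The item = AModel.objects.get(id = 100) is roughly equivalent to
items = AModel.objects.filter(id=100)
if len(items) == 1:
    item = items[0]
elif not items:
    # get() never returns None; it raises when nothing matches
    raise AModel.DoesNotExist
else:
    raise AModel.MultipleObjectsReturned
Thus the executed query is the same as for AModel.objects.filter(id = 100)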
Also, you could check the last item of connection.queries (it is only populated when DEBUG is True):
from django.db import connection # use connections for non-default dbs
print connection.queries[-1]
And, as FoxMaSk said, install django-debug-toolbar and enjoy it in your browser.
It's the same SQL, just with a WHERE id=100 clause tacked to the end.
However, FWIW, if a filter is specific enough to return only one result, it's the same SQL as get would produce. The only difference at that point is on the Python side, e.g.
AModel.objects.get(id=100)
is the same as:
AModel.objects.filter(id=100).get()
So, you can simply query AModel.objects.filter(id=100) and then use queryset.query with that.
If it's just for debugging purposes, you can use django-debug-toolbar, which can be installed with
pip install django-debug-toolbar

Django: filter a RawQuerySet

I've got a weird query, so I have to execute raw SQL. The problem is that this query keeps getting bigger, with lots of optional filters (ordering, column criteria, etc.).
So, given this query:
SELECT DISTINCT Camera.* FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
This is roughly the Python code:
def get_cameras(features):
    query = "SELECT DISTINCT Camera.* FROM Camera c"
    i = 1
    for f in features:
        alias_name = "fc%s" % i
        # the leading space keeps the JOIN separate from the preceding alias
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name, alias_name, alias_name)
        query += " %s "  # placeholder, filled from the params tuple
        i += 1
    return Camera.objects.raw(query, tuple(features))
This is working great, but i need to add more filters and ordering, for example suppose i need to filter by color and order by price, it starts to grow:
# extra_filters is a list of tuples like:
# [('price', '=', '12'), ('color', '=', 'blue'), ('brand', 'like', 'lum%')]
def get_cameras_big(features, extra_filters=None, order=None):
    query = "SELECT DISTINCT Camera.* FROM Camera c"
    i = 1
    for f in features:
        alias_name = "fc%s" % i
        query += " INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name, alias_name, alias_name)
        query += " %s "
        i += 1
    if extra_filters:
        query += " WHERE "
        # not very safe, refactoring needed
        query += " AND ".join("%s %s %s" % ef for ef in extra_filters)
    if order:
        query += " order by %s" % order
    return Camera.objects.raw(query, tuple(features))
So, i don't like how it started to grow, i know Model.objects.raw() returns a RawQuerySet, so i'd like to do something like this:
queryset = get_cameras( ... )
queryset.filter(...)
queryset.order_by(...)
But this doesn't work. Of course I could just perform the raw query and then get an actual QuerySet with the data, but that would perform two queries. Like:
raw_query_set = get_cameras( ... )
Camera.objects.filter(id__in=[c.id for c in raw_query_set]) # don't know if it works, but you get the idea
I'm thinking that something with the QuerySet init or the cache may do the trick, but haven't been able to do it.
.raw() is an end-point. Django can't do anything with the queryset because that would require being able to somehow parse your SQL back into the DBAPI it uses to create SQL in the first place. If you use .raw() it is entirely on you to construct the exact SQL you need.
You may be able to reduce your query to something that .extra() can handle instead: construct whatever query you like with Django's API and then tack on the additional SQL with .extra(). That's going to be your only way around.
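Since building the exact SQL is on you when using .raw(), here is a minimal sketch of one way to assemble it more safely. It is an assumption-laden illustration, not the asker's code: build_camera_query, ALLOWED_COLUMNS and ALLOWED_OPERATORS are hypothetical names, the whitelist contents are guesses based on the example filters, and it selects c.* rather than Camera.* because the table is aliased. Column names and operators are checked against whitelists, and every value travels as a placeholder parameter.

```python
ALLOWED_COLUMNS = {"price", "color", "brand"}            # hypothetical whitelist
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "like"}  # hypothetical whitelist


def build_camera_query(features, extra_filters=(), order=None):
    """Return (sql, params); identifiers are whitelisted, values are placeholders."""
    sql = ["SELECT DISTINCT c.* FROM Camera c"]
    params = []
    for i, feature_id in enumerate(features, start=1):
        alias = "fc%d" % i
        sql.append(
            "INNER JOIN cameras_features {a} ON c.id = {a}.camera_id "
            "AND {a}.feature_id = %s".format(a=alias)
        )
        params.append(feature_id)
    clauses = []
    for column, operator, value in extra_filters:
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError("unsupported filter: %r %r" % (column, operator))
        clauses.append("c.%s %s %%s" % (column, operator))
        params.append(value)
    if clauses:
        sql.append("WHERE " + " AND ".join(clauses))
    if order is not None:
        if order not in ALLOWED_COLUMNS:
            raise ValueError("unsupported order column: %r" % (order,))
        sql.append("ORDER BY c.%s" % order)
    return " ".join(sql), params
```

The returned pair could then be handed to Camera.objects.raw(sql, params), assuming the model and tables from the question.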
There's another option: turn the RawQuerySet into a list, then you can do your sorting like this...
results_list.sort(key=lambda item:item.some_numeric_field, reverse=True)
and your filtering like this...
filtered_results = [i for i in results_list if i.some_field == 'something']
...all programmatically. I've been doing this a ton to minimize db requests. Works great!
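A minimal runnable sketch of that in-memory approach, assuming the raw results have already been turned into a list. CameraRow and its fields are hypothetical stand-ins for model instances.

```python
from collections import namedtuple

# Hypothetical stand-in for model instances returned by list(raw_queryset)
CameraRow = namedtuple("CameraRow", ["id", "color", "price"])

results_list = [
    CameraRow(1, "blue", 300),
    CameraRow(2, "red", 120),
    CameraRow(3, "blue", 90),
]

# Sorting: highest price first, without touching the database again
by_price_desc = sorted(results_list, key=lambda item: item.price, reverse=True)

# Filtering: keep only the blue cameras
blue_only = [item for item in results_list if item.color == "blue"]
```

This trades database work for Python work, so it is probably only sensible when the raw result set is small enough to hold in memory.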
I implemented a Django raw queryset that supports filter(), order_by(), values() and values_list(). It will not work for every raw query, but for a typical SELECT with some INNER JOIN or LEFT JOIN it should work.
The FilteredRawQuerySet is implemented as a combination of Django model QuerySet and RawQuerySet, where the base (left part) of the SQL query is generated via RawQuerySet, while WHERE and ORDER BY directives are generated by QuerySet:
https://github.com/Dmitri-Sintsov/django-jinja-knockout/blob/master/django_jinja_knockout/query.py
It works with Django 1.8 .. 1.11.
It also has a ListQuerySet implementation for Prefetch object result lists of model instances as well, so these can be processed the same way as ordinary querysets.
Here is the example of usage:
https://github.com/Dmitri-Sintsov/djk-sample/search?l=Python&q=filteredrawqueryset&type=&utf8=%E2%9C%93
If you are unable to convert the query to a regular QuerySet, another option is to create a view in your database backend. The database executes the view's query whenever you access it. In Django, you would then create an unmanaged model mapped to the view. With that model, you can apply filter() as if it were a regular model. For your foreign keys, you would set the on_delete argument to models.DO_NOTHING.
More information about unmanaged models:
https://docs.djangoproject.com/en/2.0/ref/models/options/#managed
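The database side of the view idea can be sketched with the standard-library sqlite3 module. The table layout, the sample rows and the view name camera_with_both are assumptions made up for this illustration, and the Django unmanaged model itself is not shown. Once the join lives in a view, ordinary WHERE and ORDER BY clauses can be applied to it like to a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE camera (id INTEGER PRIMARY KEY, color TEXT, price INTEGER);
CREATE TABLE cameras_features (camera_id INTEGER, feature_id INTEGER);
INSERT INTO camera VALUES (1, 'blue', 300), (2, 'red', 120), (3, 'blue', 90);
INSERT INTO cameras_features VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);

-- Hypothetical view: cameras that have both feature 1 and feature 2
CREATE VIEW camera_with_both AS
    SELECT DISTINCT c.* FROM camera c
    INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
    INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2;
""")

# Filtering and ordering are applied to the view like to a regular table
rows = conn.execute(
    "SELECT id FROM camera_with_both WHERE color = ? ORDER BY price",
    ("blue",),
).fetchall()
```

An unmanaged model pointed at such a view would let the ORM generate this kind of WHERE and ORDER BY for you.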

Can Django do nested queries and exclusions

I need some help putting together this query in Django. I've simplified the example here to just cut right to the point.
class MyModel(models.Model):
    created = models.DateTimeField()
    user = models.ForeignKey(User)
    data = models.BooleanField()
The query I'd like to create in English would sound like:
Give me every record that was created yesterday for which data is False where in that same range data never appears as True for the given user
Here's an example input/output in case that wasn't clear.
Table Values
ID  Created   User   Data
1   1/1/2010  admin  False
2   1/1/2010  joe    True
3   1/1/2010  admin  False
4   1/1/2010  joe    False
5   1/2/2010  joe    False
Output Queryset
1   1/1/2010  admin  False
3   1/1/2010  admin  False
What I'm looking to do is to exclude record #4. The reason is that in the given range "yesterday", data appears as True once for the user in record #2, and that should exclude record #4.
In a sense, it almost seems like there are 2 queries taking place. One to determine the records in the given range, and one to exclude records which intersect with the "True" records.
How can I do this query with the Django ORM?
You don't need a nested query. You can generate a list of bad users' PKs and then exclude records containing those PKs in the next query.
bad = list(set(MyModel.objects.filter(data=True).values_list('user', flat=True)))
# list(set(list_object)) will remove duplicates
# not needed but might save the DB some work
rs = MyModel.objects.filter(datequery).exclude(user__pk__in=bad)
# might not need the pk in user__pk__in - try it
You could condense that down into one line but I think that's as neat as you'll get. 2 queries isn't so bad.
Edit: You might want to read the docs on this:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#in
It makes it sound like it auto-nests the query (so only one query fires in the database) if it's like this:
bad = MyModel.objects.filter(data=True).values('user')
rs = MyModel.objects.filter(datequery).exclude(user__pk__in=bad)
But MySQL doesn't optimise this well so my code above (2 full queries) can actually end up running a lot faster.
Try both and race them!
looks like you could use:
from django.db.models import F
MyModel.objects.filter(datequery).filter(data=False).filter(data = F('data'))
F objects are available from version 1.1.
Please test it, I'm not sure.
Thanks to lazy evaluation, you can break your query up into a few different variables to make it easier to read. Here is some ./manage.py shell play time in the style that Oli already presented.
> from django.db import connection
> connection.queries = []
> target_day_qs = MyModel.objects.filter(created='2010-1-1')
> bad_users = target_day_qs.filter(data=True).values('user')
> result = target_day_qs.exclude(user__in=bad_users)
> [r.id for r in result]
[1, 3]
> len(connection.queries)
1
You could also say result.select_related() if you wanted to pull in the user objects in the same query.
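For reference, the single nested statement this boils down to can be tried outside Django. This is a minimal sqlite3 sketch using the sample rows from the question. The table and column names (mymodel, user_name) and the ISO date strings are assumptions for the illustration, and the SQL Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, created TEXT, user_name TEXT, data INTEGER)"
)
conn.executemany(
    "INSERT INTO mymodel VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-01", "admin", 0),
        (2, "2010-01-01", "joe", 1),
        (3, "2010-01-01", "admin", 0),
        (4, "2010-01-01", "joe", 0),
        (5, "2010-01-02", "joe", 0),
    ],
)

# One statement: rows for the target day whose user has no True row that day
sql = """
SELECT id FROM mymodel
WHERE created = ?
  AND user_name NOT IN (
      SELECT user_name FROM mymodel WHERE created = ? AND data = 1
  )
ORDER BY id
"""
target_day = "2010-01-01"
result_ids = [row[0] for row in conn.execute(sql, (target_day, target_day))]
```

Records 2 and 4 drop out because joe has a True row on the target day, and record 5 falls outside the date range.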

fast lookup for the last element in a Django QuerySet?

I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor of the r robot. Valor.objects.filter(robot=r).count() is about 200000 and getting the last item takes about 4 seconds on my PC.
How can I speed it up? Am I querying the wrong way?
The optimal mysql syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution utilizes django's slicing method to limit the queryset before compiling the list of objects.
If none of the earlier suggestions are working, I'd suggest taking Django out of the equation and running this raw SQL against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the query set provided by Peter Rowell):
qs = Valor.objects.filter(robot=r).order_by('-id')[:1]  # slice keeps a queryset; [0] would return a model instance, which has no .query
print qs.query
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
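As a rough way to try the raw query and its plan outside Django, here is a minimal sketch with the standard-library sqlite3 module. The schema, the reading column and the index name valor_robot_idx are assumptions, and SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, so treat it only as an illustration of the workflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE valor (id INTEGER PRIMARY KEY, robot_id INTEGER, reading REAL)"
)
# Hypothetical index on the filter column, so the lookup need not scan the table
conn.execute("CREATE INDEX valor_robot_idx ON valor (robot_id)")
conn.executemany(
    "INSERT INTO valor (robot_id, reading) VALUES (?, ?)",
    [(n % 3, n * 0.5) for n in range(300)],
)

last_sql = (
    "SELECT id, robot_id FROM valor "
    "WHERE robot_id = ? ORDER BY id DESC LIMIT 1"
)
last_row = conn.execute(last_sql, (1,)).fetchone()

# Ask the database how it intends to run the query; the last column of each
# plan row is the human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + last_sql, (1,))]
for line in plan:
    print(line)
```

If the printed plan mentions a full scan rather than an index search, that is the kind of hint that an index on the filter column is missing.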
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
class Robot(models.Model):
    # ...
    # related_name='+' avoids a reverse-name clash with the Valor.robot field
    last_valor = models.ForeignKey('Valor', null=True, blank=True, related_name='+')
And then use a post_save signal to make the update.
from django.db.models.signals import post_save

def record_last_valor(sender, **kwargs):
    if kwargs.get('created', False):
        instance = kwargs.get('instance')
        robot = instance.robot
        robot.last_valor = instance
        robot.save()  # persist the pointer; without this the update is lost

post_save.connect(record_last_valor, sender=Valor)
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
This should also be quite fast:
qs = Valor.objects.filter(robot=r) # <-- it doesn't hit the database
count = qs.count() # <-- first hit the database, compute a count
last_item = qs[ count-1 ] # <-- second hit the database, get specified rownum
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()
# gets the first element
Model_Name.objects.last()
# gets the last element
In my case, last() did not work because there is only one row in the database.
Maybe this is helpful for you too :)
Is there a limit clause in Django? That way you can have the db simply return a single record.
mysql
select * from table where x = y limit 1
sql server
select top 1 * from table where x = y
oracle
select * from table where x = y and rownum = 1
I realize this isn't translated into django, but someone can come back and clean this up.
The correct way of doing this is to use the built-in QuerySet method latest() and feed it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as @Aaron's suggestion.
def latest(self, field_name=None):
    """
    Returns the latest object, according to the model's 'get_latest_by'
    option or optional given field_name.
    """
    latest_by = field_name or self.model._meta.get_latest_by
    assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
    assert self.query.can_filter(), \
        "Cannot change a query once a slice has been taken."
    obj = self._clone()
    obj.query.set_limits(high=1)
    obj.query.clear_ordering()
    obj.query.add_ordering('-%s' % latest_by)
    return obj.get()