Is it possible to print the SQL made by a queryset.exists() statement in Django?
Usually, yes: you can print the SQL that a QuerySet generates. However, since exists() does not return a QuerySet but a simple boolean, it is more difficult.
Perhaps the easiest way is not to print the SQL query for just exists() but for all queries in the view. You can follow another SO question on how to do that (example) or you can use django-debug-toolbar.
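If you are also interested in printing the queries a QuerySet generates, you can print the SQL using the query attribute. Note that it lives on the QuerySet, not on the boolean returned by exists(), so it has to be read before exists() is called:
print(Model.objects.filter(...).query)
That will print the complete SELECT query for the QuerySet (not the reduced query that exists() itself runs).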
If your intention, however, is to copy-paste the query and execute it directly, it might not always work. For example, printing the query does not always produce correct syntax, such as with dates. There is, however, another useful method on Django Query objects (which QuerySet.query is an instance of): sql_with_params(). It returns your parameterized query together with the parameters themselves. As above, it is called on the QuerySet's query, not on the result of exists(). For example:
sql, params = Model.objects.filter(...).query.sql_with_params()
Model.objects.raw(sql, params=params)
I want to retrieve unique foreign key instances and order them randomly.
However, I get an error when I use order_by('?').
My query looks like this:
qs=Course.objects.distinct('courseschedule__object_id').order_by('courseschedule__object_id')
This query works fine, but now I want to order randomly (to get a different result every time). I tried this:
qs=qs.order_by('?')
I got this error:
django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Any idea how to fix it? My database is Postgres, and I don't want to use raw SQL. I really appreciate your help!
First of all, random ordering in the database is expensive and slow according to: querysetsDjango. Second, in my opinion, instead of using the database to return a random order, fetch your rows from the database without distinct (because it can also be slow), shuffle them using Python methods (for example: randomOrder), and make a set() to drop the duplicates. This way the work is done in pure Python code rather than in the database.
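The Python-side approach described above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the rows are hypothetical (course_id, object_id) tuples standing in for something like a values_list() result fetched without distinct, and unique_by_key_shuffled is a made-up helper name.

```python
import random

# Hypothetical rows standing in for (course_id, object_id) pairs fetched
# from the database without DISTINCT. Not taken from the original question.
rows = [(1, 10), (2, 10), (3, 20), (4, 30), (5, 20)]


def unique_by_key_shuffled(rows, key, rng=random):
    """Keep the first row seen for each key, then shuffle the survivors."""
    seen = set()  # keys already emitted; this is the set() doing the dedupe
    unique = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    rng.shuffle(unique)  # in-place shuffle, done in Python instead of SQL
    return unique


# One row per distinct object_id, in a different order on each run.
result = unique_by_key_shuffled(rows, key=lambda row: row[1])
print(result)
```

Keeping the first row per key mirrors what DISTINCT ON does for a fixed ordering. Whether this beats doing the work in the database depends on how many rows have to be transferred.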
Let's say I need to do some work both on a set of model objects, as well as a subset of the first set:
things = Thing.objects.filter(active=True)
for thing in things:  # (1)
    pass  # do something with each `thing`
special_things = things.filter(special=True)
for thing in special_things:  # (2)
    pass  # do something more with these things
My understanding is that at point (1) marked in the code above, an actual SQL query something like SELECT * FROM things_table WHERE active=1 will get executed against the database. The QuerySet documentation also says:
When a QuerySet is evaluated, it typically caches its results.
Now my question is, what happens at point (2) in the example Python code above?
Will Django execute a second SQL query, something like SELECT * FROM things_table WHERE active=1 AND special=1?
Or, will it use the cached result from earlier, automatically doing for me behind the scenes something like the more optimal filter(lambda d: d.special == True, things), i.e. avoiding a needless second trip to the database?
Either way, is the current behavior guaranteed (by documentation or something) or should I not rely on it? For example, it is not only a point of optimization, but could also make a possible logic difference if the database table is modified by another thread/process between the two potential queries.
It will execute a second SQL query. filter creates a new queryset, which doesn't copy the results cache.
As for guarantees - well, the docs specify that filter returns a new queryset object. I think you can be confident that that new queryset won't have cached results yet. As further support, the "when are querysets evaluated" docs suggest using .all() to get a new queryset if you want to pick up possibly changed results:
If the data in the database might have changed since a QuerySet was
evaluated, you can get updated results for the same query by calling
all() on a previously evaluated QuerySet.
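If what you want is a single snapshot for both loops (the logic concern raised in the question), one option is to fetch once and filter the fetched rows in Python. The sketch below is an assumption-laden illustration: Thing is a namedtuple stand-in so it runs without Django, and the list literal plays the role of list(Thing.objects.filter(active=True)).

```python
from collections import namedtuple

# Hypothetical stand-in for Thing model instances, so the sketch runs
# without Django or a database.
Thing = namedtuple("Thing", ["name", "active", "special"])

# Stand-in for list(Thing.objects.filter(active=True)): fetched once,
# then kept in memory.
things = [
    Thing("a", True, False),
    Thing("b", True, True),
    Thing("c", True, True),
]

for thing in things:  # (1) work on every active thing
    pass

# (2) filter the already-fetched rows in Python instead of calling
# .filter() again, so both loops see the same snapshot of the data.
special_things = [thing for thing in things if thing.special]
print([thing.name for thing in special_things])
```

With a real queryset, iterating over the already-evaluated things in a comprehension reuses its result cache, whereas things.filter(special=True) goes back to the database.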
THE SIMPLE VERSION:
Why is raw SQL in Django more efficient than the QuerySet interface?
SOME DETAILS:
I have a query that returns ~ 700,000 (could be more) rows from a PostgreSQL database. Each row contains a few double values, some strings, and some integers. So a moderately complex return.
It is simple in form (oversimplified example):
SELECT (a,b,c) FROM table WHERE d=something AND e=somethings ORDER BY a;
When I use the model interface and .filter() to make the query the execution of the query takes ~30 seconds. This is unacceptable.
I have tried all of the suggested methods (iterator, memory-efficient iterator, etc.).
However, when I do the EXACT same query using connection.cursor ... and fetchall in Django the query drops to about 5 seconds to execute.
What overhead does using the django model interface produce that accounts for this significant performance difference?
UPDATE:
Django QuerySet code:
c_layer_points = models.layer_points.objects.filter(location_id__location_name=region,season_id__season_name=season,line_path_id=c_line_path.pk,radar_id=c_radar.pk,gps_time__gte=start_gps,gps_time__lte=stop_gps).order_by('gps_time').values_list('gps_time','twtt','pick_type','quality','layer_id')
EXACT same query in fast version:
# OPEN a cursor
cursor = connection.cursor()
# EXECUTE the query
cursor.execute(query)
transaction.commit_unless_managed()
# FETCH all the rows
rows = cursor.fetchall()
Where 'query' is the EXACT string representation of the connection.queries code generated from the Queryset.
UPDATE 2:
The timing is done using line_profiler and taking the sum of time from initial query to returned list of tuples (Exact same return by both options). I've also tested the time the raw query takes directly on the database (exact same for both). The discrepancy in timing is when it's done from python via each method.
If you timed the two code segments in the update to your question, then yes, the difference is because django is marshalling the results of the DB query into 700,000 python objects (i.e., it's calling object.__init__() 700,000 times).
There is nothing wrong with using raw sql for the query. This is a case where it might be advised, depending on what you do with the info.
That said, do you need 700,000 objects in the response? Will 700,000 plain rows do instead (that's what the raw SQL query returns)? Or can you limit the rows you get back with pagination or queryset slicing?
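As a rough illustration of limiting how many rows are handled at once on the raw-cursor path, here is a sketch that reads an executed DB-API cursor in batches with fetchmany(). It uses the standard library's sqlite3 so it runs on its own. The table and column names are borrowed from the question, but the data and the iter_rows helper are made up for the example.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an executed DB-API cursor in fixed-size batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        for row in batch:
            yield row


# In-memory demo database with made-up data (2500 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layer_points (gps_time REAL, twtt REAL)")
conn.executemany(
    "INSERT INTO layer_points VALUES (?, ?)",
    [(float(i), i * 0.5) for i in range(2500)],
)

cursor = conn.cursor()
cursor.execute("SELECT gps_time, twtt FROM layer_points ORDER BY gps_time")

# Rows arrive as plain tuples; only one batch is held in a Python list at a time.
total = sum(1 for _ in iter_rows(cursor, batch_size=1000))
print(total)
conn.close()
```

The same loop should work with the cursor returned by Django's connection.cursor(), since it follows the DB-API. Depending on the database driver, the full result may still be buffered on the client, so this mainly bounds the size of the Python-side lists.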
When filtering a queryset, I'm wondering if the following are equivalent.
User.objects.filter(username='josh').filter(email__startswith='josh')
User.objects.filter(username='josh', email__startswith='josh')
I can't imagine how the generated SQL could be any different between the two. The documentation doesn't seem to mention any differences either.
You can execute those queries in the shell and print out the generated SQL like:
>>> print(User.objects.filter(username='josh').filter(email__startswith='josh').query)
I tested queries similar to yours and there was no difference in the generated SQL. Both statements end up using the same WHERE clause.
Furthermore, it shouldn't make any difference in this case whether you chain the filters or apply them in one step.
But there are scenarios in which the order of filtering matters.
Have a look here and here.
Django QuerySets are lazy. Running:
User.objects.filter(username='josh').filter(email__startswith='josh')
or even
a = User.objects.filter(username='josh')
a = a.filter(email__startswith='josh')
produces only a single database query, which is performed when you try to access your data. That query aggregates all the filters and excludes into its WHERE clause.
Hello everybody.
I work with Django 1.3 and Postgres 9.0. I have a very complex SQL query which extends a simple model table lookup with some extra fields. It is wrapped in a table function since it is parameterized.
A month ago I managed to make it work with the help of a raw query, but RawQuerySet lacks a lot of features that I really need (filters, count() and clone() methods, chainability).
The idea looks simple. QuerySet lets me perform this query:
SELECT "table"."field1", ... ,"table"."fieldN" FROM "table"
whereas I need to do this:
SELECT "table"."field1", ... ,"table"."fieldN" FROM proxy(param1, param2)
So the question is: how can I do this? I've already started to create a custom manager, but I can't substitute model.db_table with a custom string (because it gets quoted and the database no longer recognizes the function call).
If there's no way to substitute the table with a function, I would like to know if I can create a QuerySet from a RawQuerySet (not the cleanest solution, but a plain RawQuerySet brings so much pain in... one body part).
If your requirement is to use raw SQL for efficiency and still have access to model methods, what you really need is a library that maps which fields of the SQL result are columns of which models.
And that library is Unjoinify.