Django Sum & Count - django

I have some MySQL code that looks like this:
SELECT
visitor AS team,
COUNT(*) AS rg,
SUM(vscore>hscore) AS rw,
SUM(vscore<hscore) AS rl
FROM `gamelog` WHERE status='Final'
AND date(start_et) BETWEEN %s AND %s GROUP BY visitor
I'm trying to translate this into a Django version of that query, without making multiple queries. Is this possible? I read up on how to do Sum(), and Count(), but it doesn't seem to work when I want to compare two fields like I'm doing.
Here's the best I could come up with so far, but it didn't work...
vrecord = GameLog.objects.filter(start_et__range=[start,end],visitor=i['id']
).aggregate(
Sum('vscore'>'hscore'),
Count('vscore'>'hscore'))
I also tried using 'vscore>hscore' in there, but that didn't work either. Any ideas? I need to use as few queries as possible.

Aggregation only works on single fields in the Django ORM. I looked at the code for the various aggregation functions, and noticed that the single-field restriction is hardwired. Basically, when you use, say, Sum(field), it just records that for later, then it passes it to the database-specific backend for conversion to SQL and execution. Apparently, aggregation and annotation are not standardized in SQL.
Anyway, you probably need to use a raw SQL query.

Related

Wrapping a Django query set in a raw query or vice versa?

I'm thinking of using a raw query to quickly get around limitations with either my brain or the Django ORM, but I don't want to redevelop the infrastructure required to support the existing ORM code such as filters. Right now I'm stuck with two dead ends:
Writing an inner raw query and reusing that like any other query set. Even though my raw query selects the correct columns, I can't filter on it:
AttributeError: 'RawQuerySet' object has no attribute 'filter'
This is corroborated by another answer, but I'm still hoping that that information is out of date.
Getting the SQL and parameters from the query set and wrapping that in a raw query. It seems the raw SQL should be retrievable using queryset.query.get_compiler(DEFAULT_DB_ALIAS).as_sql() - how would I get the parameters as well (obviously without actually running the query)?
One option for dealing with complex queries is to write a VIEW that encapsulates the query, and then stick a model in front of that. You will still be able to filter (and depending upon your view, you may even get push-down of parameters to improve query performance).
All you need to do to get a model that is backed by a view is have it as "unmanaged", and then have the view created by a migration operation.
It's better to try to write a QuerySet if you can, but at times it is not possible (because you are using something that cannot be expressed using the ORM, for instance, or you need to to something like a LATERAL JOIN).

Convert Django ORM query with large IN clause to table value constructor

I have a bit of Django code that builds a relatively complicated query in a programmatic fashion, with various filters getting applied to an initial dataset through a series of filter and exclude calls:
for filter in filters:
if filter['name'] == 'revenue':
accounts = accounts.filter(account_revenue__in: filter['values'])
if filter['name'] == 'some_other_type':
if filter['type'] == 'inclusion':
accounts = accounts.filter(account__some_relation__in: filter['values'])
if filter['type'] == 'exclusion':
accounts = accounts.exclude(account__some_relation__in: filter['values'])
...etc
return accounts
For most of these conditions, the possible values of the filters are relatively small and contained, so the IN clauses that Django's ORM generates are performant enough. However there are a few cases where the IN clauses can be much larger (10K - 100K items).
In plain postgres I can make this query much more optimal by using a table value constructor, e.g.:
SELECT domain
FROM accounts
INNER JOIN (
VALUES ('somedomain.com'), ('anotherdomain.com'), ...etc 10K more times
) VALS(v) ON accounts.domain=v
With a 30K+ IN clause in the original query it can take 60+ seconds to run, while the table value version of the query takes 1 second, a huge difference.
But I cannot figure out how to get Django ORM to build the query like I want, and because of the way all the other filters are constructed from ORM filters I can't really write the entire thing as raw SQL.
I was thinking I could get the raw SQL that Django's ORM is going to run, regexp parse it, but that seems very brittle (and surprisingly difficult to get the actual SQL that is about to be run, because of parameter handling etc). I don't see how I could annotate with RawSQL since I don't want to add a column to select, but instead want to add a join condition. Is there a simple solution I am missing?

Django: how to filter() after distinct()

If we chain a call to filter() after a call to distinct(), the filter is applied to the query before the distinct. How do I filter the results of a query after applying distinct?
Example.objects.order_by('a','foreignkey__b').distinct('a').filter(foreignkey__b='something')
The where clause in the SQL resulting from filter() means the filter is applied to the query before the distinct. I want to filter the queryset resulting from the distinct.
This is probably pretty easy, but I just can't quite figure it out and I can't find anything on it.
Edit 1:
I need to do this in the ORM...
SELECT z.column1, z.column2, z.column3
FROM (
SELECT DISTINCT ON (b.column1, b.column2) b.column1, b.column2, c.column3
FROM table1 a
INNER JOIN table2 b ON ( a.id = b.id )
INNER JOIN table3 c ON ( b.id = c.id)
ORDER BY b.column1 ASC, b.column2 ASC, c.column4 DESC
) z
WHERE z.column3 = 'Something';
(I am using Postgres by the way.)
So I guess what I am asking is "How do you nest subqueries in the ORM? Is it possible?" I will check the documentation.
Sorry if I was not specific earlier. It wasn't clear in my head.
This is an old question, but when using Postgres you can do the following to force nested queries on your 'Distinct' rows:
foo = Example.objects.order_by('a','foreign_key__timefield').distinct('a')
bar = Example.objects.filter(pk__in=foo).filter(some_field=condition)
bar is the nested query as requested in OP without resorting to raw/extra etc. Tested working in 1.10 but docs suggest it should work back to at least 1.7.
My use case was to filter up a reverse relationship. If Example has some ForeignKey to model Toast then you can do:
Toast.objects.filter(pk__in=bar.values_list('foreign_key',flat=true))
This gives you all instances of Toast where the most recent associated example meets your filter criteria.
Big health warning about performance though, using this if bar is likely to be a huge queryset you're probably going to have a bad time.
Thanks a ton for the help guys. I tried both suggestions and could not bend either of those suggestions to work, but I think it started me in the right direction.
I ended up using
from django.db.models import Max, F
Example.objects.annotate(latest=Max('foreignkey__timefield')).filter(foreignkey__timefield=F('latest'), foreign__a='Something')
This checks what the latest foreignkey__timefield is for each Example, and if it is the latest one and a=something then keep it. If it is not the latest or a!=something for each Example then it is filtered out.
This does not nest subqueries but it gives me the output I am looking for - and it is fairly simple. If there is simpler way I would really like to know.
No you can't do this in one simple SELECT.
As you said in comments, in Django ORM filter is mapped to SQL clause WHERE, and distinct mapped to DISTINCT. And in a SQL, DISTINCT always happens after WHERE by operating on the result set, see SQLite doc for example.
But you could write sub-query to nest SELECTs, this depends on the actual target (I don't know exactly what's yours now..could you elaborate it more?)
Also, for your query, distinct('a') only keeps the first occurrence of Example having the same a, is that what you want?

Are chained QuerySet filters equivalent to defining multiple fields in a single filter with the Django ORM?

When filtering a queryset, I'm wondering if the following are equivalent.
User.objects.filter(username='josh').filter(email__startswith='josh')
User.objects.filter(username='josh', email__startswith='josh')
I can't imagine how the generated SQL could be any different between the two. The documentation doesn't seem to mention any differences either.
You can execute those queries in the shell and print out the generated SQL like:
>>> print User.objects.filter(username='josh').filter(email__startswith='josh').query
I tested a similiar queries like you got here and there was no difference in the generated SQL code.Both statements end up using them same WHERE Statement.
Furhtermore it shouldnt make any difference in this case whether you chain the filters or apply them in one step.
But there are scenarios in which the order of filtering matters.
Have a look here and here.
Django QuerySets are lazy, running:
User.objects.filter(username='josh').filter(email__startswith='josh')
or even
a = User.objects.filter(username='josh')
a = a.filter(email__startswith='josh')
produces only a single db query, that is performed when you try to access your data. Such query agreegates all filters and excludes in the where clause.

Django QuerySet way to select from sql table-function

everybody.
I work with Django 1.3 and Postgres 9.0. I have very complex sql query which extends simple model table lookup with some extra fields. And it wrapped in table function since it is parameterized.
A month before I managed to make it work with the help of raw query but RawQuerySet lacks a lot of features which I really need (filters, count() and clone() methods, chainability).
The idea looks simple. QuerySet lets me to perform this query:
SELECT "table"."field1", ... ,"table"."fieldN" FROM "table"
whereas I need to do this:
SELECT "table"."field1", ... ,"table"."fieldN" FROM proxy(param1, param2)
So the question is: How can I do this? I've already started to create custom manager but can't substitute model.db_table with custom string (because it's being quoted and database stops to recognize the function call).
If there's no way to substitute table with function I would like to know if I can create QuerySet from RawQuerySet (Not the cleanest solution but simple RawQuerySet brings so much pain in... one body part).
If your requirement is that you want to use Raw SQL for efficiency purposes and still have access to model methods, what you really need is a library to map what fields of the SQL are the columns of what models.
And that library is, Unjoinify