Django: Count of Group Elements - django

How can we achieve the following via the Django 1.5 ORM:
SELECT TO_CHAR(date, 'IW/YYYY') week_year, COUNT(*) FROM entries GROUP BY week_year;
EDIT: cf. Follow up: Count of Group Elements With Joins in Django in case you need a join.

I had to do something like this recently.
You need to add your week_year column via Django's extra(); you can then use that column in values().
It's not obvious, but if you then call annotate(), Django will GROUP BY all of the fields named in the values() clause (as described in the docs here: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#values).
So your code should look like:
from django.db.models import Count
Entry.objects.extra(select={'week_year': "TO_CHAR(date, 'IW/YYYY')"}).values('week_year').annotate(Count('id'))
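To make concrete what the grouped query returns, here is a minimal plain-Python sketch of the same computation, with no database involved. The sample dates and the helper name week_year are made up for illustration; the label mirrors TO_CHAR(date, 'IW/YYYY'), that is, the ISO week number followed by the calendar year.

```python
from collections import Counter
from datetime import date


def week_year(d):
    # Mirrors TO_CHAR(d, 'IW/YYYY'): ISO week number, then calendar year.
    return "%02d/%d" % (d.isocalendar()[1], d.year)


# Hypothetical entry dates (assumption: not from the original question).
entry_dates = [date(2013, 1, 7), date(2013, 1, 8), date(2013, 1, 14)]

# One count per week_year label, like GROUP BY week_year with COUNT(*).
counts = Counter(week_year(d) for d in entry_dates)

# Same shape as the dicts produced by .values('week_year').annotate(Count('id')),
# where the default alias for Count('id') is 'id__count'.
rows = [{"week_year": k, "id__count": v} for k, v in sorted(counts.items())]
print(rows)
```

Each dict corresponds to one group: the label from the values() clause plus the aggregate added by annotate().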

Related

How to write query of group by

I have a model named IssueFlags with columns:
id, created_at, flags_id, issue_id, comments
I want one row per unique issue_id (the latest created), with its flags_id, created_at and comments.
In SQL it works like this:
SELECT created_at, flags_id, issue_id, comments
FROM Issues_issueflags
group by issue_id
How do I do the same in Django? I tried to write something in the shell, but there is no group by attribute:
IssueFlags.objects.order_by('-created_at')
The above only returns the ordered list of rows.
Try doing this way:
from django.db.models import Count
IssueFlags.objects.values('created_at', 'flags_id', 'issue_id', 'comments').order_by('-created_at').annotate(total=Count('issue_id'))
I have written annotate(total=Count('issue_id')) assuming that you have multiple entries per issue_id. (Note that you can use any of the usual aggregations, such as Sum, Count, Max and Avg, inside annotate().) There are also existing answers on doing GROUP BY in Django; have a look at this link or this link. Also read the Django documentation to get a clear idea of when to place values() before annotate() and when to place it after, and then apply that to your requirement.
Would be happy to help if you have any further doubts.
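The question asks for the latest row per issue_id, which counting per group does not provide by itself. As a hedged sketch of the SQL involved (run through Python's sqlite3 rather than the ORM, and not the approach of the answer above), the block below shows a per-issue count next to a "latest row per issue" query built with a correlated subquery. The table name issueflags, the alias outer_row and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issueflags ("
    "id INTEGER PRIMARY KEY, created_at TEXT, flags_id INTEGER, "
    "issue_id INTEGER, comments TEXT)"
)
# Hypothetical sample rows (assumption: not from the original question).
conn.executemany(
    "INSERT INTO issueflags (created_at, flags_id, issue_id, comments) "
    "VALUES (?, ?, ?, ?)",
    [
        ("2015-01-01 10:00:00", 1, 100, "first"),
        ("2015-01-02 10:00:00", 2, 100, "second"),
        ("2015-01-01 12:00:00", 3, 200, "only"),
    ],
)

# Grouping on issue_id alone: one row per issue with its number of flags.
per_issue = conn.execute(
    "SELECT issue_id, COUNT(*) FROM issueflags "
    "GROUP BY issue_id ORDER BY issue_id"
).fetchall()

# Latest row per issue: keep a row only if its created_at is the
# maximum created_at among the rows sharing its issue_id.
latest = conn.execute(
    "SELECT issue_id, created_at, flags_id, comments "
    "FROM issueflags AS outer_row "
    "WHERE created_at = ("
    "  SELECT MAX(created_at) FROM issueflags "
    "  WHERE issue_id = outer_row.issue_id) "
    "ORDER BY issue_id"
).fetchall()
print(per_issue)
print(latest)
```

If two rows of the same issue share the maximum created_at, this query returns both of them.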

django valueslist queryset across database engines

In one of our Django apps we use two database engines, A and B; both are the same database but with different schemas. We have a table called C in both schemas, but database routing always makes it point to database B. We built a values_list queryset from one of the models in A and tried to pass it to a filter on table C using the __in lookup, but it always returns nothing even though there are matching records. When we convert the values_list queryset to a list and use that in the __in filter on table C, it works fine.
Not working
data = modelindbA.objects.values_list('somecolumn', flat=True)
info = C.objects.filter(somecolumn__in=data).values_list()
Working
data = modelindbA.objects.values_list('somecolumn', flat=True)
data = list(data)
info = C.objects.filter(somecolumn__in=data).values_list()
I have read the Django docs and other SO questions but couldn't find anything relevant. My guess is that this fails because the two models are in different database schemas. I need help troubleshooting this issue.
When you use a queryset with __in, Django will construct a single SQL query that uses a subquery for the __in clause. Since the two tables are in different databases, no rows will match.
By contrast, if you convert the first queryset to a list, Django will go ahead and fetch the data from the first database. When you then pass that data to the second query, hitting the second database, it will work as expected.
See the documentation for the in field lookup for more details:
You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values.... This queryset will be evaluated as subselect statement:
SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')
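The difference between the two variants can be sketched with plain sqlite3. Two independent in-memory databases stand in for A and B, and the table names source and c are hypothetical. The sketch also assumes that the source table exists in B but holds no matching rows there, which would explain an empty result rather than an error.

```python
import sqlite3

# Two separate in-memory databases stand in for A and B (an assumption;
# the original setup uses two schemas selected by Django's database routing).
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

for db in (db_a, db_b):
    db.execute("CREATE TABLE source (somecolumn INTEGER)")
db_b.execute("CREATE TABLE c (somecolumn INTEGER)")

# The source data lives only in A; table c lives in B.
db_a.executemany("INSERT INTO source VALUES (?)", [(1,), (2,)])
db_b.executemany("INSERT INTO c VALUES (?)", [(1,), (2,), (3,)])

# Like passing the queryset to __in: the subquery runs entirely on B,
# where "source" has no rows, so nothing matches.
as_subquery = db_b.execute(
    "SELECT somecolumn FROM c "
    "WHERE somecolumn IN (SELECT somecolumn FROM source)"
).fetchall()

# Like list(queryset): fetch the values from A first,
# then send them to B as literal parameters.
values = [row[0] for row in db_a.execute("SELECT somecolumn FROM source")]
placeholders = ", ".join("?" for _ in values)
as_list = db_b.execute(
    "SELECT somecolumn FROM c WHERE somecolumn IN (%s) ORDER BY somecolumn"
    % placeholders,
    values,
).fetchall()

print(as_subquery, as_list)
```

The second variant costs one extra round trip, but each statement touches only one database.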
This happens because values_list() returns a django.db.models.query.QuerySet, not a list.
When you use it within the same schema, the ORM optimises it into a single query, but when the schemas are different that query fails.
Just use list().
I would even recommend using list() within a single schema, since it can reduce the complexity of the query and may perform better on big tables.

Hourly grouping of rows using Django

I have been trying to group the rows of a table by hour using a DateTimeField.
SQL:
SELECT strftime('%H', created_on), count(*)
FROM users_test
GROUP BY strftime('%H', created_on);
This query works fine, but the corresponding Django query does not.
Django queries I've tried:
Test.objects.extra({'hour': 'strftime("%%H", created_on)'}).values('hour').annotate(count=Count('id'))
# SELECT (strftime("%H", created_on)) AS "hour", COUNT("users_test"."id") AS "count" FROM "users_test" GROUP BY (strftime("%H", created_on)), "users_test"."created_on" ORDER BY "users_test"."created_on" DESC
It adds an additional GROUP BY on "users_test"."created_on", which I guess is what gives the incorrect results.
It would be great if anyone could explain this to me and provide a solution as well.
Environment:
Python 3
Django 1.8.1
Thanks in Advance
References (Possible Duplicates) (But None helping out):
Grouping Django model entries by day using its datetime field
Django - Group By with Date part alone
Django aggregate on .extra values
To fix it, add an empty order_by() to the query chain. This overrides the default ordering from the model's Meta. Like this:
from django.db.models import Count

hourly = (
    Test.objects
    .extra({'hour': 'strftime("%%H", created_on)'})
    .order_by()  # <------ here
    .values('hour')
    .annotate(count=Count('id'))
)
In my environment (Postgres also):
>>> print(Material.objects
...       .extra({'hour': 'strftime("%%H", data_creacio)'})
...       .order_by()
...       .values('hour')
...       .annotate(count=Count('id'))
...       .query)
SELECT (strftime("%H", data_creacio)) AS "hour",
       COUNT("material_material"."id") AS "count"
FROM "material_material"
GROUP BY (strftime("%H", data_creacio))
Learn more in the Django docs for order_by():
If you don’t want any ordering to be applied to a query, not even the default ordering, call order_by() with no parameters.
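The effect of the extra GROUP BY column can be reproduced directly in SQLite. The sketch below runs both shapes of the query through Python's sqlite3 on three made-up timestamps: the one Django generates with the default ordering in place, and the one it generates after an empty order_by(). The table layout is reduced to the two columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_test (id INTEGER PRIMARY KEY, created_on TEXT)"
)
# Hypothetical timestamps: two rows in hour 09, one row in hour 10.
conn.executemany(
    "INSERT INTO users_test (created_on) VALUES (?)",
    [
        ("2015-05-01 09:05:00",),
        ("2015-05-01 09:40:00",),
        ("2015-05-01 10:15:00",),
    ],
)

# With the default ordering, created_on is added to GROUP BY,
# so every distinct timestamp becomes its own group.
with_ordering = conn.execute(
    "SELECT strftime('%H', created_on) AS hour, COUNT(id) FROM users_test "
    "GROUP BY strftime('%H', created_on), created_on "
    "ORDER BY created_on DESC"
).fetchall()

# After an empty order_by(), rows are grouped by the hour alone.
without_ordering = conn.execute(
    "SELECT strftime('%H', created_on) AS hour, COUNT(id) FROM users_test "
    "GROUP BY strftime('%H', created_on)"
).fetchall()

print(with_ordering)
print(without_ordering)
```

With the ordering column in GROUP BY, hour 09 shows up twice with a count of 1 each; without it, hour 09 appears once with a count of 2.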
Side note:
Using extra() may introduce an SQL injection vulnerability into your code. Use it with caution and escape any parameters that a user can supply. Compare with the docs:
Warning
You should be very careful whenever you use extra(). Every time you use it, you should escape any parameters that the user can control by using params in order to protect against SQL injection attacks.
Please read more about SQL injection protection.

Django order_by() filter with distinct()

How can I make an order_by like this ....
p = Product.objects.filter(vendornumber='403516006')\
.order_by('-created').distinct('vendor__name')
The problem is that I have multiple vendors with the same name, and I only want the latest product for each vendor.
Hope it makes sense?
I got this DB error:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("search_vendor"."name")
"search_product"...
Based on your error message and this other question, it seems to me this would fix it:
p = Product.objects.filter(vendornumber='403516006')\
.order_by('vendor__name', '-created').distinct('vendor__name')
That is, it seems that the DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). So by making the column you use in distinct() the first column in order_by(), I think it should work.
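Why the leftmost ORDER BY column has to match becomes clearer when DISTINCT ON is emulated by hand: it keeps the first row of each run of equal values, which only makes sense if the rows are already sorted by that value. The plain-Python sketch below imitates order_by('vendor__name', '-created').distinct('vendor__name') on made-up tuples; the data and variable names are assumptions, and no database is involved.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (vendor_name, created, product) rows.
products = [
    ("Acme", "2015-03-01", "widget v1"),
    ("Bolt", "2015-01-15", "screw"),
    ("Acme", "2015-06-01", "widget v2"),
]

# ORDER BY vendor_name ASC, created DESC, done as two stable sorts:
# first by the secondary key, then by the primary key.
ordered = sorted(products, key=itemgetter(1), reverse=True)
ordered = sorted(ordered, key=itemgetter(0))

# DISTINCT ON (vendor_name): keep the first row of each vendor_name run.
latest_per_vendor = [
    next(rows) for _, rows in groupby(ordered, key=itemgetter(0))
]
print(latest_per_vendor)
```

Because the rows within each vendor are sorted newest first, the row that survives is the latest product for that vendor name.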
Just matching the leftmost order_by() argument with distinct() did not work for me; it produced the same error (a Django 1.8.7 bug, or a feature?):
qs.order_by('project').distinct('project')
however it worked when I changed to:
qs.order_by('project__id').distinct('project')
and I do not even have multiple order_by args.
If you want to use one field for distinct() and order by a different field, you can use the code below:
from django.db.models import Subquery
Model.objects.filter(
    pk__in=Subquery(
        Model.objects.all().distinct('foo').values('pk')
    )
).order_by('bar')
I had a similar issue, but with related fields. Just adding the related field in distinct() didn't give me the right results.
I wanted to sort by room__name while keeping the person (linked to residency) unique. Repeating the related field as below fixed my issue:
.order_by('room__name', 'residency__person').distinct('room__name', 'residency__person')
See also these related posts:
ProgrammingError: when using order_by and distinct together in django
django distinct and order_by
Postgresql DISTINCT ON with different ORDER BY

Prevent multiple SQL querys with model relations

Is it possible to prevent multiple queries when I use the Django ORM? Example:
product = Product.objects.get(name="Banana")
for provider in product.providers.all():
    print(provider.name)
This code will make 2 SQL queries:
1 - SELECT ••• FROM stock_product WHERE stock_product.name = 'Banana'
2 - SELECT stock_provider.id, stock_provider.name FROM stock_provider INNER JOIN stock_product_reference ON (stock_provider.id = stock_product_reference.provider_id) WHERE stock_product_reference.product_id = 1
I confess, I use Doctrine (PHP) for some projects. With Doctrine it's possible to specify joins when retrieving the object (relations are populated in the object, so there is no need to query the database again to get a relation's attribute values).
Is it possible to do the same with Django's ORM?
PS: I hope my question is understandable; English is not my primary language.
In Django 1.4 or later, you can use prefetch_related. It's like select_related but allows M2M relations and such.
product = Product.objects.prefetch_related('providers').get(name="Banana")
You still get two queries, though. From the docs:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python.
As for packing this down into a single query, Django won't do it like Doctrine because it doesn't do that much post-processing of the result set (Django would have to remove all the redundant column data, since you'll get a row per provider and each of these rows will have a copy of all of product's fields).
So if you want to pack this down to one query, you're going to have to turn it around and run the query on the Provider table (I'm guessing at your schema):
providers = Provider.objects.filter(product__name="Banana").select_related('product')
This should pack it down to one query, but you won't get a single product ORM object out of it, instead needing to get the product fields via providers[k].product.
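The two strategies described above can be sketched at the SQL level with Python's sqlite3. The first part mimics the prefetch_related style (two queries, combined in Python); the second runs a single JOIN and shows the product column repeating on every row. The table names and sample rows are simplified assumptions, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE provider (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_providers (product_id INTEGER, provider_id INTEGER);
    INSERT INTO product VALUES (1, 'Banana');
    INSERT INTO provider VALUES (1, 'Farm A');
    INSERT INTO provider VALUES (2, 'Farm B');
    INSERT INTO product_providers VALUES (1, 1);
    INSERT INTO product_providers VALUES (1, 2);
""")

# prefetch_related style: query 1 fetches the product ...
product_id, product_name = conn.execute(
    "SELECT id, name FROM product WHERE name = ?", ("Banana",)
).fetchone()

# ... query 2 fetches its providers, and the results are combined in Python.
provider_names = [
    name
    for (name,) in conn.execute(
        "SELECT provider.name FROM provider "
        "JOIN product_providers pp ON pp.provider_id = provider.id "
        "WHERE pp.product_id = ? ORDER BY provider.id",
        (product_id,),
    )
]
product = {"name": product_name, "providers": provider_names}

# Single-query style: one JOIN, but the product name repeats on every row.
joined = conn.execute(
    "SELECT product.name, provider.name FROM provider "
    "JOIN product_providers pp ON pp.provider_id = provider.id "
    "JOIN product ON product.id = pp.product_id "
    "WHERE product.name = ? ORDER BY provider.id",
    ("Banana",),
).fetchall()

print(product)
print(joined)
```

The repeated 'Banana' value in the joined rows is the redundant column data that would have to be stripped to rebuild a single product object.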
You can use prefetch_related, sometimes in combination with select_related, to fetch all related objects in a small, fixed number of queries (one extra query per prefetched relationship) rather than one query per object: https://docs.djangoproject.com/en/1.5/ref/models/querysets/#prefetch-related