How to query tuples of columns in Django database queries? - django

I have some table ports(switch_ip, slot_number, port_number, many, more, columns) and would like to achieve the following PostgreSQL query using Django:
SELECT switch_ip, array_agg((slot_number, port_number, many, more, columns) ORDER BY slot_number, port_number) info
FROM ports
GROUP BY switch_ip
ORDER BY switch_ip
Using django.contrib.postgres.aggregates here's what I got so far:
Port.objects \
    .values('switch_ip') \
    .annotate(
        info=ArrayAgg('slot_number', ordering=('slot_number', 'port_number'))
    ) \
    .order_by('switch_ip')
I am unable to include more than one column in the ArrayAgg. None of ArrayAgg(a, b, c), ArrayAgg((a, b, c)), ArrayAgg([a, b, c]) seem to work. A workaround could involve separate ArrayAggs for each column and each with the same ordering. I would despise this because I have many columns. Is there any nicer workaround, possibly more low-level?
I suspect this is no issue with ArrayAgg itself but rather with tuple expressions in general. Is there any way to have tuples at all in Django queries? For example, what would be the corresponding Django of:
SELECT switch_ip, (slot_number, port_number, many, more, columns) info
FROM ports
If this is not yet possible in Django, how feasible would it be to implement?

I spent a lot of time searching for a working solution, and here is a full recipe with code examples.
First, define an Array "function" whose template uses square brackets:
from django.db.models.expressions import Func
class Array(Func):
    template = '%(function)s[%(expressions)s]'
    function = 'ARRAY'
Next, define the output field (it must be an ArrayField of some Django field), for example an array of strings:
from django.contrib.postgres.fields import ArrayField
from django.db.models.fields import CharField
out_format = ArrayField(CharField(max_length=200))
Finally, build the ArrayAgg expression:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import F
annotate = {'2-fields': ArrayAgg(Array(F('field1'), F('field2'), output_field=out_format), distinct=True) }
model.objects.all().annotate(**annotate)
(Optional) If field1 or field2 are not CharFields, you may include Cast as an argument of Array
from django.db.models.functions import Cast
annotate = {'2-fields': ArrayAgg(Array(Cast(F('field1'), output_field=CharField(max_length=200)), F('field2'), output_field=out_format), distinct=True) }
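To see why the template in the first step matters, it can help to mimic the final substitution in plain Python. As far as I can tell, Func fills its template with %-style formatting after joining the compiled arguments with ", ". The sketch below imitates only that last step, with no Django involved; `render` is a hypothetical helper and the quoted column names are just the placeholders from the recipe.

```python
# Plain-Python imitation of the template substitution step (no Django needed).
# `render` is a hypothetical helper for illustration, not a Django API.

def render(template, function, compiled_args):
    """Fill a Func-style template with already-compiled SQL fragments."""
    return template % {
        "function": function,
        "expressions": ", ".join(compiled_args),
    }

args = ['"field1"', '"field2"']

# Default Func-style template: an ordinary function call with parentheses.
call_sql = render("%(function)s(%(expressions)s)", "ARRAY", args)

# Template from the recipe: square brackets give an array constructor.
array_sql = render("%(function)s[%(expressions)s]", "ARRAY", args)

print(call_sql)   # ARRAY("field1", "field2")
print(array_sql)  # ARRAY["field1", "field2"]
```

Only the bracketed form is an array constructor in PostgreSQL, which is presumably why the recipe overrides the template instead of just setting function = 'ARRAY'.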

Having done a bit more research I guess one could add the missing tuple functionality as follows:
Create a new model field type named TupleField. The implementation might look kind of similar to django.contrib.postgres.fields.ArrayField. TupleField would be rather awkward because I don't think any RDBMS allows for composite types to be used as column types so usage of TupleField would be limited to (possibly intermediate?) query results.
Create a new subclass of django.db.models.Expression which wraps multiple expressions on its own (like Func in general, so looking at Func's implementation might be worthwhile) and evaluates to a TupleField. Name this subclass TupleExpression, for example.
Then I could simply annotate with ArrayAgg(TupleExpression('slot_number', 'port_number', 'many', 'more', 'columns'), ordering=('slot_number', 'port_number')) to solve my original problem. This would annotate each switch_ip with correctly-ordered arrays of tuples where each tuple represents one switch port.
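To make the intended result shape concrete, here is a plain-Python sketch of what the aggregated query would return. This is not Django code, and TupleExpression remains hypothetical; the rows and the fourth column are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows: (switch_ip, slot_number, port_number, description)
rows = [
    ("10.0.0.2", 1, 2, "uplink"),
    ("10.0.0.1", 2, 1, "server"),
    ("10.0.0.1", 1, 2, "printer"),
    ("10.0.0.1", 1, 1, "desk"),
]

# Sorting by (switch_ip, slot_number, port_number) covers both ORDER BY clauses.
rows_sorted = sorted(rows, key=itemgetter(0, 1, 2))

# GROUP BY switch_ip, collecting the remaining columns as tuples (array_agg).
result = {
    switch_ip: [row[1:] for row in group]
    for switch_ip, group in groupby(rows_sorted, key=itemgetter(0))
}

for switch_ip, info in result.items():
    print(switch_ip, info)
```

Each key is one switch_ip, and each value is the ordered list of per-port tuples that the annotation is meant to produce.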

Related

Django ORM: how do I apply a function over an aggregate result?

I want to do
SELECT [field1], ST_Area(ST_Union(geometry), True) FROM table [group by field1]
Or, in other words, how do I apply a function over an aggregate result? ST_Union is an aggregate. [field1] is just free notation to say I'd like to run both queries with or without this group by.
Also, ST_Area with 2 arguments seems not to be available in Django's GIS helpers, so it probably has to be written using Func.
Also, I want to be able to aggregate over everything (without providing a GROUP BY), but Django seems to add a GROUP BY id if I don't call .values() on the queryset.
This seems very confusing. I can't get my head around annotates and aggregates. Thank you!
Apparently I can normally chain aggregates, like
from django.contrib.gis.db.models import Union, GeometryField
from django.contrib.gis.db.models.functions import Transform, Area
qs = qs.annotate(area_total=Area(Transform(Union("geometry"), 98056)))
The issue I was encountering was that I was attempting to use Func() expressions. To chain another function as the first parameter of Func, it must be wrapped with ExpressionWrapper or something similar.
from django.db.models import ExpressionWrapper, FloatField, Func
qs = qs.annotate(
    area_total=Func(
        ExpressionWrapper(Union("geometry"), output_field=GeometryField()),
        True,
        function="ST_Area",
        output_field=FloatField(),
    )
)

How to get boolean result in annotate django?

I have a filter which should return a queryset with 2 objects that differ in one field. For example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both objects are of the same model, but have different primary keys. I tried using the filter below:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
This is my first time using annotate, and it gave me the error below:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django, so what am I doing wrong, and what should be the annotate to get the required objects as shown above?
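A likely explanation for the TypeError: Python evaluates the == comparison immediately, so annotate() receives a plain bool instead of an expression. The sketch below reproduces that mechanism with a hypothetical FakeF class and a toy annotate function; neither is Django's real implementation.

```python
class FakeF:
    """Hypothetical stand-in for an ORM field reference such as F()."""

    def __init__(self, name):
        self.name = name


def annotate(**annotations):
    """Toy check loosely modelled on the error message in the question."""
    bad = [value for value in annotations.values() if isinstance(value, bool)]
    if bad:
        raise TypeError(
            "received non-expression(s): %s." % ", ".join(map(str, bad))
        )
    return annotations


# Python evaluates the comparison right away, before annotate() is called.
comparison = FakeF("plays__outdoor_game_role") == "Fielder"
print(type(comparison).__name__, comparison)  # bool False

try:
    annotate(is_fielder=comparison)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # received non-expression(s): False.
```

So the condition has to be expressed as something the ORM can translate to SQL, rather than as a Python comparison.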
The solution by @ktowen works well and is quite straightforward.
Here is another solution I am using, hope it is helpful too.
from django.db.models import BooleanField, ExpressionWrapper, Q
queryset = queryset.annotate(
    is_fielder=ExpressionWrapper(
        Q(plays__outdoor_game_role='Fielder'),
        output_field=BooleanField(),
    ),
)
Here are some explanations for those who are not familiar with Django ORM:
Annotate makes a new column/field on the fly, in this case is_fielder. This means the model has no field named is_fielder, yet after adding the annotation you can read it from each returned object like a regular field. Annotate is extremely useful and flexible, can be combined with almost every other expression, and is a must-know method in the Django ORM.
ExpressionWrapper basically gives you space to wrap a more complicated combination of conditions, used in the form ExpressionWrapper(expression, output_field). It is useful when you are combining different types of fields or want to specify an output type, since Django cannot always tell automatically.
The Q object is a frequently used expression for specifying a condition. I think the most powerful part is that conditions can be chained:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
Negative(~): filter(~Q(condition))
It is possible to combine Q objects with normal keyword conditions, like below:
filter(Q(condition1), id__in=[list])
The point is that Q objects must come before any keyword arguments, or it will not work.
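The ordering rule is ultimately Python's own: positional arguments (the Q objects) cannot follow keyword arguments. A small check, using only the standard library and made-up lookups, shows which call shape even parses:

```python
def is_valid_python(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Q object first, keyword lookup second: parses fine.
q_first = is_valid_python("filter(Q(name='John'), id__in=[1, 2])")

# Keyword lookup first, Q object second: rejected by Python itself.
q_last = is_valid_python("filter(id__in=[1, 2], Q(name='John'))")

print(q_first, q_last)  # True False
```

Nothing here is executed against Django; compile() only parses the text, so the names filter and Q do not need to exist.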
Case When(then) can be simply explained as if con1 elif con2 elif con3 .... It is quite powerful and personally, I love to use this to customize an ordering object for a queryset.
For example, say you need to return a queryset of watch history items, and they must be in the order the user watched them. You can do it with a for loop to keep the order, but this will generate plenty of similar queries. A more elegant way with Case When would be:
from django.db.models import Case, When
item_ids = [list]
ordering = Case(*[When(pk=pk, then=pos)
                  for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids)\
    .order_by(ordering)
As you can see, Case When(then) lets you bind such concrete one-to-one relations. It can be seen as 1) a precise, pinpoint condition expression that is 2) especially useful when several conditions must be checked in sequence.
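For intuition, the same pk-to-position mapping can be written in plain Python. This is only an analogy for what the Case expression asks the database to do; the primary keys and rows are made up.

```python
item_ids = [42, 7, 19]  # made-up primary keys, in watch order

# Same pk -> position mapping that the When(pk=pk, then=pos) clauses encode.
position = {pk: pos for pos, pk in enumerate(item_ids)}

# Pretend the database returned the rows in primary-key order.
fetched = [{"pk": 7}, {"pk": 19}, {"pk": 42}]

# Sorting by the mapped position restores the watch order.
ordered = sorted(fetched, key=lambda item: position[item["pk"]])
print([item["pk"] for item in ordered])  # [42, 7, 19]
```

With Case/When the sort happens inside the single query, so no Python-side reordering or per-item query is needed.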
You can use Case/When with annotate
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
    is_fielder=Case(
        When(plays__outdoor_game_role='Fielder', then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    ),
)

Annotation with a subquery with multiple result in Django

I use a PostgreSQL database in my project, and I am using the example below from the Django documentation.
from django.db.models import OuterRef, Subquery
newest = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
Post.objects.annotate(newest_commenter_email=Subquery(newest.values('email')[:1]))
But instead of the newest commenter's email, I need the last two commenters' emails. I changed [:1] to [:2], but this exception was raised: ProgrammingError: more than one row returned by a subquery used as an expression.
You'll need to aggregate the subquery results in some way: perhaps by using an ARRAY() construct.
You can create a subclass of Subquery to do this:
from django.contrib.postgres.fields import ArrayField
from django.db import models
class Array(Subquery):
    template = 'ARRAY(%(subquery)s)'
    output_field = ArrayField(base_field=models.TextField())
(You can do a more automatic method of getting the output field, but this should work for you for now: see https://schinckel.net/2019/07/30/subquery-and-subclasses/ for more details).
Then you can use:
posts = Post.objects.annotate(
    newest_commenters=Array(newest.values('email')[:2]),
)
This happens because a correlated subquery used as an expression in postgres may only return one row, with one column. You can use this mechanism to deal with multiple rows, and perhaps use JSONB construction if you need multiple columns.
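As a rough picture of the result, here is the same "two newest commenter emails per post" logic in plain Python. The data is made up and no database is involved; it only illustrates why the value per post is a list and not a single scalar.

```python
from collections import defaultdict

# Made-up comments: (post_id, email, created_at as an ISO date string)
comments = [
    (1, "a@example.com", "2019-07-01"),
    (1, "b@example.com", "2019-07-03"),
    (1, "c@example.com", "2019-07-02"),
    (2, "d@example.com", "2019-07-05"),
]

by_post = defaultdict(list)
# Newest first, like order_by('-created_at').
for post_id, email, _created_at in sorted(
    comments, key=lambda comment: comment[2], reverse=True
):
    by_post[post_id].append(email)

# Keep at most two emails per post, like the [:2] slice inside the subquery.
newest_commenters = {
    post_id: emails[:2] for post_id, emails in by_post.items()
}
print(newest_commenters[1])  # ['b@example.com', 'c@example.com']
```

A post with fewer than two comments simply gets a shorter list, which matches how an array built from a subquery would behave.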

Django Array contains a field

I am using Django with mongoengine. I have a model Classes with an inscriptions list, and I want to get the docs that have a given id in that list.
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
Here's a general explanation of querying ArrayField membership:
Per the Django ArrayField docs, the __contains operator checks if a provided array is a subset of the values in the ArrayField.
So, to filter on whether an ArrayField contains the value "foo", you pass in a length 1 array containing the value you're looking for, like this:
# matches rows where myarrayfield is something like ['foo','bar']
Customer.objects.filter(myarrayfield__contains=['foo'])
The Django ORM produces the @> postgres operator, as you can see by printing the query:
print(Customer.objects.filter(myarrayfield__contains=['foo']).only('pk').query)
>>> SELECT "website_customer"."id" FROM "website_customer" WHERE "website_customer"."myarrayfield_" @> ['foo']::varchar(100)[]
If you provide something other than an array, you'll get a cryptic error like DataError: malformed array literal: "foo" DETAIL: Array value must start with "{" or dimension information.
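The subset semantics of the containment check can be mimicked in plain Python. `array_contains` below is a hypothetical helper written for illustration, not part of Django or PostgreSQL.

```python
def array_contains(stored, wanted):
    """Rough Python analogue of an array containment (subset) check."""
    if not isinstance(wanted, (list, tuple)):
        # Mirrors the advice above: pass a list, even for a single value.
        raise TypeError("pass a list such as ['foo'], not a bare value")
    return set(wanted) <= set(stored)


row = ["foo", "bar"]
print(array_contains(row, ["foo"]))         # True
print(array_contains(row, ["foo", "baz"]))  # False
print(array_contains(row, []))              # True
```

Every element of the provided list has to be present in the stored array, which is why a one-element list is the way to test for a single value.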
Perhaps I'm missing something...but it seems that you should be using .filter():
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
This answer is in reference to your comment on rnevius's answer.
In the Django ORM, a database call generally returns a QuerySet. Some methods return something else instead, such as a model instance for get() or a number for count().
The result of a method that returns a QuerySet can be refined further, for example with order_by() or distinct(). QuerySets are lazy, which means they hit the database only when they are actually evaluated, not when they are assigned.
Methods that do not return a QuerySet cannot be chained this way.
Take the time to go through the QuerySet documentation; it provides a more in-depth explanation with examples. Understanding this behavior is useful for making your application more efficient.
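The lazy, chainable behavior can be imitated with a toy class. LazyQuery below is purely hypothetical and far simpler than a real QuerySet; it only shows that chaining builds up a description, that evaluation happens on iteration, and that a method returning a plain value ends the chain.

```python
class LazyQuery:
    """Hypothetical toy class imitating a lazy, chainable query."""

    def __init__(self, source, predicates=(), log=None):
        self.source = source
        self.predicates = tuple(predicates)
        self.log = log if log is not None else []

    def filter(self, predicate):
        # Returns a new LazyQuery; nothing is evaluated yet.
        return LazyQuery(self.source, self.predicates + (predicate,), self.log)

    def __iter__(self):
        # The simulated "database hit" happens only here.
        self.log.append("evaluated")
        return (x for x in self.source if all(p(x) for p in self.predicates))

    def count(self):
        # Returns a plain number, so no further chaining is possible.
        return sum(1 for _ in self)


query = LazyQuery(range(10)).filter(lambda x: x % 2 == 0).filter(lambda x: x > 2)
hits_before = len(query.log)  # 0: building the chain evaluated nothing
values = list(query)          # [4, 6, 8]
hits_after = len(query.log)   # 1: iteration triggered the evaluation
total = query.count()         # 3, a plain int
```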

Django Aggregation - Expression contains mixed types. You must set output_field

I'm trying to write an aggregation query, and this is my code:
TicketGroup.objects.filter(event=event).aggregate(
total_group=Sum(F('total_sold')*F('final_price')))
I have 'total_sold' and 'final_price' on the TicketGroup object, and all I want to do is multiply the two values and sum the results to get the total sold across all TicketGroups.
All I get is this error:
Expression contains mixed types. You must set output_field
What am I doing wrong, since I'm using 'total_group' as my output field?
Thanks!
By output_field, Django means that you need to provide the field type for the result of the Sum.
from django.db.models import F, FloatField, Sum
total_group=Sum(F('total_sold')*F('final_price'), output_field=FloatField())
should do the trick.
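To illustrate what "mixed types" means here, the same calculation can be done in plain Python. This sketch assumes total_sold is an integer column and final_price a decimal column; the question does not say, and the numbers are made up.

```python
from decimal import Decimal

# Made-up ticket groups: (total_sold, final_price)
ticket_groups = [
    (10, Decimal("25.00")),
    (3, Decimal("99.90")),
]

# int * Decimal: two different types meet in one expression,
# so the type of the result has to be decided somewhere.
total_group = sum(sold * price for sold, price in ticket_groups)

print(type(total_group).__name__, total_group)  # Decimal 549.70
```

Python picks Decimal on its own, whereas Django asks you to name the result type through output_field. Under the assumption above, a DecimalField might match the data more closely than FloatField, but either choice tells Django what to return.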
I had to use something different to make my query work; output_field alone won't solve it. I needed a simple division between two aliases, which are the outputs of two annotations.
from django.db.models import Count, ExpressionWrapper, F, FloatField, Sum
distinct_people_with_more_than_zero_bill = Task.objects.filter(
    billable_efforts__gt=0).values('report__title').annotate(
    Count('assignee', distinct=True)).annotate(
    Sum('billable_efforts')).annotate(
    yy=ExpressionWrapper(
        F('billable_efforts__sum') / F('assignee__count'),
        output_field=FloatField()))
The key here is ExpressionWrapper.
Without this, you will get an error: received non-expression(s)
The hint came from the Django documentation itself, which says:
If the fields that you’re combining are of different types you’ll need
to tell Django what kind of field will be returned. Since F() does not
directly support output_field you will need to wrap the expression
with ExpressionWrapper
Link: https://docs.djangoproject.com/en/2.2/ref/models/expressions/