Django add function to "from" clause? - django

I'm trying to write a Django query that generates the following:
select group_id, cardinality(array_agg(distinct aul))
from report_table, unnest(active_user_list) as aul
group by group_id
Where active_user_list is an array[int] type.
I'm trying to get a count of the unique items in the arrays of all rows that are in a group. The queryset.extra method gets me very close, but it wraps unnest(active_user_list) as aul in double quotes, so the query doesn't work. I created a custom SQL function that does work, but I'd prefer to do it in Django if possible.
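To pin down what the query above is meant to return, here is a minimal pure-Python sketch of the same aggregation. It is not an ORM solution, only a reference to check one against. The function name and the sample rows are made up for illustration; in a real project the pairs might come from something like a values_list('group_id', 'active_user_list') call on the report model, which is an assumption about the model layout.

```python
from collections import defaultdict


def distinct_active_users_per_group(rows):
    """Count unique array items per group.

    rows is an iterable of (group_id, active_user_list) pairs, where
    active_user_list is a non-null list of ints (assumed).
    """
    users_by_group = defaultdict(set)
    for group_id, active_user_list in rows:
        # Merging into a set plays the role of unnest + array_agg(distinct ...).
        users_by_group[group_id].update(active_user_list)
    # len() of each set plays the role of cardinality(...).
    return {group_id: len(users) for group_id, users in users_by_group.items()}


# Hypothetical sample data: two rows in group 1, one row in group 2.
rows = [(1, [10, 11]), (1, [11, 12]), (2, [10])]
result = distinct_active_users_per_group(rows)
print(result)
```

One difference from the SQL: a group whose arrays are all empty appears here with a count of 0, while the comma join against unnest() drops such a group entirely.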

Related

How to create custom db model function in Django like `Greatest`

I have a scenario where I want the greatest value together with the name of the field it came from. I can get the greatest value using the Greatest database function that Django provides, but I am not able to get its field name. For example:
emps = Employee.objects.annotate(my_max_value=Greatest('date_time_field_1', 'date_time_field_2'))
for e in emps:
    print(e.my_max_value)
Here I get the value using e.my_max_value, but I am unable to find out which field that value came from.
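You have to annotate a conditional expression using Case() and When(). Wrap the field names in Value(), because a bare string passed to then or default is treated as a field reference rather than a literal.
from django.db.models import Case, CharField, F, Value, When
emps = Employee.objects.annotate(
    greatest_field=Case(
        # Each branch returns the *name* of the larger field as a literal string.
        When(datetime_field_1__gt=F("datetime_field_2"),
             then=Value("datetime_field_1")),
        When(datetime_field_2__gt=F("datetime_field_1"),
             then=Value("datetime_field_2")),
        # Neither field is strictly greater.
        default=Value("equal"),
        output_field=CharField(),
    )
)
for e in emps:
    print(e.greatest_field)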
If you want the database query to tell you which of the fields was larger, you'll need to add another annotated column, using case/when logic to return one field name or the other. (See https://docs.djangoproject.com/en/4.0/ref/models/conditional-expressions/#when)
Unless you're really trying to offload work onto the database, it'll be much simpler to do the comparison work in Python.
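As a rough illustration of doing the comparison in Python instead, the sketch below picks the largest field and reports its name. The helper greatest_field is hypothetical, and a SimpleNamespace stands in for an Employee instance so the snippet runs without a database.

```python
from datetime import datetime
from types import SimpleNamespace


def greatest_field(obj, field_names):
    """Return (name, value) for the largest non-None field, or None if all are None.

    On a tie, max() keeps the first name listed in field_names.
    """
    candidates = [(name, getattr(obj, name)) for name in field_names]
    candidates = [(name, value) for name, value in candidates if value is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


# Stand-in for a model instance (assumption: two datetime attributes).
emp = SimpleNamespace(
    datetime_field_1=datetime(2022, 1, 1),
    datetime_field_2=datetime(2022, 6, 1),
)
name, value = greatest_field(emp, ["datetime_field_1", "datetime_field_2"])
print(name, value)
```

With real model instances the same helper could be called inside the loop over the queryset, at the cost of doing the work per row in Python rather than in the database.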

Selecting all fields and grouping by one

I want to write a query like SELECT * FROM users GROUP BY some_attribute. How can I do that using Django ORM?
User.objects.all().values('some_attribute').annotate(count=Count('*'))
doesn't work, because it selects only some_attribute instead of all columns (*).
I need to do this with the ORM; I don't want to write a raw SQL statement.

Django values_list queryset across database engines

In one of our Django apps we use two database engines, A and B; both point to the same database but use different schemas. We have a table called C in both schemas, but database routing always points it to database B. We built a values_list queryset from one of the models in A and passed it to a filter on table C using the __in lookup, but it always returns an empty result even though matching records exist. When we convert the values_list queryset to a list and use that in the same __in filter on table C, it works fine.
Not working
data = modelindbA.objects.values_list('somecolumn',flat=True)
info = C.objects.filter(somecolumn__in=data).values_list
Working
data = modelindbA.objects.values_list('somecolumn',flat=True)
data = list(data)
info = C.objects.filter(somecolumn__in=data).values_list
I have read the Django docs and other SO questions but couldn't find anything relevant. My guess is that this fails because the two models are in different database schemas. I need assistance on how to troubleshoot this issue.
When you use a queryset with __in, Django will construct a single SQL query that uses a subquery for the __in clause. Since the two tables are in different databases, no rows will match.
By contrast, if you convert the first queryset to a list, Django will go ahead and fetch the data from the first database. When you then pass that data to the second query, hitting the second database, it will work as expected.
See the documentation for the in field lookup for more details:
You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values.... This queryset will be evaluated as subselect statement:
SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')
This happens because the values_list method returns a django.db.models.query.QuerySet, not a list.
When both models are in the same schema, the ORM folds the queryset into a subquery and runs just one query, but when the schemas differ that fails.
Just use list().
I would even recommend list() within a single schema, since it can reduce the complexity of the query and work better on big tables.
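The effect described in these answers can be reproduced without Django. The sketch below uses two in-memory SQLite connections as stand-ins for databases A and B; the table names, columns, and sample rows are invented for illustration. It shows that materializing the values first works, while a subquery sent to B cannot see a table that lives only in A. In this toy setup the subquery fails outright, whereas in the question it ran and matched nothing, so treat it as an analogy rather than an exact reproduction.

```python
import sqlite3

# Two separate in-memory databases standing in for A and B.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

db_a.execute("CREATE TABLE model_in_a (somecolumn INTEGER)")
db_a.executemany("INSERT INTO model_in_a VALUES (?)", [(1,), (2,), (3,)])

db_b.execute("CREATE TABLE c (somecolumn INTEGER, label TEXT)")
db_b.executemany(
    "INSERT INTO c VALUES (?, ?)", [(2, "two"), (3, "three"), (9, "nine")]
)

# Step 1: fetch the values from A into a plain list (what list(queryset) does).
values = [row[0] for row in db_a.execute("SELECT somecolumn FROM model_in_a")]

# Step 2: pass the literal values to B as bound parameters.
placeholders = ", ".join("?" for _ in values)
matches = db_b.execute(
    f"SELECT label FROM c WHERE somecolumn IN ({placeholders}) ORDER BY somecolumn",
    values,
).fetchall()

# For contrast: a single query with a subselect runs entirely on B,
# where model_in_a does not exist.
try:
    db_b.execute(
        "SELECT label FROM c WHERE somecolumn IN (SELECT somecolumn FROM model_in_a)"
    )
    subquery_error = None
except sqlite3.OperationalError as exc:
    subquery_error = str(exc)

print(matches)
print(subquery_error)
```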

Django: Count of Group Elements

How can we achieve the following via the Django 1.5 ORM:
SELECT TO_CHAR(date, 'IW/YYYY') week_year, COUNT(*) FROM entries GROUP BY week_year;
EDIT: cf. Follow up: Count of Group Elements With Joins in Django in case you need a join.
I had to do something like this recently.
You need to add your week_year column via Django's extra, then you can use that column in the values method.
It's not obvious, but if you then use annotate, Django will GROUP BY all of the fields mentioned in the values clause (as described in the docs: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#values).
So your code should look like:
Entry.objects.extra(select={'week_year': "TO_CHAR(date, 'IW/YYYY')"}).values('week_year').annotate(Count('id'))
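For checking what that query should return, the same grouping can be sketched in plain Python. This is only a reference computation, not a replacement for the ORM call. The function name and sample dates are hypothetical, and the key format mirrors 'IW/YYYY' as a zero-padded ISO week number followed by the calendar year.

```python
from collections import Counter
from datetime import date


def count_by_week_year(dates):
    """Count dates per 'IW/YYYY' key: zero-padded ISO week, then calendar year."""
    return Counter(f"{d.isocalendar()[1]:02d}/{d.year}" for d in dates)


# Hypothetical entry dates: two in ISO week 1 of 2013, one in week 2.
entries = [date(2013, 1, 1), date(2013, 1, 2), date(2013, 1, 8)]
counts = count_by_week_year(entries)
print(counts)
```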

how can I group on converted values?

so far I have this query:
q = Foobar.objects.values('updater','updated')
q = q.annotate(update_count=Count("id"))
which seems to generate a query like:
select updater, updated, count(id)
from foobar
group by updater, updated
"updated" is a date-time field, and I'd like to do my counts by day, with a query that looks like:
select updater, cast(updated as date), count(id)
from foobar
group by updater, cast(updated as date)
is there a way to do this with the Query API, or do I have to drop back to raw SQL?
Django doesn't support this level of control over database queries - generally, you can't make queries use functions like CAST.
You have a few options in this case, though. First of all, most simply, you can just take the datetime object returned by the ORM object and remove the extra precision using datetime.replace().
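A minimal sketch of that first option: truncate each datetime with replace() and count per (updater, day) in Python. The function name and sample rows are made up; with the model from the question the pairs might come from Foobar.objects.values_list('updater', 'updated'), which is an assumption about how the data is fetched.

```python
from collections import Counter
from datetime import datetime


def count_updates_per_day(rows):
    """Count rows per (updater, day), dropping the time-of-day precision.

    rows is an iterable of (updater, updated) pairs, with updated a datetime.
    """
    counts = Counter()
    for updater, updated in rows:
        # Zero out everything below the day, as suggested with datetime.replace().
        day = updated.replace(hour=0, minute=0, second=0, microsecond=0)
        counts[(updater, day)] += 1
    return counts


# Hypothetical sample rows.
rows = [
    ("alice", datetime(2010, 5, 1, 9, 30)),
    ("alice", datetime(2010, 5, 1, 17, 5)),
    ("bob", datetime(2010, 5, 2, 8, 0)),
]
counts = count_updates_per_day(rows)
print(counts)
```

This moves the grouping out of the database, so every row is fetched; it is a reasonable trade only when the table is small enough for that.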
Another option, if you know that you'll never want your Django app to use any precision in the updated field beyond the day, is to simply define updated in your models.py as a models.DateField() as opposed to models.DateTimeField(). This means data returned by the ORM Model will never have precision beyond the day.
Finally, I assume you're using the most recent Django (1.1), but in Django 1.2 (scheduled for May 10), you'll be able to do the following:
Foobar.objects.raw("select updater, cast(updated as date), count(id) from foobar group by updater, cast(updated as date)")
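The result will be a RawQuerySet rather than a regular QuerySet; you can iterate over it to get Foobar instances, provided the columns returned can be mapped to fields on your Foobar model. Note that raw() expects the query to include the model's primary key.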
QuerySet.dates() ( http://docs.djangoproject.com/en/dev/ref/models/querysets/#dates-field-kind-order-asc ) takes you part of the way there, but it doesn't seem to play nice with .values() and make a GROUP BY (I tried for a few minutes). Maybe you've already seen that anyway...
But it does show that Django already has an SQL function that you'll need if you write your own SQL version:
print(ProcessingState.objects.dates('timestamp', 'day').query)
yields
SELECT DISTINCT django_date_trunc("day", "databot_processingstate"."timestamp")
FROM "databot_processingstate" ORDER BY 1 ASC
(sorry for the weird table names and stuff, it's just my own model I happened to have handy)