django bulk update field with incrementing integer - django

Is it possible to bulk update one field of a queryset with an incrementing integer (not id)? Like queryset.update(serial_no=i) where i=1,2,3...
Django version = 1.11

I haven't tried this, so I don't know whether it will actually work, but it's worth a go. Note that Window and RowNumber were added in Django 2.0, so this won't run on 1.11.
from django.db.models import F
from django.db.models.functions import RowNumber
from django.db.models.expressions import Window
queryset.annotate(
    row_number=Window(
        expression=RowNumber(),
        order_by=F('ORDER_FIELD').asc(),  # This needs to be set explicitly
    )
).update(serial_no=F('row_number'))
What this should do is select the row number, which will be 1 for the first record, 2 for the second, and so on. The update should then use that value via the F expression to set serial_no. My only worry is that Django will refuse to combine the Window annotation with update().
Please let me know if it works.
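If the window-based update is not available (for example on Django 1.11), the numbering can also be assigned row by row. The sketch below shows the idea with the standard-library sqlite3 module rather than the ORM; the table name item and the columns created and serial_no are made up for illustration. In Django the same loop would be an enumerate() over queryset.order_by(...) followed by save(update_fields=['serial_no']) on each object.

```python
import sqlite3


def assign_serial_numbers(conn, table, order_column):
    """Number the rows 1, 2, 3, ... following order_column (ties broken by id)."""
    # Table and column names are interpolated, so they must come from trusted code.
    ids = [
        row[0]
        for row in conn.execute(f"SELECT id FROM {table} ORDER BY {order_column}, id")
    ]
    conn.executemany(
        f"UPDATE {table} SET serial_no = ? WHERE id = ?",
        [(serial, pk) for serial, pk in enumerate(ids, start=1)],
    )
    conn.commit()


# Hypothetical table, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, created INTEGER, serial_no INTEGER)"
)
conn.executemany(
    "INSERT INTO item (id, created) VALUES (?, ?)",
    [(10, 300), (11, 100), (12, 200)],
)
assign_serial_numbers(conn, "item", "created")
serials = conn.execute("SELECT id, serial_no FROM item ORDER BY serial_no").fetchall()
print(serials)
```

This issues one UPDATE per row, which is fine for modest tables but slower than a single statement on large ones.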

Related

How to create custom db model function in Django like `Greatest`

I have a scenario where I want both the greatest value and the name of the field it came from. I can get the greatest value using the Greatest database function that Django provides, but I am not able to get its field name. For example:
emps = Employee.objects.annotate(my_max_value=Greatest('date_time_field_1', 'date_time_field_2'))
for e in emps:
    print(e.my_max_value)
Here I get the value through e.my_max_value, but I cannot find out which field it came from.
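You have to annotate a conditional expression using Case() and When(). Wrap the field names in Value(); otherwise the strings are treated as field references and you get the field's value back rather than its name.
from django.db.models import Case, CharField, F, Value, When
emps = Employee.objects.annotate(
    greatest_field=Case(
        When(datetime_field_1__gt=F("datetime_field_2"),
             then=Value("datetime_field_1")),
        When(datetime_field_2__gt=F("datetime_field_1"),
             then=Value("datetime_field_2")),
        default=Value("equal"),
        output_field=CharField(),
    )
)
for e in emps:
    print(e.greatest_field)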
If you want the database query to tell you which of the fields was larger, you'll need to add another annotated column, using case/when logic to return one field name or the other. (See https://docs.djangoproject.com/en/4.0/ref/models/conditional-expressions/#when)
Unless you're really trying to offload work onto the database, it'll be much simpler to do the comparison work in Python.
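For the "do it in Python" route, a minimal sketch could look like the following. The Employee dataclass is only a stand-in for the Django model, greatest_field is a hypothetical helper name, and the fields are assumed to be non-null.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Employee:
    """Hypothetical stand-in for the Django model in the question."""
    datetime_field_1: datetime
    datetime_field_2: datetime


def greatest_field(obj, field_names):
    """Return (name, value) of the largest field, or ("equal", value) on a tie."""
    values = {name: getattr(obj, name) for name in field_names}
    top = max(values.values())
    winners = [name for name, value in values.items() if value == top]
    return (winners[0] if len(winners) == 1 else "equal", top)


e = Employee(datetime(2020, 1, 1), datetime(2021, 6, 1))
name, value = greatest_field(e, ["datetime_field_1", "datetime_field_2"])
print(name, value)
```

The same helper works on real model instances, since it only relies on getattr().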

Alternative nullif in Django ORM

I use Postgres as the database and Django 1.9.
I have a model with a field 'price', declared with blank=True.
In a ListView I get a queryset, and I want to sort it by price with the price=0 rows at the end.
In SQL I can write it like this:
ORDER BY NULLIF(price, 0) NULLS LAST
How do I write this with the Django ORM? Or with raw SQL?
OK, I found an alternative: write my own NullIf with Django's Func.
from django.db.models import Func
class NullIf(Func):
    template = 'NULLIF(%(expressions)s, 0)'
And use it on the queryset:
queryset.annotate(new_price=NullIf('price')).order_by('new_price')
Edit: Django 2.2 and above have this implemented out of the box. The equivalent code is
from django.db.models.functions import NullIf
from django.db.models import Value
queryset.annotate(new_price=NullIf('price', Value(0))).order_by('new_price')
You can still ORDER BY price NULLS LAST if you select the price as NULLIF(price, 0). That way you get the ordering you want, and zero prices come back as NULL. In the Django ORM you would annotate the value under a new name, e.g. TableName.objects.annotate(new_price=NullIf('price', 0)) (an annotation cannot reuse the name of an existing field), and for the NULLS LAST ordering I'd follow the recommendations in Django: Adding "NULLS LAST" to query
Otherwise you could also ORDER BY NULLIF(price, 0) DESC, but that will reverse the order of the other numeric values. You can also obviously exclude null prices from the query entirely if you don't require them.
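To see what the NULLIF ordering does, here is a small sketch using the standard-library sqlite3 module instead of Postgres and the ORM. The product table is invented for the demo, and the portable "IS NULL" sort key stands in for NULLS LAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("free", 0), ("mid", 20), ("unset", None), ("cheap", 5)],
)

# NULLIF(price, 0) turns 0 into NULL; sorting on "IS NULL" first pushes
# those rows (and the genuinely NULL ones) to the end.
rows = conn.execute(
    """
    SELECT name, NULLIF(price, 0) AS new_price
    FROM product
    ORDER BY NULLIF(price, 0) IS NULL, NULLIF(price, 0), name
    """
).fetchall()
print(rows)
```

Rows with a real price come first in ascending order, followed by the zero and NULL prices.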

How can I make a Django update with a conditional case?

I would like to use Django to update a field to a different value depending on its current value, but I haven't figured out how to do it without doing 2 separate update statements.
Here's an example of what I'd like to do:
now = timezone.now()
data = MyData.objects.get(pk=dataID)
if data.targetTime < now:
    data.targetTime = now + timedelta(days=XX)
else:
    data.targetTime = data.targetTime + timedelta(days=XX)
data.save()
Now, I'd like to use an update() statement to avoid overwriting other fields on my data, but I don't know how to do it in a single update(). I tried some code like this, but the second update didn't use the up to date time (I ended up with a field equal to the current time) :
# Update the time to the current time
now = timezone.now()
MyData.objects.filter(pk=dataID).filter(targetTime__lt=now).update(targetTime=now)
# Then add the additional time
MyData.objects.filter(pk=dataID).update(targetTime=F('targetTime') + timedelta(days=XX))
Is there a way I can reduce this to a single update() statement? Something similar to the SQL CASE statement?
You need to use conditional expressions, like this:
from datetime import timedelta
from django.db.models import Case, When, F
from django.utils import timezone
obj = MyData.objects.get(pk=dataID)
now = timezone.now()
obj.targetTime = Case(
    When(targetTime__lt=now, then=now + timedelta(days=XX)),
    default=F('targetTime') + timedelta(days=XX),
)
obj.save(update_fields=['targetTime'])
For debugging, try running this right after save to see what SQL queries have just run:
import pprint
from django.db import connection
pprint.pprint(["queries", connection.queries])
I've tested this with integers and it works in Django 1.8. I haven't tried dates yet, so it might need some tweaking.
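The SQL that a Case/When update boils down to can be tried out with the standard-library sqlite3 module. This is only a sketch: the mydata table is invented, timestamps are plain integers, and extend_target is a hypothetical helper name.

```python
import sqlite3


def extend_target(conn, data_id, now, extra):
    """Push targetTime forward by `extra`, counting from `now` if it has already passed."""
    conn.execute(
        """
        UPDATE mydata
        SET targetTime = CASE
            WHEN targetTime < :now THEN :now + :extra
            ELSE targetTime + :extra
        END
        WHERE id = :id
        """,
        {"now": now, "extra": extra, "id": data_id},
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, targetTime INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, 50), (2, 150)])

extend_target(conn, 1, now=100, extra=10)  # already passed: 100 + 10
extend_target(conn, 2, now=100, extra=10)  # still pending: 150 + 10
result = conn.execute("SELECT id, targetTime FROM mydata ORDER BY id").fetchall()
print(result)
```

Both branches are decided inside a single UPDATE, so no other column is touched and no prior SELECT is needed.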
Django 1.9 added the Greatest and Least database functions. This is an adaptation of Benjamin Toueg's answer:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(
    targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX)
)
Simple example for Django 3 and above:
from django.db.models import Case, Value, When
MyModel.objects.filter(abc__id__in=abc_id_list).update(
    status=Case(
        When(xyz__isnull=False, then=Value("this_value")),
        default=Value("default_value"),
    )
)
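If I understand correctly, you take the later of now and the value in the database.
Python's built-in max can't do that here: it runs in Python before the query is built, and an F expression can't be compared with a datetime, so the call raises a TypeError. The comparison has to happen in the database, which is what Greatest (Django 1.9+) does:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX))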
Instead of using queryset.update(...), use obj.save(update_fields=['field_one', 'field_two']) (see https://docs.djangoproject.com/en/dev/ref/models/instances/#specifying-which-fields-to-save), which won't overwrite your existing fields.
On Django versions before 1.8, which have no conditional expressions, it's not possible to do this without a select query first (get), because you're doing two different things based on a condition (there are limits to what can be achieved with F). At least this gets you a single insert/update.
I have figured out how to do it with a raw SQL statement:
from django.db import connection, transaction
cursor = connection.cursor()
cursor.execute(
    "UPDATE `mydatabase_name` SET `targetTime` = CASE WHEN `targetTime` < %s THEN %s "
    "ELSE (`targetTime` + %s) END WHERE `dataID` = %s",
    [timezone.now(), timezone.now() + timedelta(days=XX), timedelta(days=XX), dataID],
)
transaction.commit_unless_managed()  # only exists on old Django; removed in 1.8
I'm using this for now and it seems to be accomplishing what I want.

Django Query Related Field Count

I've got an app where users create pages. I want to run a simple DB query that returns how many users have created more than 2 pages.
This is essentially what I want to do, but of course it's not the right method:
User.objects.select_related('page__gte=2').count()
What am I missing?
You should use aggregates.
from django.db.models import Count
User.objects.annotate(page_count=Count('page')).filter(page_count__gte=2).count()
This counts users with 2 or more pages; use page_count__gt=2 if you need strictly more than 2.
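Roughly speaking, annotate(Count(...)) followed by filter() becomes a GROUP BY with a HAVING clause. The sketch below reproduces that shape with the standard-library sqlite3 module; the user and page tables and the helper name users_with_at_least are assumptions made for the demo, not what Django generates verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE page (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES user(id));
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO page (user_id) VALUES (1), (1), (1), (2), (2);
    """
)


def users_with_at_least(conn, minimum):
    """Count users that own at least `minimum` pages."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT user.id
            FROM user LEFT JOIN page ON page.user_id = user.id
            GROUP BY user.id
            HAVING COUNT(page.id) >= ?
        )
        """,
        (minimum,),
    ).fetchone()
    return row[0]


print(users_with_at_least(conn, 2))
```

The LEFT JOIN keeps users without pages in the grouping, so a threshold of 0 would count every user.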
In my case I left off the final .count() used in the other answer, and it also works nicely: you get the matching users themselves instead of a number.
from django.db.models import Count
User.objects.annotate(our_param=Count("all_comments")).filter(our_param__gt=12)
Use the aggregate() function together with the functions in django.db.models!
It is very useful and does not clash with other annotated aggregate columns.
Note: use aggregate() as the last step of the calculation, because it turns your queryset into a dict.
Below is my code snippet using them.
from django.db.models import Count, Sum
cnt = (
    q.values("person__year_of_birth")
    .filter(person__year_of_birth__lte=year_interval_10)
    .filter(person__year_of_birth__gt=year_interval_10 - 10)
    .annotate(group_cnt=Count("visit_occurrence_id"))
    .aggregate(Sum("group_cnt"))
)

Django: order by position ignoring NULL

I have a problem with Django queryset ordering.
My model contains a field named position, a PositiveSmallIntegerField which I'd like to use to order query results.
I use order_by('position'), which works great.
Problem: my position field is nullable (null=True, blank=True), because I don't want to specify a position for each of the 50000 instances of my model. When some instances have a NULL position, order_by returns them at the top of the list; I'd like them to be at the end.
In raw SQL, I used to write things like:
IF(position IS NULL or position='', 1, 0)
(see http://www.shawnolson.net/a/730/mysql-sort-order-with-null.html). Is it possible to get the same result using Django, without writing raw SQL?
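You can use annotate() from Django aggregation to do the trick. Count('position') is 0 for rows with a NULL position and 1 otherwise, so ordering by it in descending order puts the NULLs last:
from django.db.models import Count
items = Item.objects.all().annotate(null_position=Count('position')).order_by('-null_position', 'position')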
As of Django 1.8 you can use Coalesce() to replace NULL with a fallback value.
Sample:
import datetime
from django.db.models import Value
from django.db.models.functions import Coalesce
from app import models
# Coalesce works by taking the first non-null value. So we give it
# a date far before any non-null values of last_active. Then it will
# naturally sort behind instances of Box with a non-null last_active value.
the_past = datetime.datetime.now() - datetime.timedelta(days=10*365)
boxes = models.Box.objects.all().annotate(
    new_last_active=Coalesce(
        'last_active', Value(the_past)
    )
).order_by('-new_last_active')
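The effect of the COALESCE fallback on a descending sort can be checked quickly with the standard-library sqlite3 module. This is a sketch only: the box table is invented, and dates are stored as ISO strings so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE box (name TEXT, last_active TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO box VALUES (?, ?)",
    [("a", "2023-05-01"), ("b", None), ("c", "2024-01-15")],
)

# NULL is replaced by a date older than any real value, so in a
# descending sort the never-active box ends up last.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM box ORDER BY COALESCE(last_active, ?) DESC",
        ("1970-01-01",),
    )
]
print(names)
```

The fallback has to be older than every real value; otherwise the NULL rows land in the middle of the list.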
It's a shame there are a lot of questions like this on SO that are not marked as duplicate. See (for example) this answer for the native solution for Django 1.11 and newer. Here is a short excerpt:
Added the nulls_first and nulls_last parameters to Expression.asc() and desc() to control the ordering of null values.
Example usage (from comment to that answer):
from django.db.models import F
MyModel.objects.all().order_by(F('price').desc(nulls_last=True))
Credit goes to the original answer author and commenter.
Using extra() as Ignacio said optimizes the final query a lot. In my application I've saved more than 500ms (that's a lot for a query) in database processing by using extra() instead of annotate().
Here is how it would look in your case:
items = Item.objects.all().extra(
    select={
        'null_position': 'CASE WHEN {tablename}.position IS NULL THEN 0 ELSE 1 END'
    }
).order_by('-null_position', 'position')
{tablename} should be something like {Item's app}_item following django's default tables name.
Note that extra() takes select as a keyword argument. This is the form that works on my 1.7.1 install:
items = Item.objects.all().extra(select={'null_position': 'CASE WHEN {tablename}.position IS NULL THEN 0 ELSE 1 END'}).order_by('-null_position', 'position')
QuerySet.extra() can be used to inject expressions into the query and order by them.
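The CASE expression that extra() injects can be tested on its own. Below is a sketch with the standard-library sqlite3 module; the item table and its rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, position INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("x", None), ("y", 2), ("z", 1), ("w", None)],
)

# null_position is 1 for rows that have a position and 0 for NULLs, so
# sorting on it in descending order keeps the NULL rows at the end.
ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT name,
               CASE WHEN position IS NULL THEN 0 ELSE 1 END AS null_position
        FROM item
        ORDER BY null_position DESC, position, name
        """
    )
]
print(ordered)
```

The extra name column in the ORDER BY is only there to make the order of the NULL rows deterministic.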