To optimize a lot my database I would like to make as less as possible any query.
I'm trying to get an object, increment the field "count_limit" and make an If statement after on the Customer instance.
To achieve it I've made this query who worked well.
Customer.objects.filter(user=user).update(count_limit=F('count_limit') + 1)
So after this query, count_limit has been incremented by 1 as I wanted.
When I'm trying to get the Customer instance as a result of this query, it returns "1".
Is it possible to make both, update the instance and get it as a return object ?
Thanks a lot
The update() method will return the number of updated rows. If you are using Postgres, then you can use the returning clause with the raw query.
query = 'UPDATE customer SET count_limit=(customer.count_limit + 1) WHERE customer.user_id=%s returning *'
updated_obj = Customer.objects.raw(query, [user.id])
I don't know if this can be achieved by ORM, but suggestions will be appreciated.
Make sure that the table name in raw query is correct. If you haven't definer db_table in the meta class of your model, then by default it will be myapp_model.
And to prevent SQL injection, from the Docs:
Do not use string formatting on raw queries or quote placeholders in
your SQL strings!
Follow Docs on raw()
You are looking for F functions: https://docs.djangoproject.com/en/3.0/ref/models/expressions/#f-expressions
Example from their documentation how to increase a counter
from django.db.models import F
reporter = Reporters.objects.get(name='Tintin')
reporter.stories_filed = F('stories_filed') + 1
reporter.save()
Related
I want to modify some_date_field value just for filtering purpose.
Like using models.Lookup or models.Transform but I dont want to make a raw sql expression.
For instance, using a raw ms sql expression I could write:
WHERE CONVERT(date, FORMAT(some_date_field, '2021MMdd')) >= #some_var
But I how I can do that with Django?
class SomeModel(models.Model):
some_date_field = models.DateField()
def replace_year(value):
return value.replace(year=2021)
SomeModel.objects.filter(
# replace_year(some_date_field)__gte=some_var
)
Is it possible?
You can use SomeModel.objects.filter({whatever you want to filter}).update(some_date_field={date_value})
if you have any issues see:
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#django.db.models.query.QuerySet.update
If you are trying to bulk update all of the objects returned by a queryset and you are using Django 2.2 or greater you can use 'bulk_update'.
See here: Django Bulk Update
If you are dynamically updating values based off of another field check out F expressions they can be used with an 'update' on querysets.
See here: Update dynamically with F expressions
Something to note though, this won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
Take a look at these answers here as well
you can use filter() and update() methods in django
Assuming we need to filter some known year which is the old_date variable and the new value contains in the new_date variable
# defing mehod to filter and update new date
def update_date(old_date, new_date):
SomeModel.objects.filter(some_date_field=old_date).update(some_date_field=new_date)
return None
you can find some examples using this link.
Hope this will be helpful for you.
My Goal
I need PostgreSQL's rank() window function applied to an annotated queryset from Django's ORM. Django's sql query has to be a subquery in order to apply the window function and this is what I'm doing so far:
queryset = Item.objects.annotate(…)
queryset_with_rank = Items.objects.raw("""
select rank() over (order by points), *
from (%(subquery)s)""", { 'subquery': queryset.query }
)
The problem
Unfortunately, the query returned by queryset.query does not quote the parameters used for annotation correctly although the query itself is executed perfectly fine.
Example of returned query
The query returned by queryset_with_rank.query or queryset.query returns the following
"participation"."category" = )
"participation"."category" = amateur)
which I rather expected to be
"participation"."category" = '')
"participation"."category" = 'amateur')
Question
I noticed that the Django documentation states the following about Query.__str__()
Parameter values won't necessarily be quoted correctly, since that is done by the database interface at execution time.
As long as I fix the quotation manually and pass it to Postgres myself, everything works as expected. Is there a way to receive the needed subquery with correct quotation? Or is there an alternative and better approach to applying a window function to a Django ORM queryset altoghether?
As Django core developer Aymeric Augustin said, there's no way to get the exact query that is executed by the database backend beforehand.
I still managed to build the query the way I hoped to, although a bit cumbersome:
# Obtain query and parameters separately
query, params = item_queryset.query.sql_with_params()
# Put additional quotes around string. I guess this is what
# the database adapter does as well.
params = [
'\'{}\''.format(p)
if isinstance(p, basestring) else p
for p in params
]
# Cast list of parameters to tuple because I got
# "not enough format characters" otherwise. Dunno why.
params = tuple(params)
participations = Item.objects.raw("""
select *,
rank() over (order by points DESC) as rank
from ({subquery}
""".format(subquery=query.format(params)), []
)
I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
field1 = models.CharField(max_length=255)
class B(A):
fk_field = models.ForeignKey('C')
class C(models.Model):
field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
print a.b.fk_field.field2 #Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug, specifically it seems to be ignoring the nullable nature of the A->B relationship, if for example you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and django would use a left join for it. You should probably raise this in the django issue tracker. You could also try using prefetch_related instead of select_related that might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database everytime a C object is accessed from the A object.
for a in the_as:
#Note that this throws an DoesNotExist error if a doesn't have an
#associated b
print a.b.fk_field.field2 #Hits the database everytime.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as] #Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids) #Run a query to get all of the needed C records
for c in the_cs:
for a in the_as:
if a.b.fk_field_id == c.pk: #Throws DoesNotExist if no b associated with a
a.b.fk_field = c
break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.
In a django view, I need to append string data to the end of an existing text column in my database. So, for example, say I have a table named "ATable", and it has a field named "aField". I'd like to be able to append a string to the end of "aField" in a race-condition-free way. Initially, I had this:
tableEntry = ATable.objects.get(id=100)
tableEntry.aField += aStringVar
tableEntry.save()
The problem is that if this is being executed concurrently, both can get the same "tableEntry", then they each independently update, and the last one to "save" wins, losing the data appended by the other.
I looked into this a bit and found this, which I hoped would work, using an F expression:
ATable.objects.filter(id=100).update(aField=F('aField') + aStringVar)
The problem here, is I get an SQL error, saying:
operator does not exist: text + unknown
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Tried changing to "str(aStringVar)" even though its already a string - no luck.. I found a couple django bug reports complaining about similar issues, but I didn't see a fix or a workaround. Is there some way I can cast aStringVar such that it can be appended to the text of the F expression? BTW - also tried "str(F('aField')) + aStringVar" but that converted the result of the F expression to the string "(DEFAULT: )".
You can use the Concat db function.
from django.db.models import Value
from django.db.models.functions import Concat
ATable.objects.filter(id=100).update(some_field=Concat('some_field', Value('more string')))
In my case, I am adding a suffix for facebook avatars URIs like this:
FACEBOOK_URI = 'graph.facebook.com'
FACEBOOK_LARGE = '?type=large'
# ...
users = User.objects.filter(Q(avatar_uri__icontains=FACEBOOK_URI) & ~Q(avatar_uri__icontains=FACEBOOK_LARGE))
users.update(avatar_uri=Concat('avatar_uri', Value(FACEBOOK_LARGE)))
and I get SQL like this (Django 1.9):
UPDATE `user_user` SET `avatar_uri` = CONCAT(COALESCE(`user_user`.`avatar_uri`, ''), COALESCE('?type=large', ''))
WHERE (`user_user`.`avatar_uri` LIKE '%graph.facebook.com%' AND NOT (`user_user`.`avatar_uri` LIKE '%?type=large%' AND `user_user`.`avatar_uri` IS NOT NULL))
The result is all image URIs were changed from http://graph.facebook.com/<fb user id>/picture to http://graph.facebook.com/<fb user id>/picture?type=large
You can override F object in Django with one simple change:
class CF(F):
ADD = '||'
Then just use CF in place of F. It will place "||" instead of "+" when generating SQL. For example, the query:
User.objects.filter(pk=100).update(email=CF('username') + '#gmail.com')
will generate the SQL:
UPDATE "auth_user" SET "email" = "auth_user"."username" || '#gmail.com'
WHERE "auth_user"."id" = 100
And if you get this running, it isn't thread safe. While your update is running, some other process can update a model not knowing the data in the database is updated.
You have too acquire a lock, but don't forget this senario:
Django: m = Model.objects.all()[10]
Django: m.field = field
Django: a progress which takes a while (time.sleep(100))
DB: Lock table
DB: Update field
DD: Unlock table
Django: the slow process is finished
Django: m.save()
Now the field update became undone by the model instance in Django (Ghost write)
You can achieve this functionality with Django's select_for_update() operator: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-for-update
Something like this:
obj = ATable.objects.select_for_update().get(id=100)
obj.aField = obj.aField + aStringVar
obj.save()
The table row will be locked when you call .select_for_update().get(), and the lock will be released when you call .save(), allowing you to perform the operation atomically.
seems you can't do this. however, what you are trying to do could be solved using transactions
(looks like you are using postgres, so if you want to do it in one query and use raw sql as suggested, || is the concatenation operator you want)
I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor the the r robot. Valor.objects.filter(robot=r).count() is about 200000 and getting the last items takes about 4 seconds in my PC.
How can I speed it up? I'm querying the wrong way?
The optimal mysql syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution utilizes django's slicing method to limit the queryset before compiling the list of objects.
If none of the earlier suggestions are working, I'd suggest taking Django out of the equation and run this raw sql against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the query set provided by Peter Rowell):
qs = Valor.objects.filter(robot=r).order_by('-id')[0]
print qs.query
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
class Robot(models.Model):
# ...
last_valor = models.ForeignKey('Valor', null=True, blank=True)
And then use a post_save signal to make the update.
from django.db.models.signals import post_save
def record_last_valor(sender, **kwargs):
if kwargs.get('created', False):
instance = kwargs.get('instance')
instance.robot.last_valor = instance
post_save.connect(record_last_valor, sender=Valor)
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
Quite fast should also be:
qs = Valor.objects.filter(robot=r) # <-- it doesn't hit the database
count = qs.count() # <-- first hit the database, compute a count
last_item = qs[ count-1 ] # <-- second hit the database, get specified rownum
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()
//To get the first element
Model_name.objects.last()
//For get last()
in my case, the last is not work because there is only one row in the database
maybe help full for you too :)
Is there a limit clause in django? This way you can have the db, simply return a single record.
mysql
select * from table where x = y limit 1
sql server
select top 1 * from table where x = y
oracle
select * from table where x = y and rownum = 1
I realize this isn't translated into django, but someone can come back and clean this up.
The correct way of doing this, is to use the built-in QuerySet method latest() and feeding it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as #Aaron's suggestion.
def latest(self, field_name=None):
"""
Returns the latest object, according to the model's 'get_latest_by'
option or optional given field_name.
"""
latest_by = field_name or self.model._meta.get_latest_by
assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
assert self.query.can_filter(), \
"Cannot change a query once a slice has been taken."
obj = self._clone()
obj.query.set_limits(high=1)
obj.query.clear_ordering()
obj.query.add_ordering('-%s' % latest_by)
return obj.get()