In the process of optimizing queries in my app I noticed something strange. In a given section of code I would get an object, update some values, and then save it. In theory this should execute 2 queries, but in fact it executes 3: one SELECT when I get the object and two more queries when I save it (another SELECT and then the UPDATE!). While removing one query may seem trivial, this particular method updates many objects, so every query I save is one less hit on the DB and should speed up the method.
Inspecting the queries shows the two SELECTs are different: the first fetches many columns, while the SELECT executed by save() is simple.
Here is the example code:
myobject = room.myobjects.get(id=myobject_id) # one query executed here
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save() # two queries executed here
queries:
1) "SELECT `rooms_object`.`id`, `rooms_object`.`room_id`, ......FROM `rooms_object` WHERE (`rooms_object`.`id` = %s AND `rooms_object`.`room_id` = %s )"
2) "SELECT (1) AS `a` FROM `rooms_object` WHERE `rooms_object`.`id` = %s LIMIT 1"
3) "UPDATE ......this ones obvious"
I want the save method to recognize that it already has the object in memory and does not need to fetch it again... if that is even possible...
The second query is not actually pulling down the object again. It is doing an extremely fast "existence" check on the id before performing an UPDATE query. All that is returned from that query is a single 1, and the field is indexed, so it should be extremely efficient.
The reason the ORM is designed this way: first it looks at your object to see if it currently has an ID. If it does, it runs the SELECT to make sure the row really does still exist in the database. If it does, it performs the UPDATE. If somehow the record no longer exists, it performs an INSERT instead. You can test this by creating an object, deleting the row manually from your database without Django knowing, and then calling save().
This is how Django maintains consistency.
If it were a new object, you would only get a single INSERT query, because it knows the object has no id right now.
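A hedged sketch of that experiment (the Room model, its name field, and the rooms_room table are made-up stand-ins, not anything from the question):

from django.db import connection

obj = Room.objects.create(name="kitchen")  # INSERT; obj.pk is now set

# Delete the row behind Django's back with raw SQL:
with connection.cursor() as cursor:
    cursor.execute("DELETE FROM rooms_room WHERE id = %s", [obj.pk])

obj.name = "lounge"
obj.save()  # the existence SELECT finds no row for this pk, so Django INSERTs again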
This is controlled by the force_update parameter of
Model.save([force_insert=False, force_update=False, using=DEFAULT_DB_ALIAS, update_fields=None])
Set force_update to True to skip the existence check (the "SELECT (1) AS a FROM..." query).
https://docs.djangoproject.com/en/dev/ref/models/instances/
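Applied to the example from the question, a minimal sketch:

myobject = room.myobjects.get(id=myobject_id)  # one SELECT
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save(force_update=True)  # skips the existence SELECT; issues only the UPDATE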
Related
I have a scenario in which I need to copy the values of one column into another column. I am trying to do
Model.objects.select_related('vcsdata').all().update(charging_status_v2=F('charging_status'))
Would using an F expression with update to copy the values create any downtime? Does it lock the table while performing the operation?
related_question_for_more_context
Short Answer:
No, it doesn't.
The only thing Django does in the update process (whether you use an F expression or not) is keep the previous state of your record(s), so that if something goes wrong it can roll back to that state.
def update(self, **kwargs):
    """
    Update all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    self._not_support_combined_queries('update')
    assert not self.query.is_sliced, \
        "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.chain(sql.UpdateQuery)
    query.add_update_values(kwargs)
    # Clear any annotations so that they won't be present in subqueries.
    query.annotations = {}
    with transaction.mark_for_rollback_on_error(using=self.db):
        rows = query.get_compiler(self.db).execute_sql(CURSOR)
    self._result_cache = None
    return rows
Basically, in the line with transaction.mark_for_rollback_on_error(using=self.db), it keeps the previous state of your record, but it does not take a table lock or any kind of partial lock.
For example, if two updates run simultaneously (suppose one takes much longer than the other, and the slower one reaches your database first), the faster one will hit your database regardless and complete its operation, and the slower one will then perform its own operation on the table. This example is enough to show that update does not lock your table.
Also note that, when a bulk change is possible, calling update() is the most efficient way to update multiple objects as far as I know (compared to calling save() on each instance or bulk_update); see the sketch below.
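Applied to the first question in this thread, for instance, a single QuerySet.update() replaces the per-object loop; ids_to_change here is a hypothetical list of primary keys, and the sketch assumes all objects receive the same values:

# One UPDATE statement covers every matching row, instead of
# one or more queries per object:
room.myobjects.filter(id__in=ids_to_change).update(
    color=color,
    shape=shape,
    place=place,
)

Keep in mind that QuerySet.update() does not call each instance's save() method and does not send pre_save/post_save signals.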
Let's say I want to implement a "Like/Unlike" system in my app. I need to count each like for sorting purposes later. Can I simply insert the current value + 1? I think that's too simple.
What if two users click at the same time? How do I prevent my counter from being corrupted?
I read that I need to implement transactions with the simple decorator @transaction.atomic, but I wonder whether that addresses my concern.
Transactions are designed to execute a "block" of operations triggered by one user, whereas in my case I need to be able to handle multiple requests at the same time and safely update the counter.
Any advice?
You can use an F() expression, e.g.:
from django.db.models import F

content.likes_count = F('likes_count') + 1
content.save()
So the operation is executed in the database, not in Python.
From the Django documentation:
Another useful benefit of F() is that having the database - rather
than Python - update a field’s value avoids a race condition.
If two Python threads execute the code in the first example above, one
thread could retrieve, increment, and save a field’s value after the
other has retrieved it from the database. The value that the second
thread saves will be based on the original value; the work of the
first thread will simply be lost.
If the database is responsible for updating the field, the process is
more robust: it will only ever update the field based on the value of
the field in the database when the save() or update() is executed,
rather than based on its value when the instance was retrieved.
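One caveat worth adding (this is documented Django behavior): after save(), the in-memory attribute still holds the F() expression, so reload the object before reading the new value. A sketch, reusing the content instance from above (refresh_from_db() is available in Django 1.8+):

from django.db.models import F

content.likes_count = F('likes_count') + 1
content.save()              # UPDATE ... SET likes_count = likes_count + 1 runs in the database

content.refresh_from_db()   # without this, content.likes_count still holds an
print(content.likes_count)  # F() combined expression rather than the new number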
Let's say I need to do some work both on a set of model objects and on a subset of the first set:
things = Thing.objects.filter(active=True)
for thing in things:  # (1)
    pass  # do something with each `thing`

special_things = things.filter(special=True)
for thing in special_things:  # (2)
    pass  # do something more with these things
My understanding is that at point (1) marked in the code above, an actual SQL query something like SELECT * FROM things_table WHERE active=1 will get executed against the database. The QuerySet documentation also says:
When a QuerySet is evaluated, it typically caches its results.
Now my question is, what happens at point (2) in the example Python code above?
Will Django execute a second SQL query, something like SELECT * FROM things_table WHERE active=1 AND special=1?
Or, will it use the cached result from earlier, automatically doing for me behind the scenes something like the more optimal filter(lambda d: d.special == True, things), i.e. avoiding a needless second trip to the database?
Either way, is the current behavior guaranteed (by documentation or something) or should I not rely on it? For example, it is not only a point of optimization, but could also make a possible logic difference if the database table is modified by another thread/process between the two potential queries.
It will execute a second SQL query. filter creates a new queryset, which doesn't copy the results cache.
As for guarantees - well, the docs specify that filter returns a new queryset object. I think you can be confident that the new queryset won't have cached results yet. As further support, the "when are querysets evaluated" docs suggest using .all() to get a new queryset if you want to pick up possibly changed results:
If the data in the database might have changed since a QuerySet was
evaluated, you can get updated results for the same query by calling
all() on a previously evaluated QuerySet.
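If you'd rather pay only one round trip (and want both loops to see the same snapshot), one option is to evaluate the queryset once and filter the cached results in Python. A hedged sketch:

things = list(Thing.objects.filter(active=True))  # one SQL query; results now in memory

for thing in things:  # (1) iterates the cached list, no query
    pass  # do something with each `thing`

special_things = [t for t in things if t.special]  # pure Python, no second query
for thing in special_things:  # (2)
    pass  # do something more with these things

The trade-off is that special_things reflects the data as of that single query, and the filtering now happens in Python rather than SQL.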
I want to create an object, save it to the DB, then check whether there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the object I created and restart the process.
transfer = Transfer(token=generateToken(size=9))
transfer.save()

while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
Do I need to commit the transaction between loop iterations? For example, by calling commit() at the end of every iteration?
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
    commit()

@transaction.commit_manually
def commit():
    transaction.commit()
From what you've described I don't think you need to use transactions. You're basically recreating a transaction rollback manually with your code.
I think the best way to handle this would be to have a database constraint enforce the issue. Is it the case that token and execution_time should be unique together? In that case you can define the constraint in Django with unique_together. If the constraint is that token should be unique whenever execution_time is 0, some databases will let you define a constraint like that as well.
If the constraint were in the database you could just do a get_or_create() in a loop until the Transfer was created.
If you can't define the constraint in the database for whatever reason then I think your version would work. (One improvement would be to use .count() instead of len.)
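A minimal sketch of the constraint-based approach; the field types and the unique_together pairing are assumptions, since the question never shows the model definition:

from django.db import models

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    class Meta:
        unique_together = ('token', 'execution_time')

# With the database enforcing uniqueness, loop until a fresh token is created:
created = False
while not created:
    transfer, created = Transfer.objects.get_or_create(
        token=generateToken(size=9), execution_time=0
    )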
I want to create an object, save it to the DB, then check whether
there is another row in the DB with the same token and
execution_time=0. If there is, I want to delete the object I created
and restart the process.
There are a few ways you can approach this, depending on what your end goal is:
Do you want to ensure that no other record is written while you are writing yours (to prevent duplicates)? If so, you need a lock on your table, and for that you need to execute an atomic transaction with @transaction.atomic (new in 1.6).
If you want to make sure that no duplicate records are created given a combination of fields, you need to enforce this at the database level with unique_together
I believe combining the above two will solve your problem; however, if you want a more brute-force approach, you can override your model's save() method and raise an appropriate exception when a record that violates your constraints is about to be created (or updated). A sketch follows below.
In your view, you would then catch this exception and take the appropriate action.
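A hedged sketch of that brute-force variant; the exception class, field definitions, and view-level retry are all illustrative assumptions, not the questioner's actual code:

from django.db import models, transaction

class DuplicateTransferToken(Exception):
    pass

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        # Note: without a database-level constraint this check is still
        # racy under concurrency, which is why combining both approaches helps.
        with transaction.atomic():
            clash = Transfer.objects.filter(
                token=self.token, execution_time=0
            ).exclude(pk=self.pk).exists()
            if clash:
                raise DuplicateTransferToken(self.token)
            super(Transfer, self).save(*args, **kwargs)

# In the view:
try:
    Transfer(token=generateToken(size=9)).save()
except DuplicateTransferToken:
    pass  # generate a new token and retry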
Each time the save() method is called on a Django object, Django executes two queries: one INSERT and one SELECT. In my case this is useful, except in some specific places where each query is expensive. Any ideas on how to tell Django, in those cases, that no object needs to be returned, so no SELECT is needed?
Also, I'm using django-mssql to connect; this problem doesn't seem to exist on MySQL.
EDIT: A better explanation
h = Human()
h.name = 'John Foo'
print h.id # Returns None; no insert has been done, therefore no id is available
h.save()
print h.id # Returns the ID; an insert has taken place, plus a select statement to return the id
Sometimes I don't need the returned ID, just the INSERT.
40ins's answer is right, but it probably has higher costs...
When Django executes a save(), it needs to know whether the object is new or already exists, so it hits the database to check whether the related object exists. If it does, Django executes an UPDATE; otherwise it executes an INSERT.
Check the documentation here...
You can use force_insert or force_update, but that might cause serious data integrity problems, like creating a duplicate entry instead of updating the existing one...
So, if you wish to use force, you must be sure whether it will be an INSERT or an UPDATE...
Try using the save() method with the force_insert or force_update arguments. With these arguments Django knows whether the record exists and doesn't make the additional query.
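Applied to the Human example above, a sketch (only safe when the row is definitely new):

h = Human()
h.name = 'John Foo'
h.save(force_insert=True)  # django skips the existence check and goes straight to INSERT

Note that the identity-value SELECT described in the next answer comes from the django-mssql backend itself and may still run.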
The additional SELECT is the django-mssql backend fetching the identity value from the table to determine the ID that was just inserted. If this SELECT is slow, then something is wrong with your SQL Server setup/configuration, because it is only a SELECT CAST(IDENT_CURRENT(*table_name*) as bigint) call.