How to properly increment a counter in my PostgreSQL database? - django

Let's say I want to implement a "Like/Unlike" system in my app. I need to count each like for sorting purposes later. Can I simply insert the current value + 1? I think that's too simple.
What if two users click at the same time? How do I prevent my counter from being corrupted?
I read that I need to implement transactions with a simple decorator, @transaction.atomic, but I wonder whether that can handle my concern.
Transactions are designed to execute a "block" of operations triggered by one user, whereas in my case I need to be able to handle multiple requests at the same time and safely update the counter.
Any advice?

You can use an F() expression, e.g.

from django.db.models import F

content.likes_count = F('likes_count') + 1
content.save()

So the operation will be executed in the database, not in Python.
From the Django documentation:

Another useful benefit of F() is that having the database - rather than Python - update a field's value avoids a race condition.

If two Python threads execute the code in the first example above, one thread could retrieve, increment, and save a field's value after the other has retrieved it from the database. The value that the second thread saves will be based on the original value; the work of the first thread will simply be lost.

If the database is responsible for updating the field, the process is more robust: it will only ever update the field based on the value of the field in the database when the save() or update() is executed, rather than based on its value when the instance was retrieved.
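If you never need the incremented value on the instance afterwards, a single queryset update() avoids the read entirely. A minimal sketch, assuming a Content model with a likes_count field (the names mirror the snippet above and are illustrative):

from django.db.models import F

def like(content_id):
    # One statement: UPDATE ... SET likes_count = likes_count + 1 WHERE id = ...
    # The database serializes concurrent increments, so no click is lost.
    Content.objects.filter(pk=content_id).update(likes_count=F('likes_count') + 1)

Either way, after an F()-based save() or update() the in-memory instance is stale; call refresh_from_db() (Django 1.8+) if you need the new count.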

Related

Does the Model.update method in Django lock the table before saving the instances?

I have a scenario in which I need to copy the values of one column into another column. I am trying to do:
Model.objects.select_related('vcsdata').all().update(charging_status_v2=F('charging_status'))
Does using an F() expression with update() to copy the values create any downtime? Does it lock the table while performing the operation?
Short Answer:
No, it doesn't.
The only thing Django does during the update process (whether you use an F() expression or not) is arrange for the previous state of your record(s) to be restored if something goes wrong, so it can roll back to that state.
def update(self, **kwargs):
    """
    Update all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    self._not_support_combined_queries('update')
    assert not self.query.is_sliced, \
        "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.chain(sql.UpdateQuery)
    query.add_update_values(kwargs)
    # Clear any annotations so that they won't be present in subqueries.
    query.annotations = {}
    with transaction.mark_for_rollback_on_error(using=self.db):
        rows = query.get_compiler(self.db).execute_sql(CURSOR)
    self._result_cache = None
    return rows
Basically, the line with transaction.mark_for_rollback_on_error(using=self.db) is what arranges for the previous state of your records to be restored on error, but it does not take a table lock or any kind of partial lock.
For example, suppose two updates hit your table at the same time, and one of them takes much longer than the other (and the slower one reaches the database first). The faster one will still reach the database and complete its operation regardless of the slower one, and the slower one will then perform its own operation afterwards; this is enough to show that update() does not lock your table.
Also note that calling update() on a queryset (when your change can be expressed that way) is the most efficient way to update multiple objects as far as I know, compared to calling save() on each instance or using bulk_update().
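For contrast, if you do need rows held locked while you read and then write them, that is an explicit opt-in via select_for_update(), not something update() does for you. A minimal sketch reusing the (illustrative) names from the question above:

from django.db import transaction

with transaction.atomic():  # select_for_update() must run inside a transaction
    # SELECT ... FOR UPDATE: matched rows stay locked until commit,
    # so concurrent writers block instead of interleaving with us.
    for obj in Model.objects.select_for_update().all():
        obj.charging_status_v2 = obj.charging_status
        obj.save(update_fields=['charging_status_v2'])

Note that in PostgreSQL a plain UPDATE already takes row-level locks for its own duration; select_for_update() is for the read-modify-write case.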

Do I need to commit transactions in Django 1.6?

I want to create an object, save it to the DB, then check if there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the object I created and restart the process.
transfer = Transfer(token=generateToken(size=9))
transfer.save()

while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
Do I need to commit the transaction on each iteration of the loop? For example, calling commit() at the end of every iteration?
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
    commit()

@transaction.commit_manually
def commit():
    transaction.commit()
From what you've described I don't think you need to use transactions. You're basically re-implementing a transaction rollback by hand with your code.
I think the best way to handle this would be to have a database constraint enforce the issue. Is it the case that token and execution_time should be unique together? In that case you can define the constraint in Django with unique_together. If the constraint is that token should be unique whenever execution_time is 0, some databases will let you define a constraint like that as well (e.g. a partial unique index in PostgreSQL).
If the constraint were in the database you could just call get_or_create() in a loop until the Transfer was created.
If you can't define the constraint in the database for whatever reason, then I think your version would work. (One improvement would be to use .count() instead of len().)
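For the constraint-backed version, a minimal sketch (the Transfer fields are guesses based on the question; the unique=True is what does the real work):

from django.db import models

class Transfer(models.Model):
    token = models.CharField(max_length=9, unique=True)  # hypothetical schema
    execution_time = models.IntegerField(default=0)

while True:
    transfer, created = Transfer.objects.get_or_create(token=generateToken(size=9))
    if created:
        break  # we inserted the row; the DB guarantees this token is unique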
I want to create an object, save it to the DB, then check if there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the object I created and restart the process.
There are a few ways you can approach this, depending on what your end goal is:
Do you want to ensure that no other record is written while you are writing yours (to prevent duplicates)? If so, you need to lock the relevant table or rows while you write, which means running your code in a transaction; Django 1.6 added the @transaction.atomic decorator for this.
If you want to make sure that no duplicate records are created for a given combination of fields, you need to enforce this at the database level with unique_together.
I believe combining the above two will solve your problem. However, if you want a more brute-force approach, you can override the save() method on your model and raise an appropriate exception when a record that violates your constraints is about to be created (or updated); a sketch follows below.
In your view, you would then catch this exception and take the appropriate action.
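A hedged sketch of that brute-force override (model and field names are illustrative; note the check inside save() can itself race without a database constraint, which is why combining it with unique_together is preferred):

from django.db import models

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        # Refuse to write if another pending row already uses this token.
        clash = Transfer.objects.filter(token=self.token, execution_time=0)
        if self.pk:
            clash = clash.exclude(pk=self.pk)
        if clash.exists():
            raise ValueError('duplicate pending transfer for token %r' % self.token)
        super(Transfer, self).save(*args, **kwargs)

The view then wraps its save in try/except ValueError and retries or reports as appropriate.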

Django another optimizing save()

In the process of optimizing queries in my app I noticed something strange. In a given section of code I would get the object, update some values, and then save. In theory this should execute 2 queries. But in fact it's executing 3 queries: 1 SELECT query when I get the object and 2 more when I save it (another SELECT and then the UPDATE!). While removing one query may seem silly, in this particular method I am updating many objects, so every query I save is one less hit on the DB and should speed up the method.
Inspecting the queries shows that the two SELECT queries are different: the first fetches many columns, while the SELECT executed by the save is trivial.
Here is the example code:
myobject = room.myobjects.get(id=myobject_id) # one query executed here
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save() # two queries executed here
queries:
1) "SELECT `rooms_object`.`id`, `rooms_object`.`room_id`, ......FROM `rooms_object` WHERE (`rooms_object`.`id` = %s AND `rooms_object`.`room_id` = %s )"
2) "SELECT (1) AS `a` FROM `rooms_object` WHERE `rooms_object`.`id` = %s LIMIT 1"
3) "UPDATE ......this ones obvious"
I want the save method to recognize that it already has the object in memory and doesn't need to fetch it again... if that is even possible.
The second query is not actually pulling down the object again. It is doing an extremely fast existence check on the id before performing the UPDATE query. All that is returned from that query is a single 1, and the field is indexed, so it should be extremely efficient.
The reason the ORM is designed this way is that it first looks at your object to see if it currently has an id. If it does, it runs the SELECT to make sure the row really does still exist in the database. If it does, it performs the UPDATE; if somehow the record does not exist, it performs an INSERT instead. You can test this by creating the object, deleting the row manually from your database without Django knowing, and then calling save().
This is how Django maintains consistency.
If it were a new object, you would only get a single INSERT query, because it knows the object has no id yet.
This is controlled with the force_update parameter of
Model.save(force_insert=False, force_update=False, using=DEFAULT_DB_ALIAS, update_fields=None)
Set force_update=True to skip the existence check ("SELECT (1) AS a FROM ...").
https://docs.djangoproject.com/en/dev/ref/models/instances/
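A short usage sketch based on the question's snippet (myobject and its fields come from the example above):

# We just fetched the row, so we know it exists; force_update tells
# Django to skip the existence check and issue the UPDATE directly.
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save(force_update=True)
# On Django 1.5+, save(update_fields=['color', 'shape', 'place'])
# likewise forces an UPDATE and also narrows the SET clause.

For what it's worth, Django 1.6 changed save() to drop this existence check by default: it attempts the UPDATE first and falls back to an INSERT if no rows were affected.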

Is get_or_create() thread safe

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.
Since I am only accessing through get_or_create(), I would imagine that I would only ever have one instance with a key to the session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?
NO, get_or_create is not atomic.
It first asks the DB whether a satisfying row exists; the database answers, Python checks the result, and if the row doesn't exist, it creates it. In between the get and the create anything can happen - a row matching the get criteria can be created by some other code.
For instance, with respect to your specific issue: if the user has two pages open (or several AJAX requests are in flight) at the same time, all the gets might fail, and every request would then create a new row - with the same session.
It is thus important to only use get_or_create() when the duplication issue will be caught by the database through some unique/unique_together constraint, so that even though multiple threads can get to the point of save(), only one will succeed and the others will raise an IntegrityError that you can catch and deal with.
If you use get_or_create() with (a set of) fields that are not unique in the database, you will create duplicates in your database, which is rarely what you want.
More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job!
(Well, unless you wrap your critical functions with some OS-level locks, but I would still suggest using the database.)
With these warnings in mind, used correctly get_or_create() is an easy-to-read, easy-to-write construct that perfectly complements the database's integrity checks.
Refs and citations:
http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95/0af3a41de4f4ce06
http://groups.google.com/group/django-developers/browse_thread/thread/f0b3381b2620e7db/8eae2f6087e550bb
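A minimal sketch of that pattern, with the uniqueness enforced by the schema (the Visitor model is hypothetical; a OneToOneField gives the column a UNIQUE constraint):

from django.db import IntegrityError, models, transaction

class Visitor(models.Model):
    # UNIQUE constraint on session_id: the database, not Python,
    # arbitrates concurrent creations.
    session = models.OneToOneField('sessions.Session', on_delete=models.CASCADE)

def visitor_for(session):
    try:
        with transaction.atomic():
            return Visitor.objects.get_or_create(session=session)[0]
    except IntegrityError:
        # Lost the race: another request inserted the row between our
        # get and our create. It exists now, so just fetch it.
        return Visitor.objects.get(session=session)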
Actually it's not thread-safe. You can look at the code of the get_or_create method of the QuerySet object; basically what it does is the following:
try:
    return self.get(**lookup), False
except self.model.DoesNotExist:
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True
So two threads might both conclude that the instance does not exist in the DB and both start creating a new one, before saving them one after the other.
Threading is one problem, but get_or_create is broken for any serious usage under MySQL's default isolation level (REPEATABLE READ):
How do I deal with this race condition in django?
Why doesn't this loop display an updated object count every five seconds?
https://code.djangoproject.com/ticket/13906
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
I was having this problem with a view that calls get_or_create.
I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1, and this made the problem disappear.
The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):
http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/
EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines that don't support it (SQLite while unit testing, in my case):
try:
    cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
except DatabaseError:
    pass
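For reference, a hypothetical reimplementation of such a decorator (this mirrors the idea in the linked post rather than quoting it; PostgreSQL releases the table lock at commit, so the LOCK TABLE must run inside a transaction):

from functools import wraps
from django.db import DatabaseError, connection, transaction

def lock_table(model, lock='SHARE ROW EXCLUSIVE'):
    def decorator(view):
        @wraps(view)
        def wrapped(request, *args, **kwargs):
            with transaction.atomic():
                with connection.cursor() as cursor:
                    try:
                        cursor.execute('LOCK TABLE %s IN %s MODE'
                                       % (model._meta.db_table, lock))
                    except DatabaseError:
                        pass  # engine without LOCK TABLE support (e.g. SQLite)
                return view(request, *args, **kwargs)
        return wrapped
    return decorator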

Web Service to return unique auto incremented human readable id number

I'm looking to create a simple web service that, when polled, returns a unique ID. The ID has to be human readable (i.e. not a GUID, probably in the form 000023) and is simply incremented by 1 each time it's called.
Now I need to consider that it may be called by two different applications at the same time, and I don't want it to return the same number to each application.
Is there an option other than using a database to store the current number?
Surely this has been done before; can anyone point me at some source code if so?
Thanks,
Neil
Use a critical section to ensure that only one caller at a time can run the code that produces the number. You can do this using the lock statement, or by being slightly more hardcore and using a Mutex directly. Doing this will ensure that you return a different number to each caller.
As for storing it: using a database purely to hand out an auto-incrementing number is overkill, although SQL Server and Oracle (and most likely others, but I can't speak for them) both provide auto-incrementing key features. You could have the web service generate a new row in a database table on each call and return the key, and the caller can use that number as a key back to that record (if you are saving more data after the initial call). This way you also let the database worry about generating unique numbers and you don't have to handle the details yourself, although this is not a good option if you don't already have a database.
The other option is to store it in a local file, although reading the file, incrementing the number, and writing it back out, all within a critical section, is relatively expensive.
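A minimal sketch of the critical-section idea (the answer assumes .NET's lock/Mutex; this uses Python's threading.Lock as the analogue, with illustrative names):

import threading

_lock = threading.Lock()
_counter = 0

def next_id():
    """Return the next zero-padded ID; safe across threads in one process."""
    global _counter
    with _lock:  # only one caller at a time may increment
        _counter += 1
        return '%06d' % _counter

Note this only serializes callers within a single process; across processes or machines you still need the file or database approaches.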
You can use a file.
Pseudocode (the lock must be acquired, blocking, before the read; checking locked() and then locking separately would itself race):

lock('counter.txt')        # blocks until any other holder releases it
counter = read('counter.txt')
counter++
write('counter.txt', counter)
print counter
unlock('counter.txt')
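A runnable sketch of that idea in Python, using an advisory fcntl lock (POSIX-only; the filename is illustrative):

import fcntl

def next_id(path='counter.txt'):
    # 'a+' creates the file if it is missing, without truncating it.
    with open(path, 'a+') as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        f.seek(0)
        data = f.read().strip()
        counter = int(data) + 1 if data else 1
        f.seek(0)
        f.truncate()
        f.write(str(counter))
        return '%06d' % counter  # lock released when the file closes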