Do I need to commit transactions in Django 1.6?

I want to create an object, save it to the DB, then check whether there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the object I created and restart the process.
transfer = Transfer(token=generateToken(size=9))
transfer.save()
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
Do I need to commit the transaction on each iteration, for example by calling commit() at the end of every loop?
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
    commit()

@transaction.commit_manually
def commit():
    transaction.commit()

From what you've described I don't think you need to use transactions. You're basically recreating a transaction rollback manually with your code.
I think the best way to handle this would be to have a database constraint enforce the issue. Is it the case that token and execution_time should be unique together? In that case you can define the constraint in Django with unique_together. If the constraint is that token should be unique whenever execution_time is 0, some databases will let you define a constraint like that as well.
If the constraint were in the database you could just do a get_or_create() in a loop until the Transfer was created.
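For illustration, a minimal sketch of that approach; the field definitions are guesses based on the question, and generateToken() is the question's own helper:
from django.db import models

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    class Meta:
        # the database itself now rejects a second row with this pair
        unique_together = ('token', 'execution_time')

# elsewhere, e.g. in the view:
created = False
while not created:
    # retry with a fresh token until a new row was actually inserted
    transfer, created = Transfer.objects.get_or_create(
        token=generateToken(size=9), execution_time=0)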
If you can't define the constraint in the database for whatever reason, then I think your version would work. (One improvement would be to use .count() instead of len().)

I want to create an object, save it to the DB, then check whether there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the object I created and restart the process.
There are a few ways you can approach this, depending on what your end goal is:
Do you want to ensure that no other record is written while you are writing yours (to prevent duplicates)? If so, you need to get a lock on your table, and to do that you need to execute an atomic transaction with @transaction.atomic (new in 1.6).
If you want to make sure that no duplicate records are created given a combination of fields, you need to enforce this at the database level with unique_together
I believe combining the above two will solve your problem; however, if you want a more brute-force approach, you can override the save() method of your model and raise an appropriate exception when a record that violates your constraints is about to be created (or updated).
In your view, you would then catch this exception and take the appropriate action.
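A sketch of that brute-force approach on the question's Transfer model; note that the check itself can still race between two requests, which is why pairing it with the database constraint above is safer:
from django.db import models

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        # refuse to store a second pending transfer with the same token
        duplicates = Transfer.objects.filter(token=self.token, execution_time=0)
        if self.pk:
            duplicates = duplicates.exclude(pk=self.pk)
        if duplicates.exists():
            raise ValueError("a pending Transfer with this token already exists")
        super(Transfer, self).save(*args, **kwargs)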

Related

How to properly increment a counter in my postgreSQL database?

Let's say I want to implement a "Like/Unlike" system in my app. I need to count each like for sorting purposes later. Can I simply insert the current value + 1? I think that's too simple.
What if two users click at the same time? How do I prevent my counter from being corrupted?
I read that I need to use transactions via the simple decorator @transaction.atomic, but I wonder whether that can handle my concern.
Transactions are designed to execute a "block" of operations triggered by one user, whereas in my case I need to be able to handle multiple requests at the same time and safely update the counter.
Any advice?
You can use an F() expression, e.g.
from django.db.models import F

content.likes_count = F('likes_count') + 1
content.save()
So the operation is executed in the database, not in Python.
From the Django documentation:
Another useful benefit of F() is that having the database - rather than Python - update a field's value avoids a race condition.
If two Python threads execute the code in the first example above, one thread could retrieve, increment, and save a field's value after the other has retrieved it from the database. The value that the second thread saves will be based on the original value; the work of the first thread will simply be lost.
If the database is responsible for updating the field, the process is more robust: it will only ever update the field based on the value of the field in the database when the save() or update() is executed, rather than based on its value when the instance was retrieved.
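If you don't need the updated instance afterwards, the same atomic increment can also be pushed into a single UPDATE with update(); Content here stands in for whatever model holds the counter:
from django.db.models import F

# one atomic "UPDATE ... SET likes_count = likes_count + 1" in the database
Content.objects.filter(pk=content.pk).update(likes_count=F('likes_count') + 1)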

How to modify a queryset and save it as new objects?

I need to query for a set of objects for a particular Model, change a single attribute/column ("account"), and then save the entire queryset's objects as new objects/rows. In other words, I want to duplicate the objects, with a single attribute ("account") changed on the duplicates. I'm basically creating a new account and then going through each model and copying a previous account's objects to the new account, so I'll be doing this repeatedly, with different models, probably using django shell. How should I approach this? Can it be done at the queryset level or do I need to loop through all the objects?
i.e.,
MyModel.objects.filter(account="acct_1")
# Now I need to set account = "acct_2" for the entire queryset,
# and save as new rows in the database
From the docs:
If the object’s primary key attribute is not set, or if it’s set but a
record doesn’t exist, Django executes an INSERT.
So if you set the id or pk to None it should work, but I've seen conflicting responses to this solution on SO: Duplicating model instances and their related objects in Django / Algorithm for recursively duplicating an object
This solution should work (thanks @JoshSmeaton for the fix):
models = MyModel.objects.filter(account="acct_1")
for model in models:
    model.id = None
    model.account = "acct_2"
    model.save()
I think in my case I have a OneToOneField on the model that I'm testing on, so it makes sense that my test wouldn't work with this basic solution. But I believe it should work, as long as you take care of any OneToOneFields.
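If the model has no relations that need fixing up, the same copy can be done with a single INSERT via bulk_create; a sketch (note that bulk_create bypasses save() and model signals):
copies = []
for obj in MyModel.objects.filter(account="acct_1"):
    obj.pk = None            # clearing the pk forces an INSERT
    obj.account = "acct_2"
    copies.append(obj)
MyModel.objects.bulk_create(copies)  # one query instead of one per object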

Django another optimizing save()

In the process of optimizing queries in my app I noticed something strange. In a given section of code I would get the object, update some values, and then save. In theory this should execute 2 queries, but in fact it executes 3: one SELECT when I get the object and two more when I save it (another SELECT and then the UPDATE!). While removing one query may seem silly, in this particular method I am updating many objects, so every query I save is one less hit on the DB and should speed up the method.
Inspecting the queries shows that the two SELECTs are different: the first fetches many columns, while the SELECT executed by save() is trivial.
Here is the example code:
myobject = room.myobjects.get(id=myobject_id) # one query executed here
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save() # two queries executed here
queries:
1) "SELECT `rooms_object`.`id`, `rooms_object`.`room_id`, ......FROM `rooms_object` WHERE (`rooms_object`.`id` = %s AND `rooms_object`.`room_id` = %s )"
2) "SELECT (1) AS `a` FROM `rooms_object` WHERE `rooms_object`.`id` = %s LIMIT 1"
3) "UPDATE ......this ones obvious"
I want the save method to recognize that it already has the object in memory and that it does not need to fetch it again... if that is even possible.
The second query is not actually pulling down the object again. It is doing an extremely fast "existence" check on the id before performing an UPDATE query. All that is returned from that query is a single 1, and the field is indexed, so it should be extremely efficient.
The reason they have chosen to design the ORM this way is that Django first looks at your object to see whether it currently has an ID. If it does, it performs the SELECT to make sure the record really does still exist in the database; if it does, Django performs the UPDATE, and if somehow the record does not exist, it performs an INSERT. You can test this by creating the object, then deleting the row manually from your database without Django knowing, and then calling save().
This is how Django maintains consistency.
If it were a new object, you would only get a single INSERT query, because it knows the object has no id right now.
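To see that consistency check in action, a sketch (the Room model and table name here are invented for illustration):
room = Room.objects.create(name='test')   # INSERT
# now delete the row behind Django's back, e.g. in a DB shell:
#   DELETE FROM myapp_room WHERE id = ...;
room.name = 'changed'
room.save()  # the existence SELECT finds nothing, so Django INSERTs again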
This is managed with the force_update parameter of
Model.save(force_insert=False, force_update=False, using=DEFAULT_DB_ALIAS, update_fields=None)
Set force_update to True to skip the existence check ("SELECT (1) AS a FROM ...").
https://docs.djangoproject.com/en/dev/ref/models/instances/
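Applied to the code above, a sketch; this is only safe because the object was just fetched, so we know the row exists:
myobject = room.myobjects.get(id=myobject_id)  # query 1: the initial SELECT
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save(force_update=True)  # query 2: the UPDATE, no existence SELECT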

Is get_or_create() thread-safe?

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.
Since I am only accessing through get_or_create(), I would imagine that I would only ever have one instance with a key to the session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?
NO, get_or_create is not atomic.
It first asks the DB if a satisfying row exists; the database returns, Python checks the result; if it doesn't exist, it creates it. In between the get and the create, anything can happen - a row matching the get criteria can be created by some other code.
For instance, with regard to your specific issue: if the user has two pages open (or several AJAX requests are performed at the same time), this might cause all the gets to fail and each of them to create a new row - with the same session.
It is thus important to only use get_or_create when any duplication will be caught by the database through some unique or unique_together constraint, so that even though multiple threads can reach the point of save(), only one will succeed and the others will raise an IntegrityError that you can catch and deal with.
If you use get_or_create with (a set of) fields that are not unique in the database, you will create duplicates, which is rarely what you want.
More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job!
(Well, unless you wrap your critical functions in some OS-level locks, but I would still suggest using the database.)
With these warnings in mind, used correctly, get_or_create is an easy-to-read, easy-to-write construct that perfectly complements the database's integrity checks.
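For example, assuming the session foreign key is declared unique (unique=True or a OneToOneField) on the questioner's model, a sketch of the catch-and-deal-with-it pattern:
from django.db import IntegrityError

def safe_get_or_create(session):
    try:
        return MyModel.objects.get_or_create(session=session)
    except IntegrityError:
        # another request won the race between our get and our create;
        # the unique constraint guarantees the row exists now
        return MyModel.objects.get(session=session), False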
Refs and citations:
http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95/0af3a41de4f4ce06
http://groups.google.com/group/django-developers/browse_thread/thread/f0b3381b2620e7db/8eae2f6087e550bb
Actually it's not thread-safe. You can look at the code of the get_or_create method of the QuerySet object; basically, what it does is the following:
try:
    return self.get(**lookup), False
except self.model.DoesNotExist:
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True
So two threads might both figure out that the instance does not exist in the DB and both start creating one, before saving them consecutively.
Threading is one problem, but get_or_create is also broken for any serious usage under MySQL's default isolation level:
How do I deal with this race condition in django?
Why doesn't this loop display an updated object count every five seconds?
https://code.djangoproject.com/ticket/13906
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
I was having this problem with a view that calls get_or_create.
I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1, and this made the problem disappear.
The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):
http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/
EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines that don't support it (SQLite while unit testing, in my case):
try:
    cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
except DatabaseError:
    pass

Django - Insert Without Returning the Id of the Saved Object

Each time the save() method is called on a Django object, Django executes two queries: one INSERT and one SELECT. In my case this is useful, except in a few specific places where each query is expensive. Any ideas on how to state, when needed, that no object needs to be returned - so that no SELECT is executed?
Also, I'm using django-mssql to connect to SQL Server; this problem doesn't seem to exist on MySQL.
EDIT: A better explanation
h = Human()
h.name = 'John Foo'
print h.id  # Returns None: no INSERT has been done, therefore no id is available
h.save()
print h.id  # Returns the id: an INSERT has taken place, plus a SELECT to return the id
Sometimes I don't need the returned ID, just the insert.
40ins's answer is right, but it may come with extra costs...
When Django executes a save(), it needs to know whether the object is a new one or an existing one, so it hits the database to check whether the related row exists. If it does, Django executes an UPDATE; otherwise it executes an INSERT.
Check the documentation here...
You can use force_insert or force_update, but that might cause serious data integrity problems, like creating a duplicate entry instead of updating the existing one...
So, if you wish to use force, you must be sure whether it will be an INSERT or an UPDATE...
Try using the save() method with the force_insert or force_update arguments. With these arguments Django knows whether the record exists and doesn't make the additional query.
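For the Human example above, a sketch:
h = Human()
h.name = 'John Foo'
h.save(force_insert=True)  # only the INSERT is issued, no existence SELECT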
The additional SELECT is the django-mssql backend getting the identity value from the table to determine the ID that was just inserted. If this SELECT is slow, then something is wrong with your SQL Server setup/configuration, because it is only executing a SELECT CAST(IDENT_CURRENT(table_name) as bigint) call.