Peewee: How do I call the post_delete signal for dependencies? - foreign-keys

I see that peewee has signals: http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#signals
But they only work with delete_instance().
For what I hope are obvious reasons, Peewee signals do not work when you use the Model.insert(), Model.update(), or Model.delete() methods. These methods generate queries that execute beyond the scope of the ORM, and the ORM does not know about which model instances might or might not be affected when the query executes.
Signals work by hooking into the higher-level peewee APIs like Model.save() and Model.delete_instance(), where the affected model instance is known ahead of time.
So how do I use signals on records that are dependencies, given that delete_instance(recursive=True) uses delete() on those dependencies?
def delete_instance(self, recursive=False, delete_nullable=False):
    if recursive:
        dependencies = self.dependencies(delete_nullable)
        for query, fk in reversed(list(dependencies)):
            model = fk.model
            if fk.null and not delete_nullable:
                model.update(**{fk.name: None}).where(query).execute()
            else:
                model.delete().where(query).execute()
    return type(self).delete().where(self._pk_expr()).execute()

The short answer is that you can't, because recursive deletes are implemented using queries of the form:
DELETE FROM ... WHERE foreign_key_id = X
For example, consider a user who has created 1000 tweets, where tweet.user_id points to user.id. When you delete that user, peewee issues 2 queries:
DELETE FROM tweets WHERE user_id = 123
DELETE FROM users WHERE id = 123
If peewee were to call delete_instance() on each dependent row instead, you would end up issuing 1001 queries. So it is hopefully quite clear why it is implemented in this way.
I'd suggest that if you're deleting rows and need to perform some kind of cleanup on related rows, you may be better off doing soft-deletes (e.g., setting status=DELETED), then processing any relations outside of the signals API.
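A minimal sketch of that soft-delete approach with peewee; the User/Tweet models and the status column are assumptions for illustration, mirroring the example above:
from peewee import SqliteDatabase, Model, CharField, ForeignKeyField, IntegerField

db = SqliteDatabase(':memory:')

STATUS_ACTIVE = 0
STATUS_DELETED = 1

class BaseModel(Model):
    class Meta:
        database = db

class User(BaseModel):
    username = CharField()
    status = IntegerField(default=STATUS_ACTIVE)

class Tweet(BaseModel):
    user = ForeignKeyField(User, backref='tweets')
    content = CharField()
    status = IntegerField(default=STATUS_ACTIVE)

def soft_delete_user(user):
    # Two UPDATE queries, mirroring the two DELETEs peewee would otherwise issue.
    Tweet.update(status=STATUS_DELETED).where(Tweet.user == user).execute()
    User.update(status=STATUS_DELETED).where(User.id == user.id).execute()
    # Related-row cleanup can now happen outside the signals API, e.g. in a
    # background job that processes rows whose status is STATUS_DELETED.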

Related

Determine how many times a Django model instance has been updated

I'm trying to find a generic way to get a count of how many times an instance of a model has had any of its fields updated. In other words, in Django, how do I get a count of how many times a specific row in a table has been updated? I'm aiming to show a count of how many updates have been made.
Let's say I have:
class MyModel(models.Model):
    field = models.CharField()
    another_field = models.IntegerField()
    ...
and I have an instance of the model:
my_model = MyModel.objects.get(id=1)
Is there a way to find out how many times my_model has had any of its fields updated? Or would I need to create a field like update_count and increment it each time a field is updated? Hopefully there is some kind of mechanism available in Django so I don't have to go that route.
Hopefully this isn't too basic of a question, I'm still learning Django and have been struggling with how to figure this out on my own.
There is no generic way to get this. As mentioned by wim, you can use a "versioning package" to track the whole history of changes. I've personally used the same suggestion, django-reversion, but there are other alternatives.
If you only need to track some fields, you can build a simpler mechanism yourself:
create a model/field to track your information
use something like FieldTracker to track changes to specific fields
create a post_save signal handler (or just override the model's save() method) to record the data; see the sketch below
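A minimal sketch of the manual counter mentioned in the question, assuming a hypothetical update_count field; an F() expression lets the database do the increment so concurrent saves don't race, and FieldTracker could additionally gate the increment to actual field changes:
from django.db import models
from django.db.models import F

class MyModel(models.Model):
    field = models.CharField(max_length=255)
    another_field = models.IntegerField(default=0)
    update_count = models.PositiveIntegerField(default=0)

    def save(self, *args, **kwargs):
        if self.pk:
            # Existing row: increment atomically in the database on every update.
            self.update_count = F('update_count') + 1
        super(MyModel, self).save(*args, **kwargs)
        # Call refresh_from_db() afterwards (Django 1.8+) to read the new value.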
You may also use something like "table audit". I haven't tried anything like that myself but there are some packages for that too:
https://github.com/StefanKjartansson/django-postgres-audit
https://github.com/torstenrudolf/django-audit-trigger
https://github.com/kvesteri/postgresql-audit

Updating and fetching a Django model object atomically

I want a capability to update-and-read an object atomically. For example, something like a first() on top of update() below:
obj = MyModel.objects.filter(pk=100).update(counter=F('counter') + 1).first()
I know it is an awkward construct, but I just want to show what I need.
For the record, I have used a class method like:
@classmethod
def update_counter(cls, job_id):
    with transaction.atomic():
        job = cls.objects.select_for_update().get(pk=job_id)
        job.counter += 1
        job.save()
        return job
which I would call as below to get my updated object:
my_obj = MyModel.update_counter(my_obj.pk)
But the question is: is there any other Django model technique for this, given that such read-backs are common and are likely to be used by multiple threads to conclude something based on, say, the final count?
Digging deeper, I could not find any direct way of getting the object(s) I am updating in a chained SQL way. As Dani Herrera commented above, the update and the read have to be two SQL queries. The only mechanism that meets the requirement is therefore the class method I included above. In fact, it also lets me add further field updates to the same class method atomically in the future.
For example, the method could very well be "def update_progress(job_id, final_desired_count)" where I can update more fields such as "self.progress_percentage = (self.counter / final_desired_count) * 100".
The class method approach may turn out to be a good investment for me.
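A hedged sketch of what that update_progress() extension might look like; the Job model name, the progress_percentage field, and the final_desired_count parameter are assumptions, not code from the original project:
from django.db import models, transaction

class Job(models.Model):
    counter = models.IntegerField(default=0)
    progress_percentage = models.FloatField(default=0.0)

    @classmethod
    def update_progress(cls, job_id, final_desired_count):
        with transaction.atomic():
            # select_for_update() locks the row, so concurrent threads
            # serialize their read-modify-write cycles.
            job = cls.objects.select_for_update().get(pk=job_id)
            job.counter += 1
            job.progress_percentage = (job.counter / float(final_desired_count)) * 100
            job.save()
            return job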

Why is the database used by Django subqueries not sticky?

I have a concern with Django subqueries when using the Django ORM. When I fetch a queryset or perform a DB operation, I have the option of bypassing any assumptions Django might make about which database to use by forcing usage of the specific database I want:
b_det = Book.objects.using('some_db').filter(book_name = 'Mark')
The above disregards any database routers I might have set and goes straight to 'some_db'.
But if my models look approximately like this:
class Author(models.Model):
    author_name = models.CharField(max_length=255)
    author_address = models.CharField(max_length=255)

class Book(models.Model):
    book_name = models.CharField(max_length=255)
    author = models.ForeignKey(Author, null=True)
And I fetch a QuerySet representing all books that are called Mark like so:
b_det = Book.objects.using('some_db').filter(book_name = 'Mark')
Then later, if somewhere in the code I trigger a subquery by doing something like:
if b_det:
    auth_address = b_det[0].author.author_address
Then this does not use the original database 'some_db' that I had specified earlier for the main query. It again goes through the routers and picks up (possibly) the incorrect database.
Why does Django do this? IMHO, if I forced the use of a database for the original query, then the same database should be used for the subquery as well. Why must the database routers come into the picture for this at all?
This is not a subquery in the strict SQL sense of the word. What you are actually doing here is executing one query and using its result to find related items.
You can chain filters and do lots of other operations on a queryset, but it will not be executed until you evaluate it, for example by iterating it or taking a slice of it; and here you are actually taking a slice:
auth_address = b_det[0].author.author_address
So you have a materialized query, and finding the address of the related author requires another query. But that second query does not go through using(), so Django is free to choose which database to use via the routers. You can overcome this by using select_related().
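A small sketch of that suggestion, reusing the models from the question: select_related('author') joins the author into the original query, so the forced database is honoured for the related data as well.
b_det = (Book.objects
         .using('some_db')
         .filter(book_name='Mark')
         .select_related('author'))

if b_det:
    # The author row was fetched in the same query against 'some_db',
    # so this attribute access does not go back through the routers.
    auth_address = b_det[0].author.author_address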

Do I need to commit transactions in Django 1.6?

I want to create an object, save it to the DB, then check whether there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the created object and restart the process.
transfer = Transfer(token=generateToken(size=9))
transfer.save()
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
Do I need to commit the transaction on each iteration, for example by calling commit() at the end of every loop, like this?
while len(Transfer.objects.filter(token=transfer.token, execution_time=0)) != 1:
    transfer.delete()
    transfer = Transfer(token=generateToken(size=9))
    transfer.save()
    commit()

@transaction.commit_manually
def commit():
    transaction.commit()
From what you've described I don't think you need to use transactions. You're basically recreating a transaction rollback manually with your code.
I think the best way to handle this would be to have a database constraint enforce the issue. Is it the case that token and execution_time should be unique together? In that case you can define the constraint in Django with unique_together. If the constraint is that token should be unique whenever execution_time is 0, some databases will let you define a constraint like that as well.
If the constraint were in the database you could just do a get_or_create() in a loop until the Transfer was created.
If you can't define the constraint in the database for whatever reason then I think your version would work. (One improvement would be to use .count() instead of len.)
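A hedged sketch of that loop, assuming token is unique at the database level and reusing generateToken() from the question:
from django.db import IntegrityError

def create_unique_transfer():
    while True:
        try:
            transfer, created = Transfer.objects.get_or_create(
                token=generateToken(size=9),
                defaults={'execution_time': 0},
            )
        except IntegrityError:
            # Lost a race with another writer inserting the same token;
            # loop around and try a fresh one.
            continue
        if created:
            return transfer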
I want to create an object, save it to the DB, then check if there is another row in the DB with the same token and execution_time=0. If there is, I want to delete the created object and restart the process.
There are a few ways you can approach this, depending on what your end goal is:
Do you want to ensure that no other record is written while you are writing yours (to prevent duplicates)? If so, you need a lock on your table, and to get that you need to execute an atomic transaction with @transaction.atomic (new in 1.6).
If you want to make sure that no duplicate records are created for a given combination of fields, you need to enforce this at the database level with unique_together.
I believe combining the above two will solve your problem; however, if you want a more brute-force approach, you can override the save() method on your model and raise an appropriate exception when a record that would violate your constraints is about to be created (or updated), as in the sketch below.
In your view, you would then catch this exception and take the appropriate action.
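A hedged sketch of that brute-force option; the DuplicateTokenError name and the field definitions are assumptions, and without the database constraint the check is still subject to races:
from django.db import models

class DuplicateTokenError(Exception):
    pass

class Transfer(models.Model):
    token = models.CharField(max_length=9)
    execution_time = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        # Refuse to write a second pending row with the same token.
        clash = (Transfer.objects
                 .filter(token=self.token, execution_time=0)
                 .exclude(pk=self.pk)
                 .exists())
        if clash:
            raise DuplicateTokenError(self.token)
        super(Transfer, self).save(*args, **kwargs)

Then in the view:
try:
    Transfer(token=generateToken(size=9)).save()
except DuplicateTokenError:
    # Regenerate the token and try again, or report the clash.
    pass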

count() returning zero hits in post_save

We have a Dialog and a Comment object. We have a denormalized field, num_comments, on Dialog to keep track of the number of comments. When a new comment is saved (or deleted) we want to increase/decrease this value properly.
# sender=Comment, called post_save and post_delete
def recalc_comments(sender, instance, created=False, **kwargs):
    # Comments that will be deleted might not have a dialog (when dialog gets deleted)
    if not hasattr(instance, "dialog"):
        return
    dialog = instance.dialog
    dialog.update(
        num_comments=sender.public.filter(dialog=dialog).count(),
        num_commentators=sender.public.filter(dialog=dialog).aggregate(c=Count('user', distinct=True))["c"],
    )
The problem that has started to appear is that the query for num_comments returns zero for the first comment posted. This does not happen every time, and only in cases with approx. > 1000 comments in the result set (not much, I know...).
Could it be that the Comment has not yet been saved to the database when the count() is performed? To complicate things further, we are using Johnny Cache (with memcached) as a layer between the ORM and the database.
Any input would be greatly appreciated!
As far as I understand, you want to denormalize your database schema for the best query performance. In this case I can recommend an application designed specifically for this purpose: django-composition.
As the documentation says:
django-composition provides the abstract way to denormalize data from your models in simple declarative way through special generic model field called CompositionField.
Most cases of data denormalization are pretty common so django-composition has several "short-cuts" fields that handles most of them.
CompositionField is django model field that provides interface to data denormalization.
You can also use the shortcut ForeignCountField. It helps count the number of objects related by a foreign key.
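If pulling in django-composition is not an option, here is a hedged plain-Django sketch of the same denormalization (reusing the Dialog, Comment, and public manager names from the question; this is an alternative approach, not django-composition's API). Writing the fresh counts with a queryset-level update() avoids stale values cached on in-memory instances:
from django.db.models import Count
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

@receiver(post_save, sender=Comment)
@receiver(post_delete, sender=Comment)
def recalc_comments(sender, instance, **kwargs):
    # Comments being deleted may no longer have a dialog.
    if not hasattr(instance, "dialog"):
        return
    stats = sender.public.filter(dialog=instance.dialog).aggregate(
        num=Count('id'),
        commentators=Count('user', distinct=True),
    )
    # update() writes straight to the database row rather than through a
    # possibly stale model instance.
    Dialog.objects.filter(pk=instance.dialog_id).update(
        num_comments=stats['num'],
        num_commentators=stats['commentators'],
    )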