I haven't seen anything on this topic in Django's online documentation.
I am trying to save a list of objects to the database, but all I can do is loop through the list and call save() on every object.
So does Django hit the database several times, or does it do one batch save instead?
As of Django 1.4, there exists a bulk_create() method on the QuerySet object, which allows for inserting a list of objects in a single query. For more info, see:
Django documentation for bulk_create
Django 1.4 release notes
The ticket that implemented this feature
Unfortunately, batch inserts are something that Django 1.3 and earlier do not directly support. If you want to use the ORM, you have to call save() on each individual object. If it's a large list and performance is an issue, you can use django.db.connection.cursor() to INSERT the items manually inside a transaction, which speeds the process up dramatically. If you have a huge dataset, you need to start looking at database-engine-specific methods, like COPY FROM in Postgres.
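To illustrate the "manual INSERT inside one transaction" idea, here is a sketch using the stdlib sqlite3 module so it runs anywhere; with Django you would do the equivalent through django.db.connection.cursor():

```python
# Batch INSERT inside a single transaction; sqlite3 stands in for
# whatever backend Django's connection.cursor() would give you.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT)")

rows = [("item-%d" % i,) for i in range(1000)]

# executemany() inside one transaction: a single commit for all rows,
# instead of one round-trip and commit per save() call.
with conn:
    conn.executemany("INSERT INTO item (name) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
print(count)  # 1000
```

The win comes from avoiding a commit (and, on a networked database, a round-trip) per row.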
Since Django 1.4 there is bulk_create(), but there is always a "but".
You need to be careful: bulk_create() won't call each instance's save() method internally.
As the Django docs say:
The model's save() method will not be called
So if you are overriding the save() method (as in my case), you can't use bulk_create().
This question is also addressed in How do I perform a batch insert in Django?, which provides some ways to make Django do this.
This might be a good starting point, but as the author of the code snippet says, it might not be production ready.
Related
I'm hoping this will be a really simple question.
I just wanted some advice on the bulk updating of records.
An example bulk update may go something like this:
for user in User.objects.all().iterator():
    user.is_active = False
    user.save()
Is there a more efficient way to do this on the database level using the Django ORM?
Would the following be more efficient:
User.objects.all().update(is_active=False)?
It will work, but be aware that the update() call is converted directly to a SQL statement, without running anything you customized in save() and without triggering the save signals. If that's not a problem in your case and you are more concerned about performance, go for it.
From the Django docs:
Be aware that the update() method is converted directly to an SQL statement. It is a bulk operation for direct updates. It doesn’t run any save() methods on your models, or emit the pre_save or post_save signals (which are a consequence of calling save()), or honor the auto_now field option. If you want to save every item in a QuerySet and make sure that the save() method is called on each instance, you don’t need any special function to handle that.
https://docs.djangoproject.com/en/2.2/topics/db/queries/#updating-multiple-objects-at-once
Yes, using update would be more efficient. That will do a single database call, instead of one per object.
Yes. You can use User.objects.all().update(is_active=False).
This will result in a single query that looks like:
UPDATE `auth_user`
SET `is_active` = 0
It will thus reduce the number of roundtrips to the database to one.
I am using Django's native Authorization/Authentication model to manage logins for my WebApp. This creates instances of the User model.
I would like to write a simple class-based APIView that can tell me if a specific email is already used (i.e. is there already a user with the given email in my database?). The first time this API is called, it should get the matching User object from the DB. But on subsequent calls, it should return it from Memcache (if, and only if, the underlying row in the database is unchanged). How can I do that?
Should I inherit from generic.APIView? Why or why not? What would the view look like? In particular I want to understand how to properly do the memcaching and cache-coherency checking. Furthermore, how would this memcaching scheme work if I had another API that modified the User object?
Thanks. I was unable to find a detailed, idiot-proof manual on using Memcached properly in Django.
Caching is perhaps the simplest part of Django, so I'll leave that discussion for last. The bigger problem is figuring out when your model has changed.
You can decide what constitutes an update. For example, you might decide that the cache is only updated when a particular field changes. Your cache-update logic should live in the code or view that does the writing/updating. If you go this route, I would recommend django-model-utils and its StatusField; you can add this logic by overriding the save() method, or implement it in the code that updates the models.
You can also take a simpler approach: no matter what was updated, as long as save() is called, expire the cache and repopulate it.
The rest of the code is very simple.
Attempt to fetch the item from the cache; if the item doesn't exist (called a cache miss), populate the cache by fetching from the database. Otherwise you get the item from the cache and save yourself a database hit.
The cache interface is very simple: you set('somekey', 'somevalue'), optionally telling it when to expire the item. Then you try get('somekey'); if this returns None, it's a cache miss (perhaps the item expired) and you have to fetch the item and populate the cache. Otherwise, you get the cached object.
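The flow above can be sketched in a few lines; here a plain dict stands in for Memcached, and fetch_user_from_db() is an invented stand-in for the real database query:

```python
# Cache-aside pattern: try the cache, fall back to the "database" on a
# miss, and populate the cache so the next call is served from memory.
cache = {}
db_hits = []  # records each time the "database" is actually queried

def fetch_user_from_db(email):
    db_hits.append(email)  # pretend this is the expensive DB round-trip
    return {"email": email, "is_active": True}

def get_user(email):
    key = "user:%s" % email
    user = cache.get(key)      # try the cache first
    if user is None:           # cache miss (or the item expired)
        user = fetch_user_from_db(email)
        cache[key] = user      # populate the cache
    return user

get_user("alice@example.com")  # miss: hits the "database"
get_user("alice@example.com")  # hit: served from the cache
print(len(db_hits))  # 1
```

With Django's real cache framework, cache.get()/cache.set() from django.core.cache follow exactly this shape, with an optional timeout argument on set().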
On Google App Engine, we have .put() and .put_async(), which are called to save a model object.
Being new to GAE, it is not clear to me how I can ensure that some functionality gets executed every time I save an object.
In vanilla Django, I can use signals, or override the .save() method.
How would I achieve similar results on GAE, given that I can rely on .put() being called whenever an object is saved?
There are several ways you could accomplish this. You could override the put() method with your own code; just be sure to call the superclass's put().
However, the route I would choose would be to implement a post put hook (assuming you're using NDB). See the hook method documentation here: https://developers.google.com/appengine/docs/python/ndb/modelclass
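The actual hooks on ndb.Model are _pre_put_hook(self) and _post_put_hook(self, future). Since the GAE SDK isn't importable outside App Engine, here is a plain-Python sketch of the shape of the pattern, with invented FakeModel/Account classes standing in:

```python
# Plain-Python sketch of NDB's hook pattern; the real methods on
# ndb.Model are _pre_put_hook(self) and _post_put_hook(self, future).
class FakeModel:
    datastore = []  # stand-in for the datastore

    def put(self):
        self._pre_put_hook()
        FakeModel.datastore.append(self)  # the "write"
        self._post_put_hook()

    def _pre_put_hook(self):
        pass  # subclasses override to run code before every put()

    def _post_put_hook(self):
        pass  # subclasses override to run code after every put()

class Account(FakeModel):
    events = []

    def _post_put_hook(self):
        # Comparable to overriding save() / a post_save signal in Django.
        Account.events.append("saved")

Account().put()
print(Account.events)  # ['saved']
```

On real NDB you would simply define _post_put_hook on your model class; the datastore calls it for you after every put().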
It appears that there's no way for me to take an ORM object fetched with numerous select_related() relations, create a Form from it, and run my data through all the cleaning and additional validations I have.
I'm simply trying to reduce the number of queries that get executed. The form either takes an id and re-creates the object, or does not take advantage of the additional fields (I've tried using model_to_dict).
Is there anything I'm missing? How can I use select_related inside of a form?
Django 1.4
Do:
form = MyFormClass(instance=my_object)
Your form will need to be a ModelForm, see the docs for more information!
What is the recommend approach when extending some sort of save behavior in Django, such as saving calculated values?
I've seen people overriding the save method and I've seen people using signals.
What is the correct/most used/better approach for this?
save() and delete() do not get called on bulk actions; signals are your only option there.
I use a simple approach: if I need to update some fields on the object itself, I override save(); if I need to work with other objects or querysets somehow, I connect signals.