Django - copy and insert queryset clone using bulk_create - django

My goal is to create a clone of a queryset and then insert it into the database.
Following the suggestions of this post, I have the following code:
qs_new = copy.copy(qs)
MyModel.objects.bulk_create(qs_new)
However, with this code I run into a duplicate primary key error. For now, the only workaround I can come up with is the following:
qs_new = copy.copy(qs)
for x in qs_new:
    x.id = None
MyModel.objects.bulk_create(qs_new)
Question: Can I implement this code snippet without going through a loop?

I can't think of a way without a loop, but here is a suggestion:
# add all fields here except 'id'
qs = qs.values('field1', 'field2', 'field3')
new_qs = [MyModel(**i) for i in qs]
MyModel.objects.bulk_create(new_qs)
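If listing every field by hand is tedious, the same idea can be phrased as "take each row and drop the primary key". A minimal pure-Python sketch follows, using plain dicts as stand-ins for what qs.values() yields; the helper name rows_without_pk is made up for illustration and no Django is involved:

```python
def rows_without_pk(rows, pk_name="id"):
    """Return copies of the row dicts with the primary-key entry removed."""
    return [{k: v for k, v in row.items() if k != pk_name} for row in rows]


# Stand-in for list(qs.values()): one dict per database row.
rows = [
    {"id": 1, "field1": "a", "field2": 10},
    {"id": 2, "field1": "b", "field2": 20},
]
cleaned = rows_without_pk(rows)
# In Django, each cleaned dict could then be passed as MyModel(**row)
# and the resulting list handed to MyModel.objects.bulk_create().
```

The original rows are left untouched, so the source data can still be used afterwards.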

Note that bulk_create behaves differently depending on the underlying database. With Postgres you get the new primary keys set:
Support for setting primary keys on objects created using
bulk_create() when using PostgreSQL was added.
https://docs.djangoproject.com/en/1.10/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create
You should, however, make sure that the objects you are creating either have no primary keys or only keys that are not taken yet. In the latter case you should run the code that sets the PKs as well as the bulk_create inside transaction.atomic().
Fetching the values explicitly as suggested by Shang Wang might be faster because only the given values are retrieved from the DB instead of fetching everything. If you have foreign key relations or m2m relations you might want to avoid simply throwing the complex instances into bulk_create but instead explicitly naming all attributes that are required when constructing a new MyModel instance.
Here is an example:
class MyModel(Model):
    name = TextField(...)
    related = ForeignKey(...)
    my_m2m = ManyToManyField(...)
In case of MyModel above, you would want to preserve the ForeignKey relations by specifying related_id and the PK of the related object in the constructor of MyModel, avoiding specifying related.
With m2m relations, you might end up skipping bulk_create altogether because you need each specific new PK, the corresponding original PK (from the instance that was copied) and the m2m relations of that original instance. Then you would have to create new m2m relations with the new PK and these mappings.
# add all fields here except 'id'
qs = qs.values('name', 'related_id')
MyModel.objects.bulk_create([MyModel(**i) for i in qs])
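For the m2m case described above, the bookkeeping amounts to translating each original PK to the PK of its copy and re-emitting the relation rows. Here is a rough pure-Python sketch, assuming you already have a mapping from old to new PKs and the original (instance PK, related PK) pairs; remap_m2m_pairs and the sample data are hypothetical:

```python
def remap_m2m_pairs(pairs, old_to_new):
    """Translate (old_pk, related_pk) pairs to (new_pk, related_pk).

    Pairs whose old_pk has no copy in old_to_new are skipped.
    """
    return [
        (old_to_new[old_pk], related_pk)
        for old_pk, related_pk in pairs
        if old_pk in old_to_new
    ]


# Original instances 1 and 2 were copied to new instances 101 and 102.
old_to_new = {1: 101, 2: 102}
# m2m rows of the originals: instance 1 -> related 7 and 8, instance 2 -> related 7.
# Instance 3 was not copied, so its row is dropped.
original_pairs = [(1, 7), (1, 8), (2, 7), (3, 9)]
new_pairs = remap_m2m_pairs(original_pairs, old_to_new)
```

In Django, the resulting pairs would become the new rows of the m2m through table.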
Note for completeness:
If you have overridden save() on your model (or if you are inheriting from a third-party model with a custom save method), it won't be executed, and neither will any post_save handlers (yours or third-party).

I tried it, and you do need a loop to set the id to None; then it works. The final code looks like this:
qs_new = copy.copy(qs)
for q in qs_new:
    q.id = None
    # also, you can set other fields if you need
MyModel.objects.bulk_create(qs_new)
This works for me.

Related

Concise way of getting or creating an object with given field values

Suppose I have:
from django.db import models
class MyContentClass(models.Model):
    content = models.TextField()
    another_field = models.TextField()
x = MyContentClass(content="Hello, world!", another_field="More Info")
Is there a more concise way to perform the following logic?
existing = MyContentClass.objects.filter(content=x.content, another_field=x.another_field)
if existing:
    x = existing[0]
else:
    x.save()
# x now points to an object which is saved to the DB,
# either one we've just saved there or one that already existed
# with the same field values we're interested in.
Specifically:
Is there a way to query for both (all) fields without specifying each one separately?
Is there a better idiom for either getting the old object or saving the new one? Something like get_or_create, but which accepts an object as a parameter?
Assume the code which does the saving is separate from the code which generates the initial MyContentClass instance which we need to compare to. This is typical of a case where you have a function which returns a model object without also saving it.
You could convert x to a dictionary with
x_data = x.__dict__
Then that could be passed into the object's get_or_create method.
MyContentClass.objects.get_or_create(**x_data)
The problem with this is that there are a few fields that will cause this to error out (eg the unique ID, or the _state Django modelstate field). However, if you pop() those out of the dictionary beforehand, then you'd probably be good to go :)
def remove_unneeded_fields(x_data):
    unneeded_fields = [
        '_state',
        'id',
        # Whatever other fields you don't want the new obj to have
        # eg any field marked as 'unique'
    ]
    # work on a copy so the instance's own __dict__ is left untouched
    cleaned = dict(x_data)
    for field in unneeded_fields:
        cleaned.pop(field, None)
    return cleaned
cleaned_dict = remove_unneeded_fields(x_data)
MyContentClass.objects.get_or_create(**cleaned_dict)
EDIT
To avoid the issues associated with maintaining a whitelist/blacklist of fields, you could do something like this:
def remove_unneeded_fields(x_data, MyObjModel):
    cleaned_data = {}
    for field in MyObjModel._meta.fields:
        if not field.unique:
            cleaned_data[field.name] = x_data[field.name]
    return cleaned_data
There would probably have to be more validation than simply checking that the field is not unique, but this might offer some flexibility when it comes to minor model field changes.
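Note that x.__dict__ is the instance's own attribute dictionary, so any filtering should work on a copy rather than delete keys from it directly. Below is a small sketch of building the lookup that way; it uses a plain Python class as a stand-in for a model instance, and FakeInstance and lookup_kwargs are made-up names:

```python
class FakeInstance:
    """Plain stand-in for a model instance (not a Django model)."""

    def __init__(self, content, another_field):
        self._state = object()  # mimics Django's internal bookkeeping attribute
        self.id = None
        self.content = content
        self.another_field = another_field


def lookup_kwargs(instance, exclude=("id",)):
    """Copy public attributes into a new dict, skipping excluded names."""
    return {
        name: value
        for name, value in vars(instance).items()
        if not name.startswith("_") and name not in exclude
    }


x = FakeInstance(content="Hello, world!", another_field="More Info")
kwargs = lookup_kwargs(x)
# With a real model these kwargs could be passed to
# MyContentClass.objects.get_or_create(**kwargs).
```

Because a new dict is built, x keeps its _state and id attributes after the call.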
I would suggest to create a custom manager for those models and add the functions you want to do with the models (like a custom get_or_create function).
https://docs.djangoproject.com/en/1.10/topics/db/managers/#custom-managers
This would be the cleanest way and involves no hacking. :)
You can create specific managers for specific models or create a superclass with functions you want for all models.
If you just want to add a second manager with a different name, beware that it will become the default manager if you don't set the objects manager first (https://docs.djangoproject.com/en/1.10/topics/db/managers/#default-managers)

How to track changes when using update() in Django models

I'm trying to keep track of the changes whenever a field is changed.
I can see the changes in Django Admin History whenever I use the .save() method, but whenever I use the .update() method it does not record whatever I changed in my object.
I want to use update() because it can change multiple fields at the same time. It makes the code cleaner and more efficient (one query, one line...)
Right now I'm using this:
u = Userlist.objects.filter(username=user['username']).update(**user)
I can see all the changes when I do
u = Userlist.objects.get(username=user['username'])
u.lastname=lastname
u.save()
I'm also using django-simple-history to see the changes.
From the docs:
Finally, realize that update() does an update at the SQL level and,
thus, does not call any save() methods on your models, nor does it
emit the pre_save or post_save signals (which are a consequence of
calling Model.save())
update() works at the DB level, so Django admin cannot track changes when updates are applied via .update(...).
If you still want to track the changes on updates, you can use:
for user in Userlist.objects.filter(age__gt=40):
    user.lastname = 'new name'
    user.save()
This is however more expensive and is not advisable if the only benefit is tracking changes via the admin history.
Here's how I've handled this and it's worked well so far:
from django.forms.models import model_to_dict
# get current model instance to update
instance = Userlist.objects.get(username=username)
# use model_to_dict to convert object to dict
obj_dict = model_to_dict(instance)
# create instance of the model with this old data but do not save it
old_instance = Userlist(**obj_dict)
# update the model instance (there are multiple ways to do this)
Userlist.objects.filter(username=username).update(**user)
# get the updated object (look it up by primary key in case username changed)
updated_object = Userlist.objects.get(pk=instance.pk)
# get list of fields in the model class
my_model_fields = [field.name for field in Userlist._meta.get_fields()]
# get list of fields if they are different
differences = [
    field for field in my_model_fields
    if getattr(updated_object, field, None) != getattr(old_instance, field, None)
]
The differences variable will give you the list of fields that are different between the two instances. I also found it helpful to add which model fields I don't want to check for differences (e.g. we know the updated_date will always be changed, so we don't need to keep track of it).
skip_diff_fields = ['updated_date']
my_model_fields = []
for field in Userlist._meta.get_fields():
    if field.name not in skip_diff_fields:
        my_model_fields.append(field.name)
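The comparison step can be isolated into a small helper. The sketch below is plain Python and uses SimpleNamespace objects in place of model instances; changed_fields is a hypothetical name, not part of Django or django-simple-history:

```python
from types import SimpleNamespace


def changed_fields(old, new, field_names, skip=()):
    """Return the names in field_names whose values differ between old and new."""
    return [
        name
        for name in field_names
        if name not in skip
        and getattr(old, name, None) != getattr(new, name, None)
    ]


old = SimpleNamespace(username="jdoe", lastname="Doe", updated_date="2017-01-01")
new = SimpleNamespace(username="jdoe", lastname="Smith", updated_date="2017-02-01")
fields = ["username", "lastname", "updated_date"]
diff = changed_fields(old, new, fields, skip=("updated_date",))
```

Passing the skip list as an argument keeps the "always changes" fields out of the result without rebuilding the field list each time.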

Django how to delete from ManyToManyField with extra field?

In my django app I have models set up similar to these models on the django site - Extra fields on many-to-many relationships. Further down the page, I read
The remove() method is disabled for similar reasons. However, the clear() method can be used to remove all many-to-many relationships for an instance:
If the remove method is disabled then how do I remove an object from a manytomany field? It says that I can use the clear method to remove everything but I only want to remove one specific element from the manytomany field.
You can delete the corresponding instance of the intermediary model.
From the example provided in djangoproject:
m_qs = Membership.objects.filter(person=person, group=group)  # or some other logic to filter
try:
    m = m_qs.get()  # assuming the queryset returns exactly 1 element
    m.delete()
except (Membership.DoesNotExist, Membership.MultipleObjectsReturned):
    pass  # handle more gracefully

Django Set Default as Subset of ManyToMany Model Field

class UserProfile(models.Model):
    project_assignments = models.ManyToManyField('drawingManager.Project', default=SetDefaultProject(self,default_assignment))
    user = models.OneToOneField(User)
    default_project
    def SetDefaultProject(self,default_assignment):
        if default_assignment:
            self.default_project = default_assignment
        else:
            self.default_project = #somehow choose newest project
    def __unicode__(self):
        admin_reference = self.user.first_name + self.user.last_name
        return admin_reference
I'm trying to achieve the following behavior. When a user is added, at least one project assignment is set as the default. They can later, through an options interface, set their default to any of the subset of project_assignments. But it's not clear to me when you can use Foreign Key and Many to Many Fields as just python code and when you can't.
If I understand you correctly, you're not understanding that ForeignKeys and ManyToManyFields return different things.
A ForeignKey is a one-to-many relationship, with the 'one' on the side that it's pointing to. That means that if you defined default_project as a ForeignKey, self.default_project returns a single Project instance which you can use and assign as any other instance.
However, a ManyToManyField - as the name implies - has "many" relationships on both sides. So self.project_assignments doesn't return a single instance; it returns a related manager, and calling .all() on it gives a QuerySet, which is the way Django handles lists of instances retrieved from the database. So you can use add and remove on the manager to manipulate that list, or slice the QuerySet to get a single instance.
For example, if you wanted to set the default_project FK to the first project in the project assignments list, you would do:
self.default_project = self.project_assignments.all()[0]
(although in real code you would have to guard against the possibility that there are no assignments, since that would raise an IndexError).
I'm not sure I understand what you mean by "it's not clear to me when you can use Foreign Key and Many to Many Fields as just python code", but your pseudo code would work with the following changes:
def SetDefaultProject(self,default_assignment):
    if default_assignment:
        self.project_assignments.add(default_assignment)
    else:
        self.project_assignments.add(self.project_assignments.latest('id'))
        # note: i don't know what your intent is, so please adapt to your needs
        # the key here is `MyModel.objects.latest('id')` to retrieve the latest object
        # according to the field specified. 'id', 'date_created', etc.
PS: it's recommended to use lowercase names & underscores for method names (to not confuse them with ClassNameFormatRecommendations)
http://www.python.org/dev/peps/pep-0008/

Django: select_related and GenericRelation

Does select_related work for GenericRelation relations, or is there a reasonable alternative? At the moment Django's doing individual sql calls for each item in my queryset, and I'd like to avoid that using something like select_related.
class Claim(models.Model):
    proof = generic.GenericRelation(Proof)
class Proof(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')
I'm selecting a bunch of Claims, and I'd like the related Proofs to be pulled in instead of queried individually.
There isn't a built-in way to do this. But I've posted a technique for simulating select_related on generic relations on my blog.
Blog content summarized:
We can use Django's _content_object_cache field to essentially create our own select_related for generic relations.
generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)
content_types = ContentType.objects.in_bulk(generics.keys())
relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))
for item in queryset:
    setattr(item, '_content_object_cache',
            relations[item.content_type_id][item.object_id])
Here we get all the different content types used by the relationships
in the queryset, and the set of distinct object IDs for each one, then
use the built-in in_bulk manager method to get all the content types
at once in a nice ready-to-use dictionary keyed by ID. Then, we do one
query per content type, again using in_bulk, to get all the actual
objects.
Finally, we simply set the relevant object to the
_content_object_cache field of the source item. The reason we do this is that this is the attribute that Django would check, and populate if
necessary, if you called x.content_object directly. By pre-populating
it, we're ensuring that Django will never need to call the individual
lookup - in effect what we're doing is implementing a kind of
select_related() for generic relations.
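The grouping step is the core of the technique and can be tried out without a database. The following sketch uses namedtuples and a dict of in-memory "tables" as stand-ins for the queryset and for in_bulk; all names and data here are invented for illustration:

```python
from collections import namedtuple

# Stand-in for rows of a model with a generic foreign key.
Item = namedtuple("Item", ["content_type_id", "object_id"])

# Stand-in for the database: content_type_id -> {pk: object}.
TABLES = {
    1: {10: "claim-10", 11: "claim-11"},
    2: {10: "comment-10"},
}


def fetch_in_bulk(content_type_id, pks):
    """Mimic in_bulk: one lookup per content type, returning {pk: object}."""
    table = TABLES[content_type_id]
    return {pk: table[pk] for pk in pks if pk in table}


def resolve_generic(items):
    """Group object IDs by content type, fetch each group once, then map back."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.content_type_id, set()).add(item.object_id)
    fetched = {ct: fetch_in_bulk(ct, pks) for ct, pks in grouped.items()}
    return [fetched[item.content_type_id].get(item.object_id) for item in items]


items = [Item(1, 10), Item(2, 10), Item(1, 11), Item(1, 10)]
resolved = resolve_generic(items)
```

However many items there are, the number of bulk fetches equals the number of distinct content types, which is the saving the quoted technique is after.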
Looks like select_related and GRs don't work together. I guess you could write some kind of accessor for Claim that gets them all via the same query. This post gives you some pointers on raw SQL to get generic objects, if you need them
You can use the .extra() function to manually extract fields:
Claim.objects.filter(proof__filteryouwant=valueyouwant).extra(select={'field_to_pull':'proof_proof.field_to_pull'})
The .filter() will do the join, the .extra() will pull a field.
proof_proof is the SQL table name for Proof model.
If you need more than one field, specify each of them in the dictionary.