Is there a Django ORM best practice for this SQL:
REPLACE app_model SET field_1 = 'some val', field_2 = 'some val';
Assumption: field_1 or field_2 would have a unique key on them (or in my case on both), otherwise this would always evaluate to an INSERT.
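For reference, a minimal sketch of the model shape this assumes (Model and the field names are just placeholders):
from django.db import models

class Model(models.Model):
    field_1 = models.CharField(max_length=100)
    field_2 = models.CharField(max_length=100)
    extra_field = models.CharField(max_length=100, blank=True)

    class Meta:
        # unique on both fields together, per the assumption above
        unique_together = ('field_1', 'field_2')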
Edit:
My best personal answer right now is this, but it's 2-3 queries where 1 should be possible:
from django.core.exceptions import ValidationError
try:
    Model(field_1='some val', field_2='some val').validate_unique()
    Model(field_1='some val', field_2='some val', extra_field='some val').save()
except ValidationError:
    Model.objects.filter(field_1='some val', field_2='some val').update(extra_field='some val')
You say you want REPLACE, which I believe is supposed to delete any existing rows before inserting, but your example indicates you want something more like UPSERT.
AFAIK, django doesn't support REPLACE (or sqlite's INSERT OR REPLACE, or UPSERT). But your code could be consolidated:
obj, created = Model.objects.get_or_create(field_1='some val', field_2='some_val')
obj.extra_field = 'some_val'
obj.save()
This of course assumes that either field_1, field_2, or both are unique (as you've said).
It's still two queries (a SELECT for get_or_create, and an INSERT or UPDATE for save), but until an UPSERT-like solution comes along (and it may not for a long, long time), it may be the best you can do.
Since Django 1.7 you can use the update_or_create method:
obj, created = Person.objects.update_or_create(
    first_name='John',
    last_name='Lennon',
    defaults={'profession': 'musician'},
)
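Applied to the placeholder model from the question, the same call would look something like this (a sketch, assuming field_1 and field_2 carry the unique constraint):
obj, created = Model.objects.update_or_create(
    field_1='some val',
    field_2='some val',
    defaults={'extra_field': 'some val'},
)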
I think the following is more efficient.
(obj, created) = Model.objects.get_or_create(
    field_1='some val',
    field_2='some_val',
    defaults={
        'extra_field': 'some_val'
    },
)
if not created and obj.extra_field != 'some_val':
    obj.extra_field = 'some_val'
    obj.save(
        update_fields=[
            'extra_field',
        ],
    )
This will only update the extra_field if the row was not created and needs to be updated.
Related
I have two models, where one of them has a TreeManyToMany field (a many-to-many field for django-mptt models).
class ProductCategory(models.Model):
    pass

class Product(models.Model):
    categories = TreeManyToManyField('ProductCategory')
I'm trying to add categories to the Product objects in bulk by using the through-table, as seen in this thread or this.
I do this with a list of pairs: product_id and productcategory_id, which is also what my through-table contains. This yields the following error:
sqlite3.IntegrityError: UNIQUE constraint failed: products_product_categories.product_id, products_product_categories.productcategory_id
My code looks as follows:
def bulk_add_cats(generic_df):
    # A pandas dataframe where ["product_object"] is the product_id
    # and ["found_categories"] is the productcategory_id to add to that product.
    generic_df = generic_df.explode("found_categories")
    generic_df["product_object"] = generic_df.apply(
        lambda x: Product.objects.filter(
            merchant=x["forhandler"], product_id=x["produktid"]
        ).values('id')[0]['id'], axis=1
    )

    # The actual code
    # Here row.product_object is the product_id and row.found_categories is one
    # particular productcategory_id, so they make up a pair to add to the through-table.
    through_objs = [
        Product.categories.through(
            product_id=row.product_object,
            productcategory_id=row.found_categories
        ) for row in generic_df.itertuples()
    ]
    Product.categories.through.objects.bulk_create(through_objs, batch_size=1000)
I have also done the following, to check that there are no duplicate pairs in the set of pairs that I want to add. There were none:
print(generic_df[generic_df.duplicated(subset=['product_object','found_categories'], keep=False)])
I suspect that the error happens because some of the product-to-productcategory relations already exist in the table. So maybe I should check whether that is the case first, for each of the pairs, and then do the bulk_create. I would like to retain efficiency and would be sad if I have to iterate through each pair to check. Is there a way to bulk-update-or-create for this type of problem?
Or what do you think? Any help is highly appreciated :-)
You should probably check whether the reason you are getting those errors is duplicate data. If that's the case, I think the ignore_conflicts parameter of bulk_create can solve your problem. It will save every valid entry and ignore those that have a duplicate conflict.
Check the doc:
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create
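Applied to the through-table code from the question, the change is a single keyword argument (a sketch assuming Django 2.2+, where ignore_conflicts was added):
Product.categories.through.objects.bulk_create(
    through_objs,
    batch_size=1000,
    ignore_conflicts=True,  # rows violating the unique constraint are skipped instead of raising IntegrityError
)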
I want to do a filter in Django using a form.
If the user types a value, it should query the dataset for that value; if it is left blank, it should return all the elements.
How can I do that?
I am new to Django.
if request.GET.get('Var'):
    Var = request.GET.get('Var')
else:
    Var = WHAT SHOULD I PUT HERE TO FILTER ALL THE ELEMENTS IN THE CODE BELOW

models.objects.filter(Var=Var)
It's not a great idea from a security standpoint to allow users to input data directly into search terms (and should DEFINITELY not be done for raw SQL queries if you're using any of those.)
With that note in mind, you can take advantage of more dynamic filter creation using a dictionary syntax, or revise the queryset as it goes along:
Option 1: Dictionary Syntax
def my_view(request):
    query = {}
    if request.GET.get('Var'):
        query['Var'] = request.GET.get('Var')
    if request.GET.get('OtherVar'):
        query['OtherVar'] = request.GET.get('OtherVar')
    if request.GET.get('thirdVar'):
        # Say you wanted to add in some further processing
        thirdVar = request.GET.get('thirdVar')
        if int(thirdVar) > 10:
            query['thirdVar'] = 10
        else:
            query['thirdVar'] = int(thirdVar)
    if request.GET.get('lessthan'):
        lessthan = request.GET.get('lessthan')
        query['fieldname__lte'] = int(lessthan)
    results = MyModel.objects.filter(**query)
If nothing has been added to the query dictionary and it's empty, that'll be the equivalent of doing MyModel.objects.all()
My security note from above applies if you wanted to try to do something like this (which would be a bad idea):
MyModel.objects.filter(**request.GET)
Django has a good security track record, but this is less safe than anticipating the types of queries that your users will have. This could also be a huge issue if your schema is known to a malicious site user who could adapt their query syntax to make a heavy query along non-indexed fields.
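If you do need to accept arbitrary parameters, a safer middle ground is to whitelist the field names you are willing to filter on. A rough sketch (the allowed field names here are just examples):
ALLOWED_FILTERS = {'Var', 'OtherVar', 'thirdVar'}  # fields you explicitly allow filtering on

def my_view(request):
    query = {
        key: value
        for key, value in request.GET.items()
        if key in ALLOWED_FILTERS and value
    }
    results = MyModel.objects.filter(**query)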
Option 2: Revising the Queryset
Alternatively, you can start off with a queryset for everything and then filter accordingly
def my_view(request):
    results = MyModel.objects.all()
    if request.GET.get('Var'):
        results = results.filter(Var=request.GET.get('Var'))
    if request.GET.get('OtherVar'):
        results = results.filter(OtherVar=request.GET.get('OtherVar'))
    return results
A simpler and more explicit way of doing this would be:
if request.GET.get('Var'):
    data = models.objects.filter(Var=request.GET.get('Var'))
else:
    data = models.objects.all()
I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
list = ModelClass.objects.filter(name = 'bar')
for obj in list:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 version now has a bulk_update.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
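If you do need to update only a slice, one common workaround (a sketch, not something .update() supports directly) is to materialize the sliced primary keys first and then update by pk__in:
ids = list(ModelClass.objects.filter(name='bar').values_list('pk', flat=True)[:100])
ModelClass.objects.filter(pk__in=ids).update(name='foo')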
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from the project's README file)
import random

from bulk_update.helper import bulk_update

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once. Though it is suitable for smaller batches, from tens to hundreds of rows. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 version now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, you can in best case batch the updates according to values. Let's say you have 1000 rows where you want to set a column to one of X values, then you could prepare the batches beforehand and then only run X update-queries (each essentially having the form of the first example above) + the initial SELECT-query.
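A rough sketch of that batching idea (compute_new_value is a hypothetical helper that decides the target value for each row):
from collections import defaultdict

batches = defaultdict(list)
for obj in ModelClass.objects.all():
    batches[compute_new_value(obj)].append(obj.pk)

# one UPDATE per distinct value instead of one per row
for new_value, pks in batches.items():
    ModelClass.objects.filter(pk__in=pks).update(foo=new_value)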
If every row requires a unique value there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.
Here is some useful content I found on the internet regarding the question above:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name='bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name = 'bar').update(name="foo") # for single value 'foo' or add loop
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name='bar')
for model_obj in model_qs:
    model_obj.name = "foo"  # or whatever the value is; for simplicity I'm using "foo" only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list,['name'])
Using an atomic transaction
from django.db import transaction

with transaction.atomic():
    model_qs = ModelClass.objects.filter(name='bar')
    for obj in model_qs:
        obj.name = 'foo'
        obj.save(update_fields=['name'])
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
obj_list = ModelClass.objects.filter(name='bar')
obj_to_be_update = []
for obj in obj_list:
    obj.name = "Dear " + obj.name
    obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
This won't trigger the save signal for every object; instead we keep all the objects to be updated in a list and perform a single bulk update.
update() returns the number of rows updated in the table:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer to this link for more information on bulk update and create.
Bulk update and Create
I want to do pretty much the same as in this ticket at djangoproject.com, but with some additional formatting. From this query
>>> MyModel.objects.values('cryptic_value_name')
[{'cryptic_value_name': 1}, {'cryptic_value_name': 2}]
I want to get something like that:
>>> MyModel.objects.values(renamed_value='cryptic_value_name')
[{'renamed_value': 1}, {'renamed_value': 2}]
Is there another, more built-in way, or do I have to do this manually?
From django>=1.8 you can use annotate and F object
from django.db.models import F
MyModel.objects.annotate(renamed_value=F('cryptic_value_name')).values('renamed_value')
Also, extra() is going to be deprecated; from the Django docs:
This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket using the QuerySet.extra keyword with your use case (please check the list of existing tickets first) so that we can enhance the QuerySet API to allow removing extra(). We are no longer improving or fixing bugs for this method.
Without using any other manager method (tested on v3.0.4):
from django.db.models import F
MyModel.objects.values(renamed_value=F('cryptic_value_name'))
Excerpt from Django docs:
An F() object represents the value of a model field or annotated
column. It makes it possible to refer to model field values and
perform database operations using them without actually having to pull
them out of the database into Python memory.
It's a bit hacky, but you could use the extra method:
MyModel.objects.extra(
select={
'renamed_value': 'cryptic_value_name'
}
).values(
'renamed_value'
)
This basically does SELECT cryptic_value_name AS renamed_value in the SQL.
Another option, if you always want the renamed version but the db has the cryptic name, is to name your field with the new name but use db_column to refer to the original name in the db.
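A sketch of that model-level option (field and column names are the question's placeholders):
from django.db import models

class MyModel(models.Model):
    # The ORM name is renamed_value, but it maps to the existing
    # cryptic_value_name column in the database
    renamed_value = models.IntegerField(db_column='cryptic_value_name')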
I am working with Django 1.11.6.
(The key:value pair is the opposite of that in the accepted answer.)
This is how I made it work for my project:
def json(university):
    address = UniversityAddress.objects.filter(university=university)
    address = address.extra(select={'city__state__country__name': 'country', 'city__state__name': 'state', 'city__name': 'city'})
    address = address.values('country', 'state', 'city', 'street', 'postal_code').get()
    return address
Note that chaining objects.filter().extra().values() in a single statement is the same as above.
Try passing as kwargs:
MyModel.objects.annotate(**{'A B C':F('profile_id')}).values('A B C')
In my case, there were spaces and other special characters included in the key of each value in the result set so this did the trick.
It's quite simple if you just want to rename a few fields of the model.
Try
projects = Project.objects.filter()
projects = [{'id': p.id, 'name': '%s (ID:%s)' % (p.department, p.id)} for p in projects]
Here I do not have a name field in the table, but I can derive one after tweaking a little bit.
What I'm looking for is a QuerySet containing any objects not tagged.
The solution I've come up with so far looks overly complicated to me:
# Get all tags for model
tags = Location.tags.all().order_by('name')
# Get a list of tagged location id's
tag_list = tags.values_list('name', flat=True)
tag_names = ', '.join(tag_list)
tagged_locations = Location.tagged.with_any(tag_names) \
.values_list('id', flat=True)
untagged_locations = []
for location in Location.objects.all():
    if location.id not in tagged_locations:
        untagged_locations.append(location)
Any ideas for improvement? Thanks!
There is some good information in this post, so I don't feel that it should be deleted, but there is a much, much simpler solution
I took a quick peek at the source code for django-tagging. It looks like they use the ContentType framework and generic relations to pull it off.
Because of this, you should be able to create a generic reverse relation on your Location class to get easy access to the TaggedItem objects for a given location, if you haven't already done so:
from django.contrib.contenttypes import generic
from tagging.models import TaggedItem
class Location(models.Model):
    ...
    tagged_items = generic.GenericRelation(TaggedItem,
                                           object_id_field="object_id",
                                           content_type_field="content_type")
    ...
Clarification
My original answer suggested to do this:
untagged_locs = Location.objects.filter(tagged_items__isnull=True)
Although this would work for a 'normal join', this actually doesn't work here because the content type framework throws an additional check on content_type_id into the SQL for isnull:
SELECT [snip] FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
WHERE (`tagging_taggeditem`.`id` IS NULL
AND `tagging_taggeditem`.`content_type_id` = 4 )
You can hack-around it by reversing it like this:
untagged_locs = Location.objects.exclude(tagged_items__isnull=False)
But that doesn't quite feel right.
I also proposed this, but it was pointed out that annotations don't work as expected with the content types framework.
from django.db.models import Count
untagged_locs = Location.objects.annotate(
num_tags=Count('tagged_items')).filter(num_tags=0)
The above code works for me in my limited test case, but it could be buggy if you have other 'taggable' objects in your model. The reason being that it doesn't check the content_type_id as outlined in the ticket. It generated the following SQL:
SELECT [snip], COUNT(`tagging_taggeditem`.`id`) AS `num_tags`
FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
GROUP BY `sotest_location`.`id` HAVING COUNT(`tagging_taggeditem`.`id`) = 0
ORDER BY NULL
If Location is your only taggable object, then the above would work.
Proposed Workaround
Short of getting the annotation mechanism to work, here's what I would do in the meantime:
untagged_locs_e = Location.objects.extra(
where=["""NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
WHERE ct.model = 'location'
AND ti.object_id = myapp_location.id)"""]
)
This adds an additional WHERE clause to the SQL:
SELECT [snip] FROM `myapp_location`
WHERE NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
WHERE ct.model = 'location'
AND ti.object_id = myapp_location.id)
It joins to the django_content_type table to ensure that you're looking at the appropriate
content type for your model in the case where you have more than one taggable model type.
Change myapp_location.id to match your table name. There's probably a way to avoid hard-coding the table names, but you can figure that out if it's important to you.
Adjust accordingly if you're not using MySQL.
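One way to avoid hard-coding the table names (a sketch relying on the models' _meta.db_table attribute) is to pull them from the model metadata:
from tagging.models import TaggedItem

loc_table = Location._meta.db_table
ti_table = TaggedItem._meta.db_table

untagged_locs_e = Location.objects.extra(
    where=["""NOT EXISTS(SELECT 1 FROM {ti} ti
        INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
        WHERE ct.model = 'location'
        AND ti.object_id = {loc}.id)""".format(ti=ti_table, loc=loc_table)]
)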
Try this:
[location for location in Location.objects.all() if location.tags.count() == 0]
Assuming your Location class uses the tagging.fields.TagField utility.
from tagging.fields import TagField
class Location(models.Model):
    tags = TagField()
You can just do this:
Location.objects.filter(tags='')