How to bulk_update in related fields modified using only one call - django

I'm working with two tables to change some data in there but I'm wanted to avoid two calls using bulk_update as following:
queryset = MyModel.objects.all()
submodels_to_update = []
for instance in queryset:
instance.submodel = process()
instance.submodel.sub_property = some_random_data()
submodels_to_update.append(instance.submodel)
MyModel.objects.bulk_update(queryset, ['submodel'])
SubModel.objects.bulk_update(submodels_to_update, ['sub_property'])
What I wanted to do is something like the following:
MyModel.objects.bulk_update(queryset, ['submodel', 'submodel__sub_property'])
But bulk_update seems like not support related_fields using the ORM syntax. My question here is if there is a way to achieve bulk_update in two different tables at the same time.
If the answer is that it is not possible I don't have any problem to accept that, just share the relevant links that prove that the approach that I'm describing here is not supported.

Related

Problem with .only() method, passing to Pagination / Serialization --- all fields are getting returned instead of the ones specified in only()

I am trying load some data into datatables. I am trying to specify columns in the model.objects query by using .only() --- at first glance at the resulting QuerySet, it does in fact look like the mySQL query is only asking for those columns.
However, When I try to pass the QuerySet into Paginator, and/or a Serializer, the result has ALL columns in it.
I cannot use .values_list() because that does not return the nested objects that I need to have serialized as part of my specific column ask. I am not sure what is happening to my .only()
db_result_object = model.objects.prefetch_related().filter(qs).order_by(asc+sort_by).only(*columns_to_return)
paginated_results = Paginator(db_result_object,results_per_page)
serialized_results = serializer(paginated_results.object_list,many=True)
paginated_results.object_list = serialized_results.data
return paginated_results
This one has tripped me up too. In Django, calling only() doesn't return data equivalent to a SQL statement like this:
SELECT col_to_return_1, ... col_to_return_n
FROM appname_model
The reason it doesn't do it like this is because Django returns data to you not when you construct the QuerySet, but when you first access data from that QuerySet (see lazy QuerySets).
In the case of only() (a specific example of what is called a deferred field) you still get all of the fields like you normally would, but the difference is that it isn't completely loaded in from the database immediately. When you access the data, it will only load the fields included in the only statement. Some useful docs here.
My recommendation would be to write your Serializer so that it is only taking care of the one specific filed, likely using a SerializerMethodField with another serializer to serialize your related fields.

Using django select_related with an additional filter

I'm trying to find an optimal way to execute a query, but got myself confused with the prefetch_related and select_related use cases.
I have a 3 table foreign key relationship: A -> has 1-many B h-> as 1-many C.
class A(models.model):
...
class B(models.model):
a = models.ForeignKey(A)
class C(models.model):
b = models.ForeignKey(B)
data = models.TextField(max_length=50)
I'm trying to get a list of all C.data for all instances of A that match a criteria (an instance of A and all its children), so I have something like this:
qs1 = A.objects.all().filter(Q(id=12345)|Q(parent_id=12345))
qs2 = C.objects.select_related('B__A').filter(B__A__in=qs1)
But I'm wary of the (Prefetch docs stating that:
any subsequent chained methods which imply a different database query
will ignore previously cached results, and retrieve data using a fresh
database query
I don't know if that applies here (because I'm using select_related), but reading it makes it seem as if anything gained from doing select_related is lost as soon as I do the filter.
Is my two-part query as optimal as it can be? I don't think I need prefetch as far as I'm aware, although I noticed I can swap out select_related with prefetch_related and get the same result.
I think your question is driven by a misconception. select_related (and prefetch_related) are an optimisation, specifically for returning values in related models along with the original query. They are never required.
What's more, neither has any impact at all on filter. Django will automatically do the relevant joins and subqueries in order to make your query, whether or not you use select_related.

Get object from list of objects without extra database calls - Django

I have an import of objects where I want to check against the database if it has already been imported earlier, if it has I will update it, if not I will create a new one. But what is the best way of doing this.
Right now I have this:
old_books = Book.objects.filter(foreign_source="import")
for book in new_books:
try:
old_book = old_books.get(id=book.id):
#update book
except:
#create book
But that creates a database call for each book in new_books. So I am looking for a way where it will only make one call to the database, and then just fetch objects from that queryset.
Ps: not looking for a get_or_create kind of thing as the update and create functions are more complex than that :)
--- EDIT---
I guess I haven't been good enough in my explanation, as the answers does not reflect what the problem is. So to make it more clear (I hope):
I want to pick out a single object from a queryset, based on an id of that object. I want the full object so I can update it and save it with it's changed values. So lets say I have a queryset with 3 objects, A and B and C. Then I want a way to ask if the queryset has object B and if it has then get it, without an extra database call.
Assuming new_books is another queryset of Book you can try filter on id of it as
old_books = Book.objects.filter(foreign_source="import").filter(id__in=[b.id for b in new_books])
With this old_books has books that are already created.
You can use the values_list('id', flat=True) to get all ids in a single DB call (is much faster than querysets). Then you can use sets to find the intersections.
new_book_ids = new_books.values_list('id', flat=True)
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update_ids = set(new_book_ids) & set(old_book_ids)
to_create_ids = set(new_book_ids) - to_update_ids
-- EDIT (to include the updated part) --
I guess the problem you are facing is in bulk updating rather than bulk fetch.
If the updates are simple, then something like this might work:
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update = []
to_create = []
for book in new_books:
if book.id in old_book_ids:
# list of books to update
# to_update.append(book.id)
else:
# create a book object
# Book(**details)
# Update books
Book.objects.filter(id__in=to_update).update(field='new_value')
Book.objects.bulk_create(to_create)
But if the updates are complex (update fields are dependent upon related fields), then you can check insert... on duplicated key update option in MySQL and its custom manager for Django.
Please leave a comment if the above is completely off the track.
You'll have to do more than one query. You need two groups of objects, you can't fetch them both and split them up at the same time arbitrarily like that. There's no bulk_get_or_create method.
However, the example code you've given will do a query for every object which really isn't very efficient (or djangoic for that matter). Instead, use the __in clause to create smart subqueries, and then you can limit database hits to only two queries:
old_to_update = Book.objects.filter(foreign_source="import", pk__in=new_books)
old_to_create = Book.objects.filter(foreign_source="import").exclude(pk__in=new_books)
Django is smart enough to know how to use that new_books queryset in that context (it can also be a regular list of ids)
update
Queryset objects are just a sort of list of objects. So all you need to do now is loop over the objects:
for book in old_to_update:
#update book
for book in old_to_create:
#create book
At this point, when it's fetching the books from the QuerySet, not from the databse, which is a lot more efficient than using .get() for each and every one of them - and you get the same result. each iteration you get to work with an object, the same as if you got it from a direct .get() call.
The best solution I have found is using the python next() function.
First evaluate the queryset into a set and then pick the book you need with next:
old_books = set(Book.objects.filter(foreign_source="import"))
old_book = next((book for book in existing_books if book.id == new_book.id), None )
That way the database is not queried everytime you need to get a specific book from the queryset. And then you can just do:
if old_book:
#update book
old_book.save()
else:
#create new book
In Django 1.7 there is an update_or_create() method that might solve this problem in a better way: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update_or_create

How to update a queryset that has been annotated?

I have the following models:
class Work(models.Model):
visible = models.BooleanField(default=False)
class Book(models.Model):
work = models.ForeignKey('Work')
I am attempting to update some rows like so:
qs=Work.objects.all()
qs.annotate(Count('book')).filter(Q(book__count__gt=1)).update(visible=False)
However, this is giving an error:
DatabaseError: subquery has too many columns
LINE 1: ...SET "visible" = false WHERE "app_work"."id" IN (SELECT...
If I remove the update clause, the query runs with no problems and returns what I am expecting.
It looks like this error happens for queries with an annotate followed by an update. Is there some other way to write this?
Without making a toy database to be able to duplicate your issue and try out solutions, I can at least suggest the approach in Django: Getting complement of queryset as one possible approach.
Try this approach:
qs.annotate(Count('book')).filter(Q(book__count__gt=1))
Work.objects.filter(pk__in=qs.values_list('pk', flat=True)).update(visible=False)
You can also clear the annotations off a queryset quite simply:
qs.query.annotations.clear()
qs.update(..)
And this means you're only firing off one query, not one into another, but don't use this if your query relies on an annotation to filter. This is great for stripping out database-generated concatenations, and the utility rubbish that I occasionally add into model's default queries... but the example in the question is a perfect example of where this would not work.
To add to Oli's answer: If you need your annotations for the update then do the filters first and store the result in a variable and then call filter with no arguments on that queryset to access the update function like so:
q = X.objects.filter(annotated_val=5, annotated_name='Nima')
q.query.annotations.clear()
q.filter().update(field=900)
I've duplicated this issue & believe its a bug with the Django ORM. #acjay answer is a good workaround. Bug report: https://code.djangoproject.com/ticket/25171
Fix released in Django 2 alpha: https://code.djangoproject.com/ticket/19513

What is a Django QuerySet?

When I do this,
>>> b = Blog.objects.all()
>>> b
I get this:
>>>[<Blog: Blog Title>,<Blog: Blog Tile>]
When I query what type b is,
>>> type(b)
I get this:
>>> <class 'django.db.models.query.QuerySet'>
What does this mean? Is it a data type like dict, list, etc?
An example of how I can build data structure like a QuerySet will be appreciated.
I would want to know how Django builds that QuerySet (the gory details).
A django queryset is like its name says, basically a collection of (sql) queries, in your example above print(b.query) will show you the sql query generated from your django filter calls.
Since querysets are lazy, the database query isn't done immediately, but only when needed - when the queryset is evaluated. This happens for example if you call its __str__ method when you print it, if you would call list() on it, or, what happens mostly, you iterate over it (for post in b..). This lazyness should save you from doing unnecessary queries and also allows you to chain querysets and filters for example (you can filter a queryset as often as you want to).
Yes, it's just another type, built like every other type.
A QuerySet represents a collection of objects from your database. It can have zero, one or many filters. Filters narrow down the query results based on the given parameters. In SQL terms, a QuerySet equates to a SELECT statement, and a filter is a limiting clause such as WHERE or LIMIT.
https://docs.djangoproject.com/en/1.8/topics/db/queries/
A QuerySet is a list of objects of a given model, QuerySet allow you to read data from database