Quick database update based on previous value in Django

I have a QuerySet of Page objects, which have a page_number attribute. The QuerySet is sorted by page_number, so [i.page_number for i in pages_query_set] returns something like [1,2,3,5,6,10,11,13,16,19,21]. My task is to write a method that renumbers the Page objects so the numbers are consecutive: in this example, the page that was 5 would become 4, 6 would become 5, 10 would become 6, 11 would become 7, 13 would become 8, and so on.
This is my initial solution, a method inside the PageQuerySet class:
def validatePageNumbers(self):
    prev_page = 0
    for page in self:
        if page.page_number > prev_page + 1:
            page.page_number = prev_page + 1
            page.save()
        prev_page = page.page_number
Functionally, it works. But calling save() for every page slows it down considerably, presumably because each save() issues its own database query. I need a faster approach. If there were only one gap in the sequence, I would just filter the QuerySet down to the pages after the gap and use something like filtered_qs.update(page_number=models.F('page_number') - gap), because I've seen a single update() run much faster than several save() calls. But there are multiple gaps, and they are fairly random.
So I'm stuck. F expressions don't seem to support this kind of per-row logic. It would be great if I could pass a callable to update(), but I haven't found anything about that in the docs, and it doesn't work when I try it. Is there a way to use update() here, or some other way to make this method faster?

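The solution to this and numerous other bottlenecks was quite simple:
from django.db import transaction

with transaction.atomic():
    # do stuff here, e.g. the save() loop above
    ...
This wraps all the work in a single transaction, so the database commits once at the end instead of after every save() (in Django's default autocommit mode, each query is committed on its own). Each save() still sends its own UPDATE, but avoiding a commit per row is what made the difference here. This answer has helped a lot here.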
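Putting the question's loop together with the transaction.atomic() fix gives something like the sketch below. It assumes page numbers are distinct and already sorted, and it writes back only the pages whose number actually changes, using save(update_fields=...) so each UPDATE touches just that column. Both function names are hypothetical; the pure helper is kept separate so it can be checked without a database.

```python
def compact_page_numbers(pages):
    """Renumber pages (already sorted by page_number) to 1, 2, 3, ... in place.

    Returns only the pages whose page_number changed, so only those rows
    need to be written back to the database.
    """
    changed = []
    for expected, page in enumerate(pages, start=1):
        if page.page_number != expected:
            page.page_number = expected
            changed.append(page)
    return changed


def validate_page_numbers(queryset):
    """Hypothetical Django wrapper: one transaction, one UPDATE per changed page."""
    # Imported here so the helper above can be used without a Django project.
    from django.db import transaction

    pages = list(queryset.order_by("page_number"))
    changed = compact_page_numbers(pages)
    with transaction.atomic():
        for page in changed:
            page.save(update_fields=["page_number"])
    return len(changed)
```

For the example sequence above, only the eight pages from 5 onwards are saved. On Django 2.2 or later, queryset.model.objects.bulk_update(changed, ["page_number"]) is an alternative that batches these changes into far fewer queries.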

Related

Updating and fetching a Django model object atomically

I want to update an object and read it back atomically. For example, something like calling first() on the result of update():
obj = MyModel.objects.filter(pk=100).update(counter=F('counter') + 1).first()
I know it is an awkward construct (and it cannot work, since update() returns the number of matched rows rather than a QuerySet); it just shows what I need.
For the record, I have used a class method like this:
@classmethod
def update_counter(cls, job_id):
    with transaction.atomic():
        job = cls.objects.select_for_update().get(pk=job_id)
        job.counter += 1
        job.save()
    return job
which I call like this to get the updated object:
my_obj = MyModel.update_counter(my_obj.pk)
But is there any other Django model technique for this? Such read-backs are common, and several threads may rely on them to decide something based on, say, the final count.
Digging deeper, I could not find any direct way to get back the object(s) I am updating in a single chained SQL call. As Dani Herrera commented above, the update and the read have to be two SQL queries. The only mechanism that meets my requirement is therefore the class method I included above. It also lets me add more field updates to the same method atomically in the future.
For example, the method could very well be "def update_progress(job_id, final_desired_count)" where I can update more fields such as "self.progress_percentage = (self.counter / final_desired_count) * 100".
The class method approach may turn out to be a good investment for me.
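A sketch of both ideas from this thread, assuming a Django model with an integer counter field and, for update_progress, a hypothetical float progress_percentage field. increment_and_read does the addition in the database with an F() expression and reads the row back inside the same transaction; update_progress follows the asker's select_for_update() approach. They are plain functions taking the model class (rather than class methods) only to keep the snippet self-contained, and all names here are illustrative.

```python
def progress_percentage(counter, final_desired_count):
    """Share of the work done, as a percentage (hypothetical helper)."""
    if final_desired_count <= 0:
        raise ValueError("final_desired_count must be positive")
    return counter / final_desired_count * 100


def increment_and_read(model, pk):
    """Increment model.counter for one row and return the refreshed object."""
    from django.db import transaction
    from django.db.models import F

    with transaction.atomic():
        # The addition happens in SQL, so concurrent increments are not lost.
        model.objects.filter(pk=pk).update(counter=F("counter") + 1)
        # Same transaction: we see our own write, and on row-locking databases
        # such as PostgreSQL the row stays locked against other writers until commit.
        return model.objects.get(pk=pk)


def update_progress(model, pk, final_desired_count):
    """Asker's approach: lock the row, change several fields, save once."""
    from django.db import transaction

    with transaction.atomic():
        job = model.objects.select_for_update().get(pk=pk)
        job.counter += 1
        job.progress_percentage = progress_percentage(job.counter, final_desired_count)
        job.save(update_fields=["counter", "progress_percentage"])
    return job
```

The F() version needs two queries (UPDATE then SELECT) but never loads the old value into Python; the select_for_update() version is the more convenient one once several fields have to change together.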

Using django prefetch_related() to get the time of last activity

I upgraded to Django 1.7 so I could get Prefetch objects, but I'm having a hard time getting them to behave as expected.
I have an Employee model like this:
class Employee(Human):
    # ... additional Employee fields ...

    def get_last_activity_date(self):
        try:
            return self.activity_set.all().order_by('-when')[0:1].get().when
        except Activity.DoesNotExist:
            return None
and activities like this:
class Activity(models.Model):
    when = models.DateTimeField()
    employee = models.ForeignKey(Employee, related_name='activity_set')
I want to use prefetch_related to get the last activity date for this employee. I've tried to express this many ways, but no matter how I do it, it ends up generating another query. My other two prefetch_related lookups work as expected, but this one never seems to save me any queries.
I'm using this with Django Rest Framework, so I really need the prefetch_related part to work since I have no way of reaching inside DRF to do the mapping outside of the queryset.
Here is one of the ways that DOES NOT WORK:
def get_queryset(self):
    return super(EmployeeViewSet, self).get_queryset()\
        .prefetch_related('phone_number_set', 'email_address_set')\
        .prefetch_related(Prefetch('activity_set', Activity.objects.all().order_by('-when')))\
        .order_by('last_name', 'first_name')
Notice also that I can't slice the activity_set prefetch query to fetch only the latest entry, which is a concern for how much memory this is going to use.
I do see the prefetch query take place, but then each employee still gets a separate query for that piece of information, so I have one bigger, wasted query and still get the ~200 queries I'm trying to prevent.
How do you get the prefetch_related to work for me in this case?
I suspect you are missing the point of prefetch_related. The docs state that this is the expected behavior: a separate query per relationship, with the 'joining' done in Python. If you want fewer queries, you should use select_related. Also, select_related may not help with your models: it only follows forward foreign-key and one-to-one relations, so it cannot follow reverse or many-to-many relations such as activity_set.
UPDATE - the docs:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python
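A likely reason the prefetch is wasted here: get_last_activity_date() calls order_by() and slices, and the Django docs note that chained QuerySet methods implying a different query ignore the prefetched cache. Iterating self.activity_set.all() does use the cache, so the newest date can be picked in Python, as in last_activity_date below. Alternatively, an aggregate annotation lets the database return just the date per employee, which also avoids loading every activity into memory. Both functions are hypothetical sketches written against the question's models.

```python
def last_activity_date(employee):
    """Newest Activity.when for an employee, or None if there is none.

    Only .all() is used, so with prefetch_related('activity_set') this reads
    the prefetched rows instead of issuing one query per employee.
    """
    dates = [activity.when for activity in employee.activity_set.all()]
    return max(dates, default=None)


def with_last_activity(queryset):
    """Annotate each employee with its latest activity date in the main query."""
    from django.db.models import Max

    # Reverse lookups use related_name ("activity_set") when it is set.
    return queryset.annotate(last_activity=Max("activity_set__when"))
```

With the annotation approach, get_queryset() can return with_last_activity(...) and a DRF serializer can expose the value with a read-only field such as last_activity = serializers.DateTimeField(read_only=True).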

Django pagination random: order_by('?')

I love Django and like its built-in pagination. However, I run into issues when trying to split a randomly ordered queryset across multiple pages.
For example, I have 100 elements in a queryset and want to display them 25 at a time. When I pass a randomly ordered queryset (using .order_by('?')) as the context object, a completely new random ordering is produced each time another page (2, 3, 4) is requested, so items can repeat or go missing across pages.
Explicitly stated: how do I (or can I) request a single queryset, randomly ordered, and display it across digestible pages?
I ran into the same problem recently and didn't want to cache all the results.
I solved it with a combination of .extra() and raw().
This is what it looks like:
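# PostgreSQL only: setseed() seeds random() for this session and expects a
# value between -1 and 1. str(query) does not quote parameters, so this is
# only safe for querysets without user-supplied filter values.
raw_sql = str(queryset.extra(select={'sort_key': 'random()'})
              .order_by('sort_key').query)
set_seed = "SELECT setseed(%s);" % float(random_seed)
queryset = self.model.objects.raw(set_seed + raw_sql)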
I believe this only works on PostgreSQL. Doing something similar in MySQL is probably simpler, since you can pass the seed directly to RAND(), e.g. RAND(123).
The seed can be stored in the session/a cookie/your frontend in the case of ajax calls.
Warning: there is a better way
This is actually a very slow operation. I found that this blog post describes a very good method, both for retrieving a single result and for sets of results.
In this case the seed will be used in your local random number generator.
I think this answer will be useful to you: How to have a "random" order on a set of objects with paging in Django?
Basically, he suggests caching the list of objects and referring to it through a session variable, so the order is preserved between pages (using Django pagination).
Or you could randomize the list yourself and pass a seed, so the same user keeps seeing the same random order.
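The seeded-shuffle idea can be sketched in plain Python: shuffle the primary keys with a random.Random seeded per user (the seed would be stored in the session), then slice out the requested page. The function names, the 25-per-page default and the choice to shuffle primary keys rather than full objects are assumptions for illustration.

```python
import random


def seeded_order(ids, seed):
    """Return ids in a shuffled order that is identical for the same seed."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def page_of(ids, seed, page_number, per_page=25):
    """Return the ids on one 1-based page of the seeded shuffle."""
    ordered = seeded_order(ids, seed)
    start = (page_number - 1) * per_page
    return ordered[start:start + per_page]
```

In a view, ids could come from Model.objects.values_list('pk', flat=True), the seed from request.session.setdefault('shuffle_seed', random.randrange(2**32)), and the page's objects from Model.objects.in_bulk(page_ids), re-ordered to follow page_ids. This still reads every primary key on each request, but not the full rows, and Django's Paginator also accepts a plain list if you prefer it over manual slicing.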
The best way to achieve this is to use a pagination app such as:
pure-pagination
django-pagination
django-infinite-pagination
Personally, I use the first one; it integrates well with Haystack.
Example (django-pagination template):
{% load pagination_tags %}
{# paginate 10 results #}
{% autopaginate my_query 10 %}

Passing limited queryset to related_to() in django-tagging

I want to use the related_to() function in django-tagging, and I am passing it a queryset like this:
Chapter.objects.all().order_by('?')[:5]  # the important part here is the "[:5]"
My problem is that this function apparently uses in_bulk(), and Django raises "Cannot use 'limit' or 'offset' with in_bulk".
How can I restrict my queryset to pass only 5 objects and still make use of in_bulk()?
I know that related_to() lets you pass num (the number of objects it should return), but I don't want it to return the same objects every single time. So I came up with the idea of ordering the queryset randomly and limiting it before passing it to the function. But as you can see, limited querysets and in_bulk() don't go hand in hand.
I found a solution, though it isn't the best and it processes unnecessary data: I fetch all instances of the model related to the current instance, then take a random sample afterwards:
import random

related_objects = list(instance.related_to(Model))  # all related objects are fetched
# pick up to 5 at random; min() avoids a ValueError when fewer than 5 exist
related_objects = random.sample(related_objects, min(5, len(related_objects)))

Django database query - return the most recent three objects

This can't be hard, but... I just need to get the three most recent objects added to my database table.
So, query with reverse ID ordering, maximum three objects.
I've been fiddling around with
Records.objects.order_by(-id)[:3]
Records.objects.all[:3]
and including an if clause to check whether there are actually three objects:
num_maps = Records.objects.count()
if (num_maps > 3): # etc...
and using reverse() and filter() for a while...
But just can't figure it out! Nothing I do gives the right result and using num_maps feels pretty inelegant. Not getting much joy from the documentation. Can anyone help?!
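All you should need is:
Records.objects.all().order_by('-id')[:3]
The argument you pass to order_by must be a string: without quotes, -id is evaluated as Python (negating the built-in id function) and fails. The all() is optional, since Records.objects.order_by('-id')[:3] works too, but where you do use it, it must be called: Records.objects.all[:3] fails because all is a method. There is no need to check whether there are actually 3 records first, because [:3] becomes a LIMIT 3 and simply returns fewer rows if there are fewer than 3.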