Prevent repeating query within SerializerMethodField s

Prevent repeating query within SerializerMethodField s - django

So i got my serializer called like this:
result_serializer = TaskInfoSerializer(tasks, many=True)
And the serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
done_jobs_count = serializers.SerializerMethodField()
total_jobs_count = serializers.SerializerMethodField()
task_status = serializers.SerializerMethodField()
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
def get_done_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id, done_flag=1)
condition = False
# Some complicate logic to determine condition that I can't reveal due to business
result = qs.count() if condition else 0
# this function take around 3 seconds
return result
def get_total_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id)
# this query take around 3-5 seconds
return qs.count()
def get_task_status(self, obj):
done_count = self.get_done_jobs_count(obj)
total_count = self.get_total_jobs_count(obj)
if done_count >= total_count:
return 'done'
else:
return 'not yet'
When the get_task_status function is called, it call other 2 function and make those 2 costly query again.
Is there any best way to prevent that? And I dont really know the order of those functions to be called, is it based on the order declare in Meta's fields? Or above that?
Edit:
The logic in get_done_jobs_count is a bit complicate and I cannot make it into a single query when get task
Edit 2:
I just bring all those count function into model and use cached_property
https://docs.djangoproject.com/en/2.1/ref/utils/#module-django.utils.functional
But it raise another question: Is that number reliable? I don't understand much about django cache, is that cached_property is only exist for this instance (just until the API get list of tasks return a response) or will it exist for sometime?

I just try cached_property and it did resolve the problem.
Model:
from django.utils.functional import cached_property
from django.db import models
class Task(models.Model):
task_id = models.AutoField(primary_key=True)
task_name = models.CharField(default='')
#cached_property
def done_jobs_count(self):
qs = self.jobs.filter(done_flag=1)
condition = False
# Some complicate logic to determine condition that I can't reveal due to business
result = qs.count() if condition else 0
# this function take around 3 seconds
return result
#cached_property
def total_jobs_count(self):
qs = Job.objects.filter(task__task_id=obj.task_id)
# this query take around 3-5 seconds
return qs.count()
#property
def task_status(self):
done_count = self.done_jobs_count
total_count = self.total_jobs_count
if done_count >= total_count:
return 'done'
else:
return 'not yet'
Serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')

You could annotate those values to avoid making extra queries. So the queryset passed to the serializer would look something like this (it might change depending on the Django version you're using and the related query name for jobs):
tasks = tasks.annotate(
done_jobs=Count('jobs', filter=Q(done_flag=1)),
total_jobs=Count('jobs'),
)
result_serializer = TaskInfoSerializer(tasks, many=True)
Then the serializer method would look like:
def get_task_status(self, obj):
if obj.done_jobs >= obj.total_jobs:
return 'done'
else:
return 'not yet'
Edit: a cached_property won't help you if you have to call the method for each task instance (which seems the case). The problem is not so much the calculation but whether you have to hit the database for each separate task. You have to focus on getting all the information needed for the calculation in a single query. If that's impossible or too complicated, maybe think about changing the data structure (models) in order to facilitate that.

Using iterator() and counting Iterator might solve your problem.
job_iter = Job.objects.filter(task__task_id=obj.task_id).iterator()
count = len(list(job_iter))
return count
You can use select_related() and prefetch_related() for Retrieve everything at once if you will need them.
Note: if you use iterator() to run the query, prefetch_related() calls will be ignored
You might want to go through documentation for optimisation

Related

Django Rest Framework: optimize nester serializers performance

I have problem with my endpoint performance which returns around 40 items and response takes around 17 seconds.
I have model:
class GameTask(models.Model):
name= models.CharField()
description = RichTextUploadingField()
...
and another model like that:
class TaskLevel(models.Model):
master_task = models.ForeignKey(GameTask, related_name="sub_levels", on_delete-models.CASCADE)
sub_tasks = models.ManyToManyField(GameTask, related_name="master_levels")
...
So basicly I can have "normal" tasks, but when I create TaskLevel object I can add master_task as a task which gonna fasten other tasks added to sub_tasks field.
My serializers look like:
class TaskBaseSerializer(serializers.ModelSerializer):
fields created by serializers.SerializerMethodField()
...
class TaskLevelSerializer(serializers.ModelSerializer):
sub_tasks = serializers.SerializerMethodField()
class Meta:
model = TaskLevel
def get_sub_tasks(self, obj: TaskLevel):
sub_tasks = get_sub_tasks(level=obj, user=self.context["request"].user) # method from other module
return TaskBaseSerializer(sub_tasks, many=True, context=self.context).data
class TaskSerializer(TaskBaseSerializer):
levels_config = serializers.SerializerMethodField()
class Meta:
model = GameTask
def get_levels_config(self, obj: GameTask):
if is_mastertask(obj):
return dict(
sub_levels=TaskLevelSerializer(
obj.sub_levels.all().order_by("number"), many=True, context=self.context
).data,
progress=get_progress(
master_task=obj, user=self.context["request"].user
),
)
return None
When I tried to measure time it turned out that get_levels_config method takes around 0.25 seconds for one multilevel-task (which contain 7 subtasks). Is there any way to improve this performance? If any more detailed methods are needed I will add them

Your code might be suffering from N+1 problem. TaskSerializer.get_levels_config() performs database queries from obj.sub_levels.all().order_by("number").
What happens when serializing multiple instances like:
TaskSerializer(tasks, many=True)
each instance calls .get_levels_config()
You can use prefetch_related & selected_related(more explanation here).
You will have to manually check for prefetched objects since you are using SerializerMethodField. There's also the functions get_progress & get_sub_tasks which I assume does another query.
Here are some examples that can be used around your code:
Prefetching:
GameTask.objects.prefetch_related("sub_levels", "master_levels")
# Accessing deeper level
GameTask.objects.prefetch_related(
"sub_levels",
"master_levels",
"sub_levels__sub_tasks",
).select_related(
"master_levels__master_task",
)
Checking prefetch:
def get_sub_tasks(self, obj: TaskLevel):
if hasattr(obj, "_prefetched_objects_cache") and obj._prefetched_objects_cache.get("sub_tasks", None):
return obj._prefetched_objects_cache
return obj.sub_tasks.all()

Django newbie, struggling to understand how to implement a custom queryset

So I'm pretty new to Django, I started playing yesterday and have been playing with the standard polls tutorial.
Context
I'd like to be able to filter the active questions based on the results of a custom method (in this case it is the Question.is_open() method (fig1 below).
The problem as I understand it
When I try and access only the active questions using a filter like
questions.objects.filter(is_open=true) it fails. If I understand correctly this relies on a queryset exposed via a model manager which can only filter based on records within the sql database.
My questions
1) Am I approaching this problem in most pythonic/django/dry way ? Should I be exposing these methods by subclassing the models.Manager and generating a custom queryset ? (that appears to be the consensus online).
2) If I should be using a manager subclass with a custom queryset, i'm not sure what the code would look like. For example, should I be using sql via a cursor.execute (as per the documentation here, which seems very low level) ? Or is there a better, higher level way of achieving this in django itself ?
I'd appreciate any insights into how to approach this.
Thanks
Matt
My models.py
class Question(models.Model):
question_text = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published',default=timezone.now())
start_date = models.DateTimeField('poll start date',default=timezone.now())
closed_date = models.DateTimeField('poll close date', default=timezone.now() + datetime.timedelta(days=1))
def time_now(self):
return timezone.now()
def was_published_recently(self):
return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
def is_open(self):
return ((timezone.now() > self.start_date) and (timezone.now() < self.closed_date))
def was_opened_recently(self):
return self.start_date >= timezone.now() - datetime.timedelta(days=1) and self.is_open()
def was_closed_recently(self):
return self.closed_date >= timezone.now() - datetime.timedelta(days=1) and not self.is_open()
def is_opening_soon(self):
return self.start_date <= timezone.now() - datetime.timedelta(days=1)
def closing_soon(self):
return self.closed_date <= timezone.now() - datetime.timedelta(days=1)
[Update]
Just as a follow-up. I've subclassed the default manager with a hardcoded SQL string (just for testing), however, it fails as it's not an attribute
class QuestionManager(models.Manager):
def get_queryset(self):
return super().get_queryset()
def get_expired(self):
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("""
select id, question_text, closed_date, start_date, pub_date from polls_question
where ( polls_question.start_date < '2017-12-24 00:08') and (polls_question.closed_date > '2017-12-25 00:01')
order by pub_date;""")
result_list = []
for row in cursor.fetchall():
p = self.model(id=row[0], question=row[1], closed_date=row[2], start_date=row[3], pub_date=row[4])
result_list.append(p)
return result_list
I'm calling the method with active_poll_list = Question.objects.get_expired()
but I get the exception
Exception Value:
'Manager' object has no attribute 'get_expired'
I'm really not sure I understand why this doesn't work. It must be my misunderstanding of how I should invoke a method that returns a queryset from the manager.
Any suggestions would be much appreciated.
Thanks

There are so many things in your question and I'll try to cover as many as possible.
When you're trying to get a queryset for a model, you can use only the field attributes as lookups. That means in your example that you can do:
Question.objects.filter(question_text='What's the question?')
or:
Question.objects.filter(question_text__icontains='what')
But you can't query a method:
Question.objects.filter(is_open=True)
There is no field is_open. It is a method of the model class and it can't be used when filtering a queryset.
The methods you have declared in the class Question might be better decorated as properties (#property) or as cached properties. For the later import this:
from django.utils.functional import cached_property
and decorate the methods like this:
#cached_property
def is_open(self):
# ...
This will make the calculated value avaiable as property, not as method:
question = Question.objects.get(pk=1)
print(question.is_open)
When you specify default value for time fields you very probably want this:
pub_date = models.DateTimeField('date published', default=timezone.now)
Pay attention - it is just timezone.now! The callable should be called when an entry is created. Otherwise the method timezone.now() will be called the first time the django app starts and all entries will have that time saved.
If you want to add extra methods to the manager, you have to assign your custom manager to the objects:
class Question(models.Model):
# the fields ...
objects = QuestionManager()
After that the method get_expired will be available:
Question.objects.get_expired()
I hope this helps you to understand some things that went wrong in your code.

Looks like i'd missed off brackets when defining Question.objects
It's still not working, but I think I can figure it out from here.

AND search with reverse relations

I'm working on a django project with the following models.
class User(models.Model):
pass
class Item(models.Model):
user = models.ForeignKey(User)
item_id = models.IntegerField()
There are about 10 million items and 100 thousand users.
My goal is to override the default admin search that takes forever and
return all the matching users that own "all" of the specified item ids within a reasonable timeframe.
These are a couple of the tests I use to better illustrate my criteria.
class TestSearch(TestCase):
def search(self, searchterm):
"""A tuple is returned with the first element as the queryset"""
return do_admin_search(User.objects.all())
def test_return_matching_users(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
Item.objects.create(item_id=67890, user=user)
result = self.search('12345 67890')
assert_equal(1, result[0].count())
assert_equal(user, result[0][0])
def test_exclude_users_that_do_not_match_1(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
result = self.search('12345 67890')
assert_false(result[0].exists())
def test_exclude_users_that_do_not_match_2(self):
user = User.objects.create()
result = self.search('12345 67890')
assert_false(result[0].exists())
The following snippet is my best attempt using annotate that takes over 50 seconds.
def search_by_item_ids(queryset, item_ids):
params = {}
for i in item_ids:
cond = Case(When(item__item_id=i, then=True), output_field=BooleanField())
params['has_' + str(i)] = cond
queryset = queryset.annotate(**params)
params = {}
for i in item_ids:
params['has_' + str(i)] = True
queryset = queryset.filter(**params)
return queryset
Is there anything I can do to speed it up?

Here's some quick suggestions that should improve performance drastically.
Use prefetch_related` on the initial queryset to get related items
queryset = User.objects.filter(...).prefetch_related('user_set')
Filter with the __in operator instead of looping through a list of IDs
def search_by_item_ids(queryset, item_ids):
return queryset.filter(item__item_id__in=item_ids)
Don't annotate if it's already a condition of the query
Since you know that this queryset only consists of records with ids in the item_ids list, no need to write that per object.
Putting it all together
You can speed up what you are doing drastically just by calling -
queryset = User.objects.filter(
item__item_id__in=item_ids
).prefetch_related('user_set')
with only 2 db hits for the full query.

Django queryset with Model method containing another queryset

Suppose I have a model, MyModel, with a property method that uses another model's queryset.
class OtherModel(models.Model)
...
class MyModel(models.Model):
simple_attr = models.CharField('Yada yada')
#property
def complex_attr(self):
list_other_model = OtherModel.objects.all()
...
# Complex algorithm using queryset from 'OtherModel' and simple_attr
return result
This causes my get_queryset() method on MyModel to query the database to generate the list_other_model variable every time for every single row.
Which causes my MyModel ListView to generate hundreds of SQL queries. Not efficient.
How can I architect a Manager or get_queryset method to cache the variable list_other_model for each row when using MyModel.objects.all()?
I hope my question makes sense--I'm on my sixth shot of espresso, and still haven't found a way to reduce the db queries.

Not sure if this is the best way to do it, but it works.
If someone posts a better answer, I'll accept theirs.
class OtherModel(models.Model)
...
class MyModelManager(models.Manager):
def get_queryset(self):
self.model.list_other_model = OtherModel.objects.all()
return super(MyModelManager, self).get_queryset()
class MyModel(models.Model):
simple_attr = models.CharField('Yada yada')
list_other_model = None
objects = MyModelManager()
#property
def complex_attr(self):
...
# Complex algorithm using queryset from 'OtherModel' and simple_attr
return result

Django API REST return all objects in a model

I am working with django-rest-framework and I have an API that returns me the info with a filter like this:
http://example.com/api/products?category=clothing&in_stock=True
--this returns me 10 items
But it also returns the whole Model data if I dont put the filters, this is the default way.
http://example.com/api/products/
--this returns me more than 100 (all the Model Table)
How can I disable this default operation, I mean, how can I make a filter to be necesary to make this api works? or even better! how can I make the last URL to return an empty json response?
UPDATE
Here is some code:
serializers.py
class OEntradaDetalleSerializer(serializers.HyperlinkedModelSerializer):
item = serializers.RelatedField(source='producto.item')
descripcion = serializers.RelatedField(source='producto.descripcion')
unidad = serializers.RelatedField(source='producto.unidad')
class Meta:
model = OEntradaDetalle
fields = ('url','item','descripcion','unidad','cantidad_ordenada','cantidad_recibida','epc')
views.py
class OEntradaDetalleViewSet(BulkUpdateModelMixin,viewsets.ModelViewSet):
filter_backends = (filters.DjangoFilterBackend,)
filter_fields = ('cantidad_ordenada','cantidad_recibida','oentrada__codigo_proveedor','oentrada__folio')
queryset = OEntradaDetalle.objects.all()
serializer_class = OEntradaDetalleSerializer
urls.py
router2 = BulkUpdateRouter()
router2.register(r'oentradadetalle', OEntradaDetalleViewSet)
urlpatterns = patterns('',
url(r'^api/',include(router2.urls)),
)
URL EXAMPLE
http://localhost:8000/api/oentradadetalle/?oentrada__folio=E01
THIS RETURNS ONLY SOME FILTERED VALUES
http://localhost:8000/api/oentradadetalle/
THIS RETURNS EVERYTHING IN THE MODEL (I need to remove this or make it return some empty data)

I would highly recommend using pagination, to prevent anyone from being able to return all of the results (which likely takes a while).
If you can spare the extra queries being made, you can always check if the filtered and unfiltered querysets match, and just return an empty queryset if that is the case. This would be done in the filter_queryset method on your view.
def filter_queryset(self, queryset):
filtered_queryset = super(ViewSet, self).filter_queryset(queryset)
if queryset.count() === len(filtered_queryset):
return queryset.model.objects.none()
return filtered_queryset
This will make one additional query for the count of the original queryset, and if it is the same as the filtered queryset, an empty queryset will be returned. If the queryset was actually filtered, it will be returned and the results will be what you are expecting.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Prevent repeating query within SerializerMethodField s - django

Related

Django Rest Framework: optimize nester serializers performance

Django newbie, struggling to understand how to implement a custom queryset

AND search with reverse relations

Django queryset with Model method containing another queryset

Django API REST return all objects in a model

Categories

Resources