How does DRF serialize ManyToMany fields? - Django

How does DRF handle serializing a ManyToManyField by default?
I see it defaults to rendering the field as an array of IDs, e.g. [1,2,3],
and it only uses 2 queries when I prefetch the related model.
However, when I build the list myself with .values_list('id', flat=True), it makes an extra query for every row.
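Models
from django.db import models
class Fails(models.Model):
    # 'Runs' is passed as a string because that class is defined further down
    runs = models.ManyToManyField('Runs', related_name='fails')
class Runs(models.Model):
    name = models.TextField()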
View
class FailsViewSet(viewsets.ModelViewSet):
    ...
    def get_queryset(self):
        ...
        return Fails.objects.filter(**params).prefetch_related('runs')
Serializer
class FailsSerializer(QueryFieldsMixin, serializers.ModelSerializer):
    # method_name is needed because the method is not called get_runs
    runs = serializers.SerializerMethodField(method_name='get_failbin_regressions')
    def get_failbin_regressions(self, obj):
        runids = self.context.get('runids')
        return obj.runs.values_list('id', flat=True)  # this creates an extra query for every row
The end goal is for runs to display a filtered list of run IDs, with either
return obj.runs.values_list('id', flat=True).filter(id__in=runids)
or
runs = obj.runs.values_list('id', flat=True)
return [x for x in runs if x in runids] #to avoid an extra query from the .filter
I know the filter creates more queries; I assume the prefetched data is lost in the SerializerMethodField.
Is there a way to get the list of IDs the way DRF does, without the extra query cost, when I do it manually?
I can't find any documentation on how DRF implements the ManyToMany rendering.

By calling:
obj.runs.values_list('id', flat=True)
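you are performing a new DB query: values_list() creates a new QuerySet, which doesn't use the prefetched results. Since it is called for every instance, you get one extra query per row.
prefetch_related loads the associated instances up front, and obj.runs.all() is served from that prefetch cache, so you can work with the Python objects without extra queries. This is also how DRF avoids the extra queries by default: ModelSerializer maps a many-to-many field to PrimaryKeyRelatedField(many=True), which reads the related objects through .all() and renders each one's pk. You could fix your issue with: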
def get_failbin_regressions(self, obj):
    runids = self.context.get('runids')
    return [run.id for run in obj.runs.all() if run.id in runids]
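A small variation on the code above, as a sketch: it converts the allowed IDs to a set once, so each membership check is O(1), and keeps the prefetched order. It assumes (hypothetically) that a missing 'runids' context value, i.e. None, should mean "no filtering" rather than raising a TypeError. The helper name filter_prefetched_ids is made up for illustration; in the serializer you would pass it obj.runs.all(), which reads from the prefetch cache.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def filter_prefetched_ids(related: Iterable, allowed_ids: Optional[Iterable[int]]) -> List[int]:
    """Return the .id of each related object, keeping only IDs in allowed_ids.

    `related` is expected to be something like obj.runs.all() after
    prefetch_related('runs'), so iterating it does not hit the database.
    Assumption: allowed_ids=None means "return every ID".
    """
    if allowed_ids is None:
        return [item.id for item in related]
    allowed = set(allowed_ids)  # build the set once for O(1) lookups
    return [item.id for item in related if item.id in allowed]


# Hypothetical usage inside FailsSerializer:
#     def get_failbin_regressions(self, obj):
#         return filter_prefetched_ids(obj.runs.all(), self.context.get('runids'))

if __name__ == "__main__":
    runs = [SimpleNamespace(id=i) for i in (3, 1, 2)]
    print(filter_prefetched_ids(runs, [1, 3]))  # [3, 1]
```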

Related

Django: update a model while dropping extra kwargs

I have a Row class and a SpottedPub model. They are almost the same, except that Row has a couple of extra fields.
What I need to do is gracefully update SpottedPub fields from a Row object's attributes. I tried this:
spotted_pubs = SpottedPub.objects.filter(notification__rule__suite__campaign=self,
                                        name=particular_row.name)
if spotted_pubs.all():
    spotted_pubs.update(**row.__dict__)
but I get an error saying that I pass too many kwargs to update():
FieldDoesNotExist: SpottedPub has no field named 'profit_weight'
Is there a way to drop kwargs that don't correspond to fields?
I tried writing a custom manager
class CustomSpottedPubManager(models.Manager):
    def update_drop_fields(self, **fields):
        queryset = super(CustomSpottedPubManager, self).get_queryset()
        print(queryset)
        # drop unwanted fields here
        queryset.update(**fields)
and attached it to the model:
class SpottedPub(BaseUserObject):
    objects = CustomSpottedPubManager()
but I can't call update_drop_fields() where I need it, because I need it after filtering, and filter() returns a QuerySet, which doesn't have the manager's custom methods:
filter(notification__rule__suite__campaign=self,
       name=particular_row.name)
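Your idea to create update_drop_fields() and filter the fields there looks good.
Define it on a custom QuerySet instead of the manager, so it can be chained after filter(), e.g. spotted_pubs.update_drop_fields(**row.__dict__):
class CustomSpottedPubQueryset(models.QuerySet):
    def update_drop_fields(self, **fields):
        # keep only kwargs that name a concrete field (by name or attname, e.g. 'owner' or 'owner_id')
        concrete = self.model._meta.concrete_fields
        allowed = {f.name for f in concrete} | {f.attname for f in concrete}
        return self.update(**{k: v for k, v in fields.items() if k in allowed})
Then build the manager from that QuerySet with as_manager(), which copies the QuerySet's public methods onto the manager:
class SpottedPub(BaseUserObject):
    objects = CustomSpottedPubQueryset.as_manager()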
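The dropping step on its own, written as a plain function so it can be checked without Django: keep only the keys that name a model field and silently discard the rest (such as profit_weight). The helper name drop_unknown_fields and the sample data are hypothetical; in Django the allowed names could be collected from the model's concrete fields, as in the commented usage.

```python
from typing import Any, Dict, Iterable


def drop_unknown_fields(values: Dict[str, Any], field_names: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `values` containing only keys listed in `field_names`.

    Extra attributes (e.g. 'profit_weight') and private ones (e.g. '_state')
    are dropped because they are not in the allowed set.
    """
    allowed = set(field_names)
    return {key: value for key, value in values.items() if key in allowed}


# Hypothetical usage in a custom QuerySet method:
#     def update_drop_fields(self, **fields):
#         concrete = self.model._meta.concrete_fields
#         names = {f.name for f in concrete} | {f.attname for f in concrete}
#         return self.update(**drop_unknown_fields(fields, names))
```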

Django Rest Framework Ordering on a SerializerMethodField

I have a Forum Topic model that I want to order on a computed SerializerMethodField, such as vote_count. Here are a very simplified Model, Serializer and ViewSet to show the issue:
# models.py
class Topic(models.Model):
    """
    An individual discussion post in the forum
    """
    title = models.CharField(max_length=60)
    def vote_count(self):
        """
        count the votes for the object
        """
        return TopicVote.objects.filter(topic=self).count()
# serializers.py
class TopicSerializer(serializers.ModelSerializer):
    vote_count = serializers.SerializerMethodField()
    def get_vote_count(self, obj):
        return obj.vote_count()
    class Meta:
        model = Topic
        fields = '__all__'  # recent DRF versions require fields or exclude
# views.py
class TopicViewSet(TopicMixin, viewsets.ModelViewSet):
    queryset = Topic.objects.all()
    serializer_class = TopicSerializer
Here is what works:
OrderingFilter is on by default and I can successfully order /topics?ordering=title
The vote_count function works perfectly
I'm trying to order by the SerializerMethodField vote_count on TopicSerializer, e.g. /topics?ordering=-vote_count, but that doesn't seem to be supported. Is there any way I can order by that field?
My simplified JSON response looks like this:
[
    {
        "id": 1,
        "title": "first post",
        "voteCount": 1
    },
    {
        "id": 2,
        "title": "second post",
        "voteCount": 8
    },
    {
        "id": 3,
        "title": "third post",
        "voteCount": 4
    }
]
I'm using Ember to consume my API, and the parser turns the field names into camelCase. I've tried ordering=voteCount as well, but that doesn't work (and it shouldn't).
This is not possible using the default OrderingFilter, because the ordering is implemented on the database side. This is for efficiency reasons, as manually sorting the results can be incredibly slow and means breaking from a standard QuerySet. By keeping everything as a QuerySet, you benefit from the built-in filtering provided by Django REST framework (which generally expects a QuerySet) and the built-in pagination (which can be slow without one).
Now, you have two options in these cases: figure out how to retrieve your value on the database side, or try to minimize the performance hit you are going to have to take. Since the latter option is very implementation-specific, I'm going to skip it for now.
In this case, you can use the Count function provided by Django to do the count on the database side. It is part of the aggregation API and works like the SQL COUNT function. You can get the equivalent of your vote_count method by changing the queryset on the view to:
from django.db.models import Count
queryset = Topic.objects.annotate(vote_count=Count('topicvote'))
Replace topicvote with the related_name of TopicVote's foreign key to Topic if you set one; without a related_name, the lookup name is the lowercased model name (topicvote), not the topicvote_set attribute. This lets you order the results by the number of votes, and even filter on it, because the count is computed within the query itself.
This requires a slight change to your serializer, so that it reads the new vote_count attribute that the annotation sets on each object.
class TopicSerializer(serializers.ModelSerializer):
    vote_count = serializers.IntegerField(read_only=True)
    class Meta:
        model = Topic
        fields = '__all__'
The annotation sets a vote_count attribute on each instance, which shadows your existing vote_count() method, so you may want to use a different annotation name (if you can't remove the old method).
Also, a field's source (which defaults to the field name) can be a method name, and Django REST framework will call it automatically. So technically, with vote_count still a method on Topic, your current serializer could just be
class TopicSerializer(serializers.ModelSerializer):
    vote_count = serializers.IntegerField(read_only=True)
    class Meta:
        model = Topic
        fields = '__all__'
And it would work exactly like it currently does. Note that read_only is required in this case because a method is not the same as a property, so the value cannot be set.
Thanks @Kevin Brown for your great explanation and answer!
In my case, I needed to sort by a SerializerMethodField called total_donation, which is the sum of donations from the UserPayments table.
UserPayments has a ForeignKey to User (with related_name='payments') and sum, an IntegerField.
I needed the total donations per User, counting only donations with a status of 'donated', not 'pending'. I also needed to exclude payments whose payment_type is 'coupon', which is related through two other foreign keys.
I was stumped on how to join and filter those donations and then sort by the result via ordering_fields.
Thanks to your post I figured it out!
I realized it needed to be part of the original queryset in order to sort with ordering.
All I needed to do was annotate the queryset in my view, using Sum() with filters inside like so:
from django.db.models import Q, Sum
from rest_framework import filters, generics
class DashboardUserListView(generics.ListAPIView):
    donation_filter = Q(payments__status='donated') & ~Q(payments__payment_type__payment_type='coupon')
    queryset = User.objects.annotate(total_donated=Sum('payments__sum', filter=donation_filter))
    serializer_class = DashboardUserListSerializer
    pagination_class = DashboardUsersPagination
    filter_backends = [filters.OrderingFilter]
    ordering_fields = ['created', 'last_login', 'total_donated']
    ordering = ['-created']
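To sanity-check what the annotation is meant to compute, here is a plain-Python model of its intent (it does not call Django, and the payment dicts are a hypothetical stand-in for UserPayments rows): per user, add up sum for payments that are 'donated' and not of type 'coupon'. One detail worth knowing is that SQL SUM over zero matching rows yields NULL, so users with no qualifying payments get None rather than 0. If that matters for ordering, wrapping the aggregate as Coalesce(Sum('payments__sum', filter=donation_filter), 0), with Coalesce from django.db.models.functions, is one option.

```python
from typing import Dict, List, Optional


def total_donated(payments: List[dict], user_ids: List[int]) -> Dict[int, Optional[int]]:
    """Plain-Python model of
    User.objects.annotate(total_donated=Sum('payments__sum', filter=donation_filter)).

    Each payment is a dict with 'user', 'sum', 'status' and 'payment_type'
    keys (hypothetical shape). Every payment's user must be in user_ids.
    """
    # SUM over no matching rows is NULL in SQL, which Django returns as None
    totals: Dict[int, Optional[int]] = {uid: None for uid in user_ids}
    for p in payments:
        # donation_filter: status == 'donated' AND NOT payment_type == 'coupon'
        if p["status"] == "donated" and p["payment_type"] != "coupon":
            current = totals[p["user"]]
            totals[p["user"]] = p["sum"] if current is None else current + p["sum"]
    return totals
```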
I'll add this because the case described above isn't the only one.
The idea is to override the list method of your ViewSet so it can order by any of your SerializerMethodFields, without moving your logic from the serializer to the model manager (especially when you work with several complex methods and/or related models). Keep in mind that this sorts in Python after pagination has run, so with a paginated response (response.data['results']) only the current page is sorted.
def list(self, request, *args, **kwargs):
    response = super().list(request, *args, **kwargs)
    ordering = request.query_params.get('ordering')
    if ordering:  # the parameter may be missing
        field = ordering.lstrip('-')
        response.data['results'] = sorted(
            response.data['results'],
            key=lambda item: item[field],
            reverse=ordering.startswith('-'),  # a leading '-' means descending
        )
    return response
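The override above handles a single field. OrderingFilter itself accepts a comma-separated list such as ?ordering=-vote_count,title, so an in-memory equivalent might apply the keys from last to first and rely on Python's sort being stable. This is a sketch with a hypothetical helper name (sort_results); like the override, it only sorts the rows it is given, which is one page when pagination is enabled.

```python
from typing import Any, Dict, List, Optional


def sort_results(results: List[Dict[str, Any]], ordering: Optional[str]) -> List[Dict[str, Any]]:
    """Sort serialized rows by a DRF-style ordering string, e.g. "-vote_count,title".

    Each comma-separated term is a key name; a leading "-" means descending.
    Python's sort is stable (also with reverse=True), so sorting by the least
    significant key first and the most significant key last gives a correct
    multi-key ordering.
    """
    if not ordering:
        return list(results)
    terms = [term.strip() for term in ordering.split(",") if term.strip()]
    rows = list(results)
    for term in reversed(terms):
        field = term.lstrip("-")
        rows.sort(key=lambda row: row[field], reverse=term.startswith("-"))
    return rows
```

Inside the list() override, response.data['results'] = sort_results(response.data['results'], ordering) would replace the single-field sorted() call.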

Is it possible to write a QuerySet method that modifies the dataset but delays evaluation (similar to prefetch_related)?

I'm working on a QuerySet class that does something similar to prefetch_related but lets the query link data that lives in an unconnected database (basically, linking records from the Django app's database to records in a legacy system using a shared unique key), something along the lines of:
class UserFoo(models.Model):
    ''' Uses the django database & can link to User model '''
    user = models.OneToOneField(User, on_delete=models.CASCADE, related_name='userfoo')
    foo_record = models.CharField(
        max_length=32,
        db_column="foo",
        unique=True
    )  # uuid pointing to legacy db table
    @property
    def foo(self):
        if not hasattr(self, '_foo'):
            self._foo = Foo.objects.get(uuid=self.foo_record)
        return self._foo
    @foo.setter
    def foo(self, foo_obj):
        self._foo = foo_obj
and then
class Foo(models.Model):
    '''Uses legacy database'''
    id = models.AutoField(primary_key=True)
    uuid = models.CharField(max_length=32)  # uuid for Foo legacy db table
    …
    @property
    def user(self):
        if not hasattr(self, '_user'):
            self._user = User.objects.get(userfoo__foo_record=self.uuid)
        return self._user
    @user.setter
    def user(self, user_obj):
        self._user = user_obj
Run normally, a query that matches 100 foos (each with, say, 1 user record) ends up requiring 101 queries: one to get the foos, and one per foo to look up its user record (by calling the user property on each foo).
To get around this, I am writing something similar to prefetch_related that pulls all the matching records for a query in bulk by key, so only one additional query is needed to get the remaining records.
My code looks something like this:
class FooWithUserQuerySet(models.query.QuerySet):
    def with_foo(self):
        qs = self._clone()
        foo_idx = {}
        # iterate qs itself so the modified instances stay in qs's result cache
        for record in qs:
            foo_idx.setdefault(record.uuid, []).append(record)
        users = User.objects.filter(
            userfoo__foo_record__in=foo_idx.keys()
        ).select_related('django', 'relations', 'here')
        user_idx = {}
        for user in users:
            user_idx[user.userfoo.foo_record] = user
        for fid, frecords in foo_idx.items():
            user = user_idx.get(fid)
            for frecord in frecords:
                if user:
                    setattr(frecord, 'user', user)
        return qs
This works, but any extra data saved to a foo is lost as soon as the queryset is modified (re-ordered or filtered in any way), because that produces a new queryset that runs the query again.
I would like a method that does exactly what I am doing now, but defers the work until the query is actually evaluated, so that foo records always have a User record.
Some notes:
the example has been highly simplified. There are actually a lot of tables that link up to the legacy data, so, for example, although there is a one-to-one relationship between Foo and User, there will be some cases where a queryset has multiple Foo records with the same key.
the legacy database is on a different server and server platform, so I can't link the two tables using the database server itself
ideally I'd like the User data to be cached, so that even if the records are sorted or sliced I don't have to re-run the foo query a second time.
Basically, I don't know enough about the internals of how lazy evaluation of querysets works to do the necessary coding. I have jumped back and forth through the source code for django.db.models.query, but it is a fairly dense read, and I'm hoping someone who has worked with this already can offer some pointers.
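One way to think about the problem, shown as a self-contained toy model rather than Django code: keep "attach users" as a flag on the query that is copied on every clone, run the step only when the query is evaluated, and share one user cache between clones so re-sorting or filtering doesn't repeat lookups. All names here (LazyQuery, with_user, fetch_users) are hypothetical. On a real Django QuerySet subclass, the comparable hooks would be the private _clone() and _fetch_all() methods; those are internal APIs and may change between versions.

```python
from typing import Callable, Dict, List, Optional


class Foo:
    """Stand-in for a legacy Foo row (hypothetical)."""

    def __init__(self, uuid: str, title: str):
        self.uuid = uuid
        self.title = title
        self.user: Optional[str] = None


class LazyQuery:
    """Toy lazy query: filter()/order_by() only record operations; rows are
    produced, and users attached, when the query is first iterated."""

    def __init__(self, rows: List[Foo], fetch_users: Callable[[List[str]], Dict[str, str]]):
        self._rows = rows
        self._ops: List[Callable[[List[Foo]], List[Foo]]] = []
        self._fetch_users = fetch_users        # one bulk query against the other database
        self._with_user = False                # must survive cloning
        self._user_cache: Dict[str, str] = {}  # shared by all clones
        self._result_cache: Optional[List[Foo]] = None

    def _clone(self) -> "LazyQuery":
        clone = LazyQuery(self._rows, self._fetch_users)
        clone._ops = list(self._ops)
        clone._with_user = self._with_user
        clone._user_cache = self._user_cache   # same dict, so cached users are reused
        return clone

    def filter(self, predicate: Callable[[Foo], bool]) -> "LazyQuery":
        clone = self._clone()
        clone._ops.append(lambda rows: [r for r in rows if predicate(r)])
        return clone

    def order_by(self, key: Callable[[Foo], object]) -> "LazyQuery":
        clone = self._clone()
        clone._ops.append(lambda rows: sorted(rows, key=key))
        return clone

    def with_user(self) -> "LazyQuery":
        clone = self._clone()
        clone._with_user = True
        return clone

    def _fetch_all(self) -> None:
        if self._result_cache is not None:
            return
        rows = list(self._rows)
        for op in self._ops:
            rows = op(rows)
        if self._with_user:
            # look up only the keys not cached yet, in a single batch
            missing = sorted({r.uuid for r in rows} - set(self._user_cache))
            if missing:
                self._user_cache.update(self._fetch_users(missing))
            for r in rows:
                r.user = self._user_cache.get(r.uuid)
        self._result_cache = rows

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)
```

Because the flag and cache travel with every clone, a query that is re-ordered or filtered after with_user() still gets its users attached, and keys already looked up are not fetched again.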

Django queryset with Model method containing another queryset

Suppose I have a model, MyModel, with a property method that uses another model's queryset.
class OtherModel(models.Model):
    ...
class MyModel(models.Model):
    simple_attr = models.CharField('Yada yada')
    @property
    def complex_attr(self):
        list_other_model = OtherModel.objects.all()
        ...
        # Complex algorithm using queryset from 'OtherModel' and simple_attr
        return result
When I list MyModel objects, complex_attr queries the database to build list_other_model again for every single row, so my MyModel ListView generates hundreds of SQL queries. Not efficient.
How can I structure a Manager or get_queryset() method so that list_other_model is fetched once and reused for every row when using MyModel.objects.all()?
I hope my question makes sense--I'm on my sixth shot of espresso, and still haven't found a way to reduce the db queries.
Not sure if this is the best way to do it, but it works.
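If someone posts a better answer, I'll accept theirs.
It works because every MyModel instance reads the same QuerySet object through the class attribute, and a QuerySet caches its results the first time it is evaluated; complex_attr should therefore iterate self.list_other_model rather than call filter() or similar on it, which would issue new queries.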
class OtherModel(models.Model):
    ...
class MyModelManager(models.Manager):
    def get_queryset(self):
        self.model.list_other_model = OtherModel.objects.all()
        return super(MyModelManager, self).get_queryset()
class MyModel(models.Model):
    simple_attr = models.CharField('Yada yada')
    list_other_model = None
    objects = MyModelManager()
    @property
    def complex_attr(self):
        ...
        # Complex algorithm using queryset from 'OtherModel' and simple_attr
        return result
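An alternative that avoids storing per-request data on the model class (which is shared by every request and thread): load the OtherModel rows once in the view and pass them into the calculation for each row. This is a minimal sketch with hypothetical names (attach_complex_attr, load_others, compute, complex_value); the rows are plain objects so it runs without Django. In a ListView it could be applied to the objects being displayed, for example in get_context_data().

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List


def attach_complex_attr(rows: Iterable[Any],
                        load_others: Callable[[], Iterable[Any]],
                        compute: Callable[[Any, List[Any]], Any],
                        attr: str = "complex_value") -> List[Any]:
    """Evaluate load_others() exactly once and reuse the result for every row.

    `attr` defaults to a name other than complex_attr because a read-only
    @property with that name on the model would reject assignment.
    """
    others = list(load_others())  # the single "OtherModel.objects.all()" query
    rows = list(rows)
    for row in rows:
        setattr(row, attr, compute(row, others))
    return rows


if __name__ == "__main__":
    rows = [SimpleNamespace(simple_attr=10), SimpleNamespace(simple_attr=20)]
    done = attach_complex_attr(rows, lambda: [1, 2, 3],
                               lambda row, others: row.simple_attr + sum(others))
    print([r.complex_value for r in done])  # [16, 26]
```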

Filter on a distinct field with TastyPie

Suppose I have a Person model that has a first name field and a last name field. There will be many people who have the same first name. I want to write a TastyPie resource that allows me to get a list of the unique first names (without duplicates).
Using the Django model directly, you can do this easily by saying something like Person.objects.values("first_name").distinct(). How do I achieve the same thing with TastyPie?
Update
I've adapted the apply_filters method linked below to call values() before distinct().
def apply_filters(self, request, applicable_filters):
    qs = self.get_object_list(request).filter(**applicable_filters)
    # ''.split(',') is [''], so drop empty names before checking
    values = [v for v in request.GET.get('values', '').split(',') if v]
    if values:
        qs = qs.values(*values)
    distinct = request.GET.get('distinct', False) == 'True'
    if distinct:
        qs = qs.distinct()
    return qs
values returns dictionaries instead of model objects, so I don't think you need to override alter_list_data_to_serialize.
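One pitfall when parsing comma-separated parameters such as values: ''.split(',') returns [''], which is a non-empty, truthy list, so a check like `if values:` passes even when the parameter is absent and qs.values('') is then called. A minimal sketch of a parser that drops empty items (the helper name parse_csv_param is hypothetical):

```python
from typing import List, Optional


def parse_csv_param(raw: Optional[str]) -> List[str]:
    """Split a comma-separated query parameter, ignoring blank items.

    parse_csv_param(None) and parse_csv_param('') both return [], so
    `if values:` only passes when at least one real name was supplied.
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical usage inside apply_filters:
#     values = parse_csv_param(request.GET.get('values'))
#     if values:
#         qs = qs.values(*values)
```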
Original response
There is a nice solution to the distinct part of the problem here involving a light override of apply_filters.
I'm surprised I'm not seeing a slick way to filter which fields are returned, but you could implement that by overriding alter_list_data_to_serialize and deleting unwanted fields off the objects just before serialization.
def alter_list_data_to_serialize(self, request, data):
    data = super(PersonResource, self).alter_list_data_to_serialize(request, data)
    fields = request.GET.get('fields', None)
    if fields is not None:
        fields = fields.split(',')
        # Data might be a bundle here. If so, operate on data.objects instead.
        data = [
            dict((k, v) for k, v in d.items() if k in fields)
            for d in data
        ]
    return data
Combine those two to use something like /api/v1/person/?distinct=True&values=first_name to get what you're after. That would work generally and would still work with additional filtering (&last_name=Jones).