Django fixtures primary key error, need natural keys solution - django

So I have a Film model that holds a list of actors (Person models) in a many-to-many field:
class Person(models.Model):
    full = models.TextField()
    short = models.TextField()
    num = models.CharField(max_length=5)

class Film(models.Model):
    name = models.TextField()
    year = models.SmallIntegerField(blank=True)
    actors = models.ManyToManyField('Person')
I'm trying to load some initial data from JSON fixtures, but I can't get the many-to-many actors field to load.
For example I get the error:
DeserializationError: [u"'Anna-Varney' value must be an integer."]
with these fixtures:
{
"pk": 1,
"model": "data.Film",
"fields": {
"actors": [
"Anna-Varney"
],
"name": "Like a Corpse Standing in Desperation (2005) (V)",
"year": "2005"
}
}
while my actors fixture looks like this:
{
"pk": 1,
"model": "data.Person",
"fields": {
"full": "Anna-Varney",
"num": "I",
"short": "Anna-Varney"
}
}
So the many-to-many field must use integer pks, but the data isn't sorted, and for a long list of actors I don't think it's practical to look up each pk manually. I've been looking for solutions and it seems I have to use natural keys, but I'm not sure how to apply them to my models.
EDIT: I've changed my models to be:
class PersonManager(models.Manager):
    def get_by_natural_key(self, full):
        return self.get(full=full)

class Person(models.Model):
    objects = PersonManager()
    full = models.TextField()
    short = models.TextField()
    num = models.CharField(max_length=5)

    def natural_key(self):
        return self.full
But I'm still getting the same error

There's a problem with both the input and the natural_key method.
Documentation: Serializing Django objects - natural keys states:
A natural key is a tuple of values that can be used to uniquely
identify an object instance without using the primary key value.
The Person natural_key method should return a tuple:
    def natural_key(self):
        return (self.full,)
The serialised input should also contain tuples/lists for the natural keys.
{
"pk": 1,
"model": "data.film",
"fields": {
"actors": [
[
"Matt Damon"
],
[
"Jodie Foster"
]
],
"name": "Elysium",
"year": 2013
}
}
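For a long fixture, the wrapping can be scripted rather than done by hand. Below is a minimal sketch using only the standard library. The helper name wrap_natural_keys and the default field name "actors" are assumptions for illustration, not part of Django, and it assumes every natural key is a single value (here, Person.full).

```python
import json


def wrap_natural_keys(entries, field="actors"):
    """Return a copy of fixture entries with bare-string keys wrapped in lists.

    Hypothetical helper: each string found in `field` is treated as a
    single-value natural key. Values that are not strings (lists that are
    already wrapped, or integer pks) are left untouched.
    """
    fixed = []
    for entry in entries:
        fields = dict(entry["fields"])  # shallow copy, input stays unchanged
        if field in fields:
            fields[field] = [
                [key] if isinstance(key, str) else key
                for key in fields[field]
            ]
        fixed.append({**entry, "fields": fields})
    return fixed


# Inline sample; in practice this would come from json.load() on the fixture file.
films = json.loads("""
[{"pk": 1, "model": "data.film",
  "fields": {"actors": ["Anna-Varney"], "name": "Example", "year": 2005}}]
""")
fixed_films = wrap_natural_keys(films)
print(json.dumps(fixed_films, indent=2))
```

The result can be written back with json.dump and then loaded with loaddata as usual.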

Related

django rest serializer: ordering fields appearance

Is it possible to specify in which order fields will appear in a serialised model?
To make sure there is no confusion, while searching answers for this I have found lots of suggestions for ordering objects in a list view but this is not what I am looking for.
I really mean for a given model, I'd like their fields, once serialized to appear in a specific order. I have a fairly complex serialized object containing a lot of nested serializers, which appear first. I'd prefer instead key identifying fields such as name and slug to show up first, for readability.
Apologies in advance if this question is a duplicate, but I didn't find any relevant responses.
Solution
Based on @Toni-Sredanović's answer, I have implemented the following:
def promote_fields(model: models.Model, *fields):
    promoted_fields = list(fields)
    other_fields = [field.name for field in model._meta.fields if field.name not in promoted_fields]
    return promoted_fields + other_fields

class MySerializer(serializers.ModelSerializer):
    ...
    class Meta:
        model = MyModel
        fields = promote_fields(model, 'id', 'field1', 'field2')
For that you can specify which fields you want to show and their order in class Meta:
class Meta:
    fields = ('id', 'name', 'slug', 'field_1', 'field_2', ..., )
Here is a full example:
class TeamWithGamesSerializer(serializers.ModelSerializer):
    """
    Team ModelSerializer with home and away games.
    Home and away games are nested lists serialized with GameWithTeamNamesSerializer.
    League is object serialized with LeagueSerializer instead of pk integer.
    Current players is a nested list serialized with PlayerSerializer.
    """
    league = LeagueSerializer(many=False, read_only=True)
    home_games = GameWithTeamNamesSerializer(many=True, read_only=True)
    away_games = GameWithTeamNamesSerializer(many=True, read_only=True)
    current_players = PlayerSerializer(many=True, read_only=True)

    class Meta:
        model = Team
        fields = ('id', 'name', 'head_coach', 'league', 'current_players', 'home_games', 'away_games', 'gender')
And the result:
{
"id": 1,
"name": "Glendale Desert Dogs",
"head_coach": "Coach Desert Dog",
"league": {
"id": 1,
"name": "Test league 1"
},
"current_players": [
{
"id": "rodriem02",
"first_name": "Emanuel",
"last_name": "Rodriguez",
"current_team": 1
},
{
"id": "ruthba01",
"first_name": "Babe",
"last_name": "Ruth",
"current_team": 1
}
],
"home_games": [
{
"id": 6,
"team_home": {
"id": 1,
"name": "Glendale Desert Dogs"
},
"team_away": {
"id": 2,
"name": "Mesa Solar Sox"
},
"status": "canceled",
"date": "2019-10-01"
},
{
"id": 7,
"team_home": {
"id": 1,
"name": "Glendale Desert Dogs"
},
"team_away": {
"id": 2,
"name": "Mesa Solar Sox"
},
"status": "",
"date": "2019-10-04"
}
],
"away_games": [
{
"id": 3,
"team_home": {
"id": 2,
"name": "Mesa Solar Sox"
},
"team_away": {
"id": 1,
"name": "Glendale Desert Dogs"
},
"status": "canceled",
"date": "2019-10-02"
}
],
"gender": "M"
}
If you just use fields = '__all__', the default ordering is used, which is:
object id
fields specified in the serializer
fields specified in the model
The best I can think of right now, regarding your comment about generating the fields, is to read the field names from the model. I'm not sure how to access the fields you've declared in the serializer, so you would still need to write those manually.
Here is how you could do it with my example (this would make the name and gender appear on top):
class Meta:
    model = Team
    fields = ('name', 'gender')\
        + tuple([field.name for field in model._meta.fields if field.name not in ('name', 'gender')])\
        + ('league', 'home_games', 'away_games', 'current_players')
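The ordering logic itself does not depend on Django, so it can be tried out on plain lists first. This is only a sketch of the same idea: the list team_fields is a made-up stand-in for the model field names plus the serializer-declared fields, not the real Team model.

```python
def promote_fields(all_fields, *promoted):
    """Return field names with `promoted` first and the rest in original order."""
    promoted = list(dict.fromkeys(promoted))  # drop repeats, keep first-seen order
    rest = [name for name in all_fields if name not in promoted]
    return tuple(promoted + rest)


# Hypothetical stand-in for [f.name for f in Team._meta.fields] + declared fields
team_fields = ["id", "head_coach", "name", "gender", "league", "home_games"]
ordered = promote_fields(team_fields, "name", "gender")
print(ordered)
```

Returning a tuple keeps the result usable directly as Meta.fields.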

Prefetching indirectly related items using Django ORM

I'm trying to optimize the queries for my moderation system, built with Django and DRF.
I'm currently stuck with the duplicates retrieval: currently, I have something like
class AdminSerializer(ModelSerializer):
    duplicates = SerializerMethodField()

    def get_duplicates(self, item):
        if item.allowed:
            qs = []
        else:
            qs = Item.objects.filter(
                allowed=True,
                related_stuff__language=item.related_stuff.language
            ).annotate(
                similarity=TrigramSimilarity('name', item.name)
            ).filter(similarity__gt=0.2).order_by('-similarity')[:10]
        return AdminMinimalSerializer(qs, many=True).data
which works fine, but does at least one additional query for each item to display. In addition, if there are duplicates, I'll do additional queries to fill the AdminMinimalSerializer, which contains fields and related objects of the duplicated item. I can probably reduce the overhead by using a prefetch_related inside the serializer, but that doesn't prevent me from making several queries per item (assuming I have only one related item to prefetch in AdminMinimalSerializer, I'd still have ~2N + 1 queries: 1 for the items, N for the duplicates, N for the related items of the duplicates).
I've already looked at Subquery, but I can't retrieve an object, only an id, and this is not enough in my case. I tried to use it in both a Prefetch object and a .annotate.
I also tried something like Item.filter(allowed=False).prefetch(Prefetch("related_stuff__language__related_stuff_set__items", queryset=Items.filter..., to_attr="duplicates")), but the duplicates property is added to "related_stuff__language__related_stuff_set", so I can't really use it...
I'll welcome any idea ;)
Edit: the real code lives here. Toy example below:
# models.py
from django.db.models import Model, CharField, ForeignKey, CASCADE, BooleanField

class Book(Model):
    title = CharField(max_length=250)
    # String references allow pointing at models defined further down
    serie = ForeignKey("Serie", on_delete=CASCADE, related_name="books")
    allowed = BooleanField(default=False)

class Serie(Model):
    title = CharField(max_length=250)
    language = ForeignKey("Language", on_delete=CASCADE, related_name="series")

class Language(Model):
    name = CharField(max_length=100)
# serializers.py
from django.contrib.postgres.search import TrigramSimilarity
from rest_framework.serializers import ModelSerializer, SerializerMethodField
from .models import Book, Language, Serie

# Serializers are defined bottom-up so each nested one exists before it is used.
class LanguageSerializer(ModelSerializer):
    class Meta:
        model = Language
        fields = ('id', 'name')

class SerieAdminAuxSerializer(ModelSerializer):
    class Meta:
        model = Serie
        fields = ("id", "language", "title")
    language = LanguageSerializer()

class BookAdminMinimalSerializer(ModelSerializer):
    class Meta:
        model = Book
        fields = ("id", "title", "serie")
    serie = SerieAdminAuxSerializer()

class BookAdminSerializer(ModelSerializer):
    class Meta:
        model = Book
        fields = ("id", "title", "serie", "duplicates", )
    serie = SerieAdminAuxSerializer()
    duplicates = SerializerMethodField()

    def get_duplicates(self, book):
        """Retrieve duplicates for book"""
        if book.allowed:
            qs = []
        else:
            qs = (
                Book.objects.filter(
                    allowed=True, serie__language=book.serie.language)
                .annotate(similarity=TrigramSimilarity("title", book.title))
                .filter(similarity__gt=0.2)
                .order_by("-similarity")[:10]
            )
        return BookAdminMinimalSerializer(qs, many=True).data
I'm trying to find a way to prefetch related objects and duplicates so that I can get rid of the get_duplicates method in BookAdminSerializer, with the N+1 queries it causes, and have only a duplicates field in my BookAdminSerializer.
Regarding data, here would be an expected output:
[
{
"id": 2,
"title": "test2",
"serie": {
"id": 2,
"language": {
"id": 1,
"name": "English"
},
"title": "series title"
},
"duplicates": [
{
"id": 1,
"title": "test",
"serie": {
"id": 1,
"language": {
"id": 1,
"name": "English"
},
"title": "first series title"
}
}
]
},
{
"id": 3,
"title": "random",
"serie": {
"id": 3,
"language": {
"id": 1,
"name": "English"
},
"title": "random series title"
},
"duplicates": []
}
]

Django-REST-Framework "GroupBy" ModelSerializer

I have the following situation
class MyModel(models.Model):
    key = models.CharField(max_length=255)
    value = models.TextField(max_length=255)
    category = models.CharField(max_length=4)
    mode = models.CharField(max_length=4)
The fields key, category and mode are unique together. I have the following objects:
m1 = MyModel(key='MODEL_KEY', value='1', category='CAT_1', mode='MODE_1')
m2 = MyModel(key='MODEL_KEY', value='2', category='CAT_1', mode='MODE_2')
m3 = MyModel(key='MODEL_KEY', value='1', category='CAT_2', mode='MODE_1')
m4 = MyModel(key='MODEL_KEY', value='2', category='CAT_2', mode='MODE_2')
I want to expose an API that will group by key and category so the serialized data will look something like this:
{
"key": "MODEL_KEY",
"category": "CAT_1",
"MODE_1": { "id": 1, "value": "1" },
"MODE_2": { "id": 2, "value": "2" }
},
{
"key": "MODEL_KEY",
"category": "CAT_2",
"MODE_1": { "id": 3, "value": "1" },
"MODE_2": { "id": 4, "value": "2" }
}
Is there any way of doing this in Django REST framework with a ModelSerializer?
There is a module that allows you to group Django models and still work with a QuerySet in the result: https://github.com/kako-nawao/django-group-by
Using the above to form your queryset:
# Postgres specific!
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = MyModel.objects.group_by('key', 'category').annotate(
mode_list=ArrayAgg('mode')).order_by(
'key', 'category').distinct()
You can then access the properties key, category and mode_list on the resulting QuerySet items as attributes like qs[0].mode_list. Therefore, in your serializer you can simply name them as fields.
The mode_list field might require a SerializerMethodField with some custom code to transform the list.
Note that you need an aggregation if you don't want to group by mode, as well.
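If the per-mode id and value are needed exactly as in the question's target output, another option is to do the grouping in plain Python after fetching the rows. This is a different technique from the django-group-by approach above, sketched under assumptions: the function name is made up, and the rows list stands in for something like MyModel.objects.values("id", "key", "value", "category", "mode").

```python
def group_by_key_and_category(rows):
    """Fold flat rows into one dict per (key, category), keyed by mode."""
    grouped = {}
    for row in rows:
        group = grouped.setdefault(
            (row["key"], row["category"]),
            {"key": row["key"], "category": row["category"]},
        )
        # Each mode becomes its own entry holding that row's id and value
        group[row["mode"]] = {"id": row["id"], "value": row["value"]}
    return list(grouped.values())


# Stand-in rows mirroring m1..m4 from the question
rows = [
    {"id": 1, "key": "MODEL_KEY", "value": "1", "category": "CAT_1", "mode": "MODE_1"},
    {"id": 2, "key": "MODEL_KEY", "value": "2", "category": "CAT_1", "mode": "MODE_2"},
    {"id": 3, "key": "MODEL_KEY", "value": "1", "category": "CAT_2", "mode": "MODE_1"},
    {"id": 4, "key": "MODEL_KEY", "value": "2", "category": "CAT_2", "mode": "MODE_2"},
]
result = group_by_key_and_category(rows)
print(result)
```

This loads every row into memory, so it only makes sense when the table (or the filtered queryset) is reasonably small.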

Django: Most efficient way to create a nested dictionary from querying related models?

In Django, what is the most efficient way to create a nested dictionary of data from querying related and child models?
For example, if I have the following models:
Parent
Children
Pets
I've seen django's model_to_dict method, and that's pretty cool, so I imagine I could loop through each level's queryset and create a bunch of DB calls on each level, for each instance, but is there a better way?
For example, could "prefetch_related" be used to get all three tiers as it is used to get two tiers here?
It would be great to get the dictionary to look something like this:
[
{
"name": "Peter Parent",
"children": [
{
"name": "Chaden Child",
"pets": [
{
"name": "Fanny",
"type": "fish"
},
{
"name": "Buster",
"type": "bunny"
}
]
},
{
"name": "Charlete Child",
"pets": [
{
"name": "Dandy",
"type": "dog"
}
]
}
]
}
]
Edit:
By request this is what the models could look like:
class Pet(models.Model):
    name = models.CharField(max_length=50)
    type = models.CharField(max_length=50)

    def __str__(self):
        return self.name

class Child(models.Model):
    name = models.CharField(max_length=50)
    pets = models.ManyToManyField(Pet)

    def __str__(self):
        return self.name

class Parent(models.Model):
    name = models.CharField(max_length=50)
    children = models.ManyToManyField(Child)

    def __str__(self):
        return self.name
And this is what the raw sql would look like:
SELECT pa.name, ch.name, pe.name, pe.type
FROM aarc_parent pa
JOIN aarc_parent_children pc ON pc.parent_id = pa.id
JOIN aarc_child ch ON ch.id = pc.child_id
JOIN aarc_child_pets cp ON cp.child_id = ch.id
JOIN aarc_pet pe ON pe.id = cp.pet_id
You can use prefetch_related along with list comprehensions. prefetch_related avoids an extra query every time a related object is accessed.
parents = Parent.objects.all().prefetch_related('children__pets')
[
    {'name': parent.name, 'children': [
        {'name': child.name, 'pets': [
            {'name': pet.name, 'type': pet.type} for pet in child.pets.all()
        ]} for child in parent.children.all()
    ]} for parent in parents
]
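If you go the raw SQL route from the question instead, the flat joined rows can be folded into the same nested shape in one pass. This is only a sketch: nest_rows is a made-up name, the rows are hard-coded stand-ins for the cursor output, and it assumes names are unique enough to identify a parent or child (with real data you would select and group on the ids).

```python
def nest_rows(rows):
    """Fold (parent, child, pet, pet_type) rows into nested dicts."""
    parents = {}
    for parent_name, child_name, pet_name, pet_type in rows:
        parent = parents.setdefault(
            parent_name, {"name": parent_name, "children": {}}
        )
        child = parent["children"].setdefault(
            child_name, {"name": child_name, "pets": []}
        )
        child["pets"].append({"name": pet_name, "type": pet_type})
    # Turn the intermediate child lookup dicts into plain lists
    return [
        {"name": p["name"], "children": list(p["children"].values())}
        for p in parents.values()
    ]


# Stand-in for the rows returned by the four-table JOIN
rows = [
    ("Peter Parent", "Chaden Child", "Fanny", "fish"),
    ("Peter Parent", "Chaden Child", "Buster", "bunny"),
    ("Peter Parent", "Charlete Child", "Dandy", "dog"),
]
nested = nest_rows(rows)
print(nested)
```

Note that the inner JOINs in that query drop parents without children and children without pets, which the prefetch_related version keeps.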

Error loading django fixtures

I have two sets of fixtures, Person.json and Movies.json. The Person fixture basically has this format:
{
"pk": 1,
"model": "data.Person",
"fields": {
"full": "Anna-Varney",
"num": "I",
"short": "Anna-Varney"
}
},
And I load it in first, and it's fine no problem. My movie.json looks like this:
{
"pk": 1,
"model": "data.Film",
"fields": {
"date": "2005-08-01",
"rating": 8.3,
"actors": [
[
"Anna-Varney"
]
],
"name": "Like a Corpse Standing in Desperation (2005) (V)"
}
},
And loading the movies fixture in gives me this error:
DeserializationError: Problem installing fixture 'data/fixtures/movies.json': Person matching query does not exist.
My models are:
class PersonManager(models.Manager):
    def get_by_natural_key(self, full):
        return self.get(full=full)

class Person(models.Model):
    objects = PersonManager()
    full = models.CharField(max_length=100,unique = True)
    short = models.CharField(max_length=100)
    num = models.CharField(max_length=5)

    def natural_key(self):
        return (self.full,)

    def __unicode__(self):
        return self.full

class Film(models.Model):
    name = models.TextField()
    date = models.DateField()
    rating = models.DecimalField(max_digits=3 , decimal_places=1)
    actors = models.ManyToManyField('Person')

    def __unicode__(self):
        return self.name
I've loaded similar models and fixtures in the past and they worked, but I'm refactoring a bit of my code and now it doesn't work. The notable changes are that I'm using PostgreSQL instead of MySQL and that I'm running everything in a virtualenv.
Is there a way to pinpoint where in the fixture that the error occurs?
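One way to narrow it down outside of loaddata is to cross-check the two fixtures with a small script. This is a rough sketch, not a Django feature: find_missing_people is a made-up helper, the sample data is inlined instead of read from the real files, and it assumes the natural key is the single field "full" and that every actor entry is a one-element list.

```python
def find_missing_people(person_entries, film_entries,
                        key_field="full", m2m_field="actors"):
    """Return (film index, film pk, name) for every actor with no Person entry."""
    known = {entry["fields"][key_field] for entry in person_entries}
    problems = []
    for index, entry in enumerate(film_entries):
        for natural_key in entry["fields"].get(m2m_field, []):
            name = natural_key[0]  # single-value natural key assumed
            if name not in known:
                problems.append((index, entry.get("pk"), name))
    return problems


# Inline samples; in practice use json.load() on Person.json and movies.json.
persons = [
    {"pk": 1, "model": "data.Person",
     "fields": {"full": "Anna-Varney", "num": "I", "short": "Anna-Varney"}},
]
films = [
    {"pk": 1, "model": "data.Film",
     "fields": {"actors": [["Anna-Varney"]], "name": "First"}},
    {"pk": 2, "model": "data.Film",
     "fields": {"actors": [["Anna Varney"]], "name": "Second"}},
]
problems = find_missing_people(persons, films)
for index, pk, name in problems:
    print(f"entry {index} (pk={pk}): no Person with full={name!r}")
```

The comparison here is exact, so it also flags names that differ only in case or whitespace. That may be relevant after the database switch, since MySQL's default collations compare strings case-insensitively while PostgreSQL compares them case-sensitively, though that is only one possible cause.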