Error loading django fixtures - django

I have two sets of fixtures, Person.json and Movies.json. The Person fixture basically have this format:
{
"pk": 1,
"model": "data.Person",
"fields": {
"full": "Anna-Varney",
"num": "I",
"short": "Anna-Varney"
}
},
And I load it in first, and it's fine no problem. My movie.json looks like this:
{
"pk": 1,
"model": "data.Film",
"fields": {
"date": "2005-08-01",
"rating": 8.3,
"actors": [
[
"Anna-Varney"
]
],
"name": "Like a Corpse Standing in Desperation (2005) (V)"
}
},
And loading the movies fixture in gives me this error:
DeserializationError: Problem installing fixture 'data/fixtures/movies.json': Person matching query does not exist.
My models are:
class PersonManager(models.Manager):
def get_by_natural_key(self, full):
return self.get(full=full)
class Person(models.Model):
objects = PersonManager()
full = models.CharField(max_length=100,unique = True)
short = models.CharField(max_length=100)
num = models.CharField(max_length=5)
def natural_key(self):
return (self.full,)
def __unicode__(self):
return self.full
class Film(models.Model):
name = models.TextField()
date = models.DateField()
rating = models.DecimalField(max_digits=3 , decimal_places=1)
actors = models.ManyToManyField('Person')
def __unicode__(self):
return self.name
I've loaded in similar models and fixtures in the past that worked, but I'm trying to refactor a bit of my code so now it doesn't work. One of the notable changes I've made was that I'm PostgreSQL instead of MySQL and that I'm running everything in virtualenv.
Is there a way to pinpoint where in the fixture that the error occurs?

Related

Django Complicated Query

I am using django restframework with my Django app and I need to create quite specific query.
Here is the models.py:
class TaskHours(models.Model):
name = models.CharField(max_length=100)
hours = models.FloatField()
task = models.CharField(max_length=100)
date = models.DateField()
views.py:
class TaskHoursView(generics.ListAPIView):
serializer_class = TaskHoursSerializer
queryset = TaskHours.objects.all()
def get_queryset(self):
start_date = self.request.query_params.get('start_date')
end_date = self.request.query_params.get('end_date')
return TaskHours.filter(date__range=[start_date, end_date])
and serializer is default one with class Meta with all fields.
This query is working fine but I need to alter it. In the data there are entries which have same name and same date, but different tasks. What I would need is get all the tasks and hours worked with the same name and date to one object like this:
{
"name": "John",
"date": "2021-04-14",
"task": "cleaning",
"hours": "4.5",
"task": "hoovering",
"hours": "2.0"
}
Now I am receiving it like this:
{
"name": "John",
"date": "2021-04-14",
"task": "cleaning",
"hours": "4.5",
},
{
"name": "John",
"date": "2021-04-14",
"task": "hoovering",
"hours": "2.0"
}
Is there any way how to merge the two objects into one?
You need to slightly modify your serializer in order to create a subquery for each object you're going to serialize.
class TaskHoursSerializer(serializers.ModelSerializer):
tasks = serializers.SerializerMethodField()
class Meta:
model = TaskHours
exclude = ['task', 'hour']
def get_tasks(self, obj):
tasks = TaskHours.objects.filter(name=obj.name, date=obj.date).values_list("task", "hour")
return list(tasks)
And also, you need to change the queryset in your view in order to not have object duplicates.
class TaskHoursView(generics.ListAPIView):
serializer_class = TaskHoursSerializer
queryset = TaskHours.objects.all()
def get_queryset(self):
start_date = self.request.query_params.get('start_date')
end_date = self.request.query_params.get('end_date')
return TaskHours.filter(date__range=[start_date, end_date]).values('name', 'date').distinct()
That should output a field called "tasks" containing each task with the matching hour.

Prefetching indirectly related items using Django ORM

I'm trying to optimize the queries for my moderation system, build with Django and DRF.
I'm currently stuck with the duplicates retrieval: currently, I have something like
class AdminSerializer(ModelSerializer):
duplicates = SerializerMethodField()
def get_duplicates(self, item):
if item.allowed:
qs = []
else:
qs = Item.objects.filter(
allowed=True,
related_stuff__language=item.related_stuff.language
).annotate(
similarity=TrigramSimilarity('name', item.name)
).filter(similarity__gt=0.2).order_by('-similarity')[:10]
return AdminMinimalSerializer(qs, many=True).data
which works fine, but does at least one additional query for each item to display. In addition, if there are duplicates, I'll do additional queries to fill the AdminMinimalSerializer, which contains fields and related objects of the duplicated item. I can probably reduce the overhead by using a prefetch_related inside the serializer, but that doesn't prevent me from making several queries per item (assuming I have only one related item to prefetch in AdminMinimalSerializer, I'd still have ~2N + 1 queries: 1 for the items, N for the duplicates, N for the related items of the duplicates).
I've already looked at Subquery, but I can't retrieve an object, only an id, and this is not enough in my case. I tried to use it in both a Prefetch object and a .annotate.
I also tried something like Item.filter(allowed=False).prefetch(Prefetch("related_stuff__language__related_stuff_set__items", queryset=Items.filter..., to_attr="duplicates")), but the duplicates property is added to "related_stuff__language__related_stuff_set", so I can't really use it...
I'll welcome any idea ;)
Edit: the real code lives here. Toy example below:
# models.py
from django.db.models import Model, CharField, ForeignKey, CASCADE, BooleanField
class Book(Model):
title = CharField(max_length=250)
serie = ForeignKey(Serie, on_delete=CASCADE, related_name="books")
allowed = BooleanField(default=False)
class Serie(Model):
title = CharField(max_length=250)
language = ForeignKey(Language, on_delete=CASCADE, related_name="series")
class Language(Model):
name = CharField(max_length=100)
# serializers.py
from django.contrib.postgres.search import TrigramSimilarity
from rest_framework.serializers import ModelSerializer, SerializerMethodField
from .models import Book, Language, Serie
class BookAdminSerializer(ModelSerializer):
class Meta:
model = Book
fields = ("id", "title", "serie", "duplicates", )
serie = SerieAdminAuxSerializer()
duplicates = SerializerMethodField()
def get_duplicates(self, book):
"""Retrieve duplicates for book"""
if book.allowed:
qs = []
else:
qs = (
Book.objects.filter(
allowed=True, serie__language=book.serie.language)
.annotate(similarity=TrigramSimilarity("title", book.title))
.filter(similarity__gt=0.2)
.order_by("-similarity")[:10]
)
return BookAdminMinimalSerializer(qs, many=True).data
class BookAdminMinimalSerializer(ModelSerializer):
class Meta:
model = Book
fields = ("id", "title", "serie")
serie = SerieAdminAuxSerializer()
class SerieAdminAuxSerializer(ModelSerializer):
class Meta:
model = Serie
fields = ("id", "language", "title")
language = LanguageSerializer()
class LanguageSerializer(ModelSerializer):
class Meta:
model = Language
fields = ('id', 'name')
I'm trying to find a way to prefetch related objects and duplicates so that I can get rid of the get_duplicates method in BookSerializer, with the N+1 queries it causes, and have only a duplicates field in my BookSerializer.
Regarding data, here would be an expected output:
[
{
"id": 2,
"title": "test2",
"serie": {
"id": 2,
"language": {
"id": 1,
"name": "English"
},
"title": "series title"
},
"duplicates": [
{
"id": 1,
"title": "test",
"serie": {
"id": 1,
"language": {
"id": 1,
"name": "English"
},
"title": "first series title"
}
}
]
},
{
"id": 3,
"title": "random",
"serie": {
"id": 3,
"language": {
"id": 1,
"name": "English"
},
"title": "random series title"
},
"duplicates": []
}
]

Django: Most efficient way to create a nested dictionary from querying related models?

In Django, what is the most efficient way to create a nested dictionary of data from querying related and child models?
For example, if I have the following models:
Parent
Children
Pets
I've seen django's model_to_dict method, and that's pretty cool, so I imagine I could loop through each level's queryset and create a bunch of DB calls on each level, for each instance, but is there a better way?
For example, could "prefetch_related" be used to get all three tiers as it is used to get two tiers here?
It would be great to get the dictionary to look something like this:
[
{
"name": "Peter Parent",
"children": [
{
"name": "Chaden Child",
"pets": [
{
"name": "Fanny",
"type:": "fish"
},
{
"name": "Buster",
"type:": "bunny"
}
]
},
{
"name": "Charlete Child",
"pets": [
{
"name": "Dandy",
"type:": "dog"
}
]
}
]
}
]
Edit:
By request this is what the models could look like:
class Pet(models.Model):
name = models.CharField(max_length=50)
type = models.CharField(max_length=50)
def __str__(self):
return self.name
class Child(models.Model):
name = models.CharField(max_length=50)
pets = models.ManyToManyField(Pet)
def __str__(self):
return self.name
class Parent(models.Model):
name = models.CharField(max_length=50)
children = models.ManyToManyField(Child)
def __str__(self):
return self.name
And this is what the raw sql would look like:
SELECT pa.name, ch.name, pe.name, pe.type
FROM aarc_parent pa
JOIN aarc_parent_children pc ON pc.parent_id = pa.id
JOIN aarc_child ch ON ch.id = pc.child_id
JOIN aarc_child_pets cp ON cp.child_id = ch.id
JOIN aarc_pet pe ON pe.id = cp.pet_id
You can use prefetch_related along with list comprehensions. prefetch_related will help in avoiding extra queries every time related object is accessed.
parents = Parent.objects.all().prefetch_related('children__pets')
[{'name': parent.name, 'children': [{'name': child.name, 'pets': [{'name':pet.name, 'type':pet.type} for pet in child.pets.all()]} for child in parent.children.all()]} for parent in parents]

how to filter nested related django objects

I have an app with lots of investors that invest in the same rounds, which belong to companies, as seen below. However when a user(investor) is logged in, i only want him to be able to see HIS investments.
{
"id": 1,
"name": "Technology Company",
"rounds": [
{
"id": 1,
"kind": "priced round",
"company": 1,
"investments": [
{
"id": 1,
"investor": 1,
"round": 1,
"size": 118000,
},
{
"id": 2,
"investor": 2,
"round": 1,
"size": 183000,
},
]
}
]
},
Currently, my viewsets extend get_queryset as so:
class CompanyViewSet(viewsets.ModelViewSet):
def get_queryset(self):
user = self.request.user
investor = Investor.objects.get(user=user)
companies = Company.objects.filter(rounds__investments__investor=investor)
return companies
It retrieves the investments that belong to the investor, but when it takes those investments again to retrieve the rounds, it grabs the round with ALL investors.
How can i write this such that it only displays the investments that below to the Investor?
Here are my models:
class Company(models.Model):
name = models.CharField(max_length=100)
class Round(PolymorphicModel):
company = models.ForeignKey(Company, related_name='rounds', blank=True, null=True)
class Investment(PolymorphicModel):
investor = models.ForeignKey(Investor, related_name='investor')
size = models.BigIntegerField(default=0)
Your description of what happens is pretty unclear. What does "when it takes those investments again" mean? Anyway, I'm guessing what you need to do is to use .prefetch_related and a Prefetch object.
from django.db.models import Prefetch
class CompanyViewSet(viewsets.ModelViewSet):
def get_queryset(self):
user = self.request.user
investor = Investor.objects.get(user=user)
companies = Company.objects.filter(
rounds__investments__investor_id=investor.id
).prefetch_related(Prefetch(
'rounds__investments',
queryset=Investment.objects.filter(
investor_id=investor.pk,
),
))
return companies
I haven't tested this snippet but it should give you a pointer in the right direction. I also optimized the investor lookup to check the id only, this will save you an unnecessary indirection.

Django fixtures primary key error, need natural keys solution

So I have a Film model that holds a list of Actors model in a many to many field:
class Person(models.Model):
full = models.TextField()
short = models.TextField()
num = models.CharField(max_length=5)
class Film(models.Model):
name = models.TextField()
year = models.SmallIntegerField(blank=True)
actors = models.ManyToManyField('Person')
I'm trying to load some initial data from json fixtures, however the problem I have is loading the many to many actors field.
For example I get the error:
DeserializationError: [u"'Anna-Varney' value must be an integer."]
with these fixtures:
{
"pk": 1,
"model": "data.Film",
"fields": {
"actors": [
"Anna-Varney"
],
"name": "Like a Corpse Standing in Desperation (2005) (V)",
"year": "2005"
}
while my actors fixture looks like this:
{
"pk": 1,
"model": "data.Person",
"fields": {
"full": "Anna-Varney",
"num": "I",
"short": "Anna-Varney"
}
}
So the many to many fields must use the pk integer, but the problem is that the data isn't sorted and for a long list of actors I don't think its practical to manually look up the pk of each one. I've been looking for solutions and it seems I have to use natural keys, but I'm not exactly sure how to apply those for my models.
EDIT: I've changed my models to be:
class PersonManager(models.Manager):
def get_by_natural_key(self, full):
return self.get(full=full)
class Person(models.Model):
objects = PersonManager()
full = models.TextField()
short = models.TextField()
num = models.CharField(max_length=5)
def natural_key(self):
return self.full
But I'm still getting the same error
There's a problem with both the input and the natural_key method.
Documentation: Serializing Django objects - natural keys states:
A natural key is a tuple of values that can be used to uniquely
identify an object instance without using the primary key value.
The Person natural_key method should return a tuple
def natural_key(self):
return (self.full,)
The serialised input should also contain tuples/lists for the natural keys.
{
"pk": 1,
"model": "data.film",
"fields": {
"actors": [
[
"Matt Damon"
],
[
"Jodie Foster"
]
],
"name": "Elysium",
"year": 2013
}
}