How to transform a prefetch_related query into a DataFrame using pandas - Django

I would simply like to turn a prefetch_related query into a pandas DataFrame with all the information from the two models below. This should be very simple, but somehow nothing works. I get a 'Capture_set is not defined' error with the code below.
Any ideas?
class Capture(models.Model):
    species_name = models.CharField(max_length=50)
    total_capture = models.IntegerField()
class Species(models.Model):
    species_name = models.ForeignKey(Capture, on_delete=models.DO_NOTHING)
    length = models.IntegerField()
    weight = models.IntegerField()
data = pd.DataFrame(list(Species.objects.all().prefetch_related(Capture_set)))

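I know this is not the approach you were looking for, but you can achieve this with values(). Let's suppose your related name is "captures" (with the models exactly as posted, the foreign key is Species.species_name, so the lookups would start with species_name__ instead):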
import pandas
query = Species.objects.all().values(
    'captures__pk', 'captures__species_name', 'captures__total_capture',
    'pk', 'length', 'weight',
)
data = pandas.DataFrame.from_records(query)

Related

How to select fields in related tables quickly in Django models

I'm trying to get all the values in the current table, and also some fields from related tables.
class school(models.Model):
    school_name = models.CharField(max_length=256)
    school_type = models.CharField(max_length=128)
    school_address = models.CharField(max_length=256)
class hometown(models.Model):
    hometown_name = models.CharField(max_length=32)
class person(models.Model):
    person_name = models.CharField(max_length=128)
    person_id = models.CharField(max_length=128)
    person_school = models.ForeignKey(school, on_delete=models.CASCADE)
    person_ht = models.ForeignKey(hometown, on_delete=models.CASCADE)
How can I quickly select all the info I need into a dict for rendering?
There will be many records in person. I get a school_id as input and want all the persons in that school, with each person's hometown_name shown as well.
I tried the following, which gets the info I want. Is there a quicker way to do it?
m = (person.objects.filter(person_school_id=1)
     .values('id', 'person_name', 'person_id',
             school_name=F('person_school__school_name'),
             school_address=F('person_school__school_address'),
             hometown_name=F('person_ht__hometown_name')))
This gives me the fields: person_name, person_id, school_name, school_address, hometown_name
If person has many fields, listing all the values by hand is a lot of work.
What I mean is: is there a queryset method that joins the related tables' fields together, so that there is no need to list the fields in values()?
Maybe something like this:
m = person.objects.filter(person_school_id=1).XXXX.values()
It would include all the values from school and hometown together with the person's values in m, so that I can do:
for x in m:
    print(x.school_name, x.hometown_name, x.person_name)
You can add prefetch_related on top of your queryset.
prefetch_data = [Prefetch('person_school'), Prefetch('person_ht')]
Each Prefetch names one relation to load, here the two foreign keys on person. Prefetching attaches the related objects to model instances, so m has to be the plain filtered queryset, person.objects.filter(...), without the .values() call.
Then attach the prefetches to the query:
query = m.prefetch_related(*prefetch_data)
Here query is the resulting queryset of person objects, with each person's school and hometown loaded up front (so add that line below the prefetch_data one).
Example:
from django.db.models import Prefetch
m = person.objects.filter(person_school_id=1)
prefetch_data = [Prefetch('person_school'), Prefetch('person_ht')]
query = m.prefetch_related(*prefetch_data)
for x in query:
    print(x.person_school.school_name, x.person_ht.hometown_name, x.person_name)
In that example I've broken the query down into more manageable pieces, but you can do the whole thing in one line too (though it is less readable):
query = person.objects.filter(person_school_id=1).prefetch_related('person_school', 'person_ht')

How do I construct an order_by for a specific record in a ManyToOne field?

I'm trying to sort (order) by statistical data stored in a ManyToOne relationship. Suppose I have the following code:
class Product(models.Model):
    info = ...
    data = models.IntegerField(default=0.0)
class Customer(models.Model):
    info = ...
    purchases = models.ManyToManyField(Product, related_name='customers', blank=True)
class ProductStats(models.Model):
    ALL = 0
    YOUNG = 1
    OLD = 2
    TYPE = ((ALL, 'All'), (YOUNG, 'Young'), (OLD, 'Old'),)
    stats_type = models.SmallIntegerField(choices=TYPE)
    product = models.ForeignKey(Product, related_name='stats', on_delete=models.CASCADE)
    data = models.FloatField(default=0.0)
Then I would like to sort the products by their stats for the ALL demographic (assume every product has a stats connected to it for ALL). This might look something like the following:
products = Product.objects.all().order_by('stats__data for stats__stats_type=0')
Currently the only solutions I can think of are either to create a new stats class just for ALL and use a OneToOneField to Product, or to add a OneToOneField on Product pointing to the ALL stats in ProductStats.
Thank you for your help.
How about using multiple fields in order_by:
Product.objects.all().order_by('stats__stats_type', 'stats__data')
# this orders by stats type 0, then 1, then 2, and by data within each type
Or if you want to get data for only stats_type 0:
Product.objects.filter(stats__stats_type=0).order_by('stats__data')
You can annotate the value of the relevant demographic and order by that:
from django.db.models import F
Product.objects.all().filter(stats__stats_type=0).annotate(data_for_all=F('stats__data')).order_by('data_for_all')
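In SQL terms, the filter-then-order approach comes down to joining the stats table, keeping only the rows for one demographic, and sorting by their data column. Here is a rough sketch of that idea using Python's built-in sqlite3 module. The table names, column names and sample rows are made up for illustration, and this is not the exact SQL that Django generates:

```python
import sqlite3

# In-memory database with hypothetical tables standing in for
# Product and ProductStats.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE productstats (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        stats_type INTEGER,
        data REAL
    );
    INSERT INTO product VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
    INSERT INTO productstats (product_id, stats_type, data) VALUES
        (1, 0, 3.5), (1, 1, 9.0),
        (2, 0, 1.2), (2, 2, 8.0),
        (3, 0, 2.4);
""")

# Join the stats rows, keep only stats_type = 0 (ALL), sort by their data.
rows = conn.execute("""
    SELECT product.name, productstats.data
    FROM product
    JOIN productstats ON productstats.product_id = product.id
    WHERE productstats.stats_type = 0
    ORDER BY productstats.data
""").fetchall()

print(rows)
```

Because the WHERE clause leaves one stats row per product (assuming every product has exactly one ALL row), each product appears once in the sorted result.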

Django: Annotate based on an annotation

Let's say I'm using Django to manage a database about athletes:
class Player(models.Model):
    name = models.CharField()
    weight = models.DecimalField()
    team = models.ForeignKey('Team')
class Team(models.Model):
    name = models.CharField()
    sport = models.ForeignKey('Sport')
class Sport(models.Model):
    name = models.CharField()
Let's say I wanted to compute the average weight of the players on each team. I think I'd do:
Team.objects.annotate(avg_weight=Avg('player__weight'))
But now say that I want to compute the variance of team weights within each sport. Is there a way to do that using the Django ORM? How about using the extra() method on a QuerySet? Any advice is much appreciated.
You can use a query like this:
class SumSubquery(Subquery):
    template = "(SELECT SUM(`%(field)s`) From (%(subquery)s _sum))"
    output_field = models.FloatField()
    def as_sql(self, compiler, connection, template=None, **extra_context):
        connection.ops.check_expression_support(self)
        # fall back to the class-level template when none is passed in
        template = template or self.template
        template_params = {**self.extra, **extra_context}
        template_params['subquery'], sql_params = self.queryset.query.get_compiler(connection=connection).as_sql()
        template_params["field"] = list(self.queryset.query.annotation_select_mask)[0]
        sql = template % template_params
        return sql, sql_params
Team.objects.all().values("sport__name").annotate(
    variance=SumSubquery(
        Player.objects.filter(team__sport_id=OuterRef("sport_id"))
        .annotate(sum_pow=ExpressionWrapper(
            (Avg("team__players__weight") - F("weight")) ** 2,
            output_field=models.FloatField(),
        ))
        .values("sum_pow")
    ) / (Count("players", output_field=models.FloatField()) - 1)
)
And add a related_name to the model like this:
class Player(models.Model):
    name = models.CharField()
    weight = models.DecimalField()
    team = models.ForeignKey('Team', related_name="players")
I'm going to assume (perhaps incorrectly) that you mean by 'variance' the difference between maximum and minimum weights. If so, you can generate more than one aggregate with a single query, like so:
from django.db.models import Avg, Max, Min
Team.objects.aggregate(Avg('player__weight'), Max('player__weight'), Min('player__weight'))
This is taken from the Django docs on generating aggregates over a QuerySet.
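If the ORM expression gets too unwieldy, one fallback is to let the database compute the per-team averages and do the variance step in Python. A minimal sketch, assuming rows shaped like what Team.objects.annotate(avg_weight=Avg('player__weight')).values('sport__name', 'avg_weight') would yield. The sample rows and the variance_by_sport helper are invented for illustration, and statistics.pvariance computes the population variance (statistics.variance would give the sample variance instead):

```python
from collections import defaultdict
from statistics import pvariance

# Hand-written stand-in for the queryset rows (hypothetical data):
# one dict per team, carrying its sport and its average player weight.
team_rows = [
    {"sport__name": "rowing", "avg_weight": 80.0},
    {"sport__name": "rowing", "avg_weight": 90.0},
    {"sport__name": "judo", "avg_weight": 70.0},
    {"sport__name": "judo", "avg_weight": 74.0},
    {"sport__name": "judo", "avg_weight": 72.0},
]


def variance_by_sport(rows):
    """Group team averages by sport and return the population variance of each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sport__name"]].append(row["avg_weight"])
    return {sport: pvariance(weights) for sport, weights in grouped.items()}


result = variance_by_sport(team_rows)
print(result)
```

This pulls one row per team into memory, which is usually fine for a few thousand teams but does not scale the way a pure database aggregate would.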

order_by intermediate table for given relation in SQL with Django ORM

class Nutrient(models.Model):
    tagname = models.CharField(max_length=10)
class FoodNutrientAmount(models.Model):
    nutrient = models.ForeignKey(Nutrient)
    food = models.ForeignKey(Food)
    amount = models.FloatField()
class Food(models.Model):
    nutrients = models.ManyToManyField(
        Nutrient,
        through=FoodNutrientAmount,
    )
So, I can get the Foods ordered by the amount of tagname=FOL Nutrient with a list comprehension:
ordered_fnas = FoodNutrientAmount.objects.filter(
    nutrient__tagname="FOL"
).order_by('-amount')
ordered_foods_by_most_fol = [fna.food for fna in ordered_fnas]
Can I get such an iterable as a queryset without taking the whole thing into memory?
Maybe there is a different approach using Food.objects.annotate or extra? I can't think of a great way to do it at the moment.
I can get close with values_list; but, I get the ordered list of pks and not the queryset of Food objects that I want.
FoodNutrientAmount.objects.filter(
    nutrient__tagname='FOL'
).order_by('-amount').values_list('food', flat=True)
Edit:
This is a many-to-many relationship, so you can probably leverage that. How about adding a default ordering to FoodNutrientAmount, so that you can just do normal many-to-many queries?
class FoodNutrientAmount(models.Model):
    nutrient = models.ForeignKey(Nutrient)
    food = models.ForeignKey(Food)
    amount = models.FloatField()
    class Meta:
        ordering = ('-amount',)
Then you can just call:
nutritious_foods = Food.objects.filter(nutrients__tagname='FOL').order_by('foodnutrientamount')

Join Multiple Querysets From Different Base Models Django

I currently have two different models.
class Journal(models.Model):
    date = models.DateField()
    from_account = models.ForeignKey(Account,related_name='transferred_from')
    to_account = models.ForeignKey(Account,related_name='transferred_to')
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    memo = models.CharField(max_length=100,null=True,blank=True)
class Ledger(models.Model):
    date = models.DateField()
    bank_account = models.ForeignKey(EquityAccount,related_name='paid_from')
    account = models.ForeignKey(Account)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    name = models.ForeignKey(Party)
    memo = models.CharField(max_length=100,null=True,blank=True)
I am creating a report in a view and get the following error:
Merging 'ValuesQuerySet' classes must involve the same values in each case.
What I'm trying to do is only pull out the fields that are common so I can concatenate both of them e.g.
def report(request):
    ledger = Ledger.objects.values('account').annotate(total=Sum('amount'))
    journal = Journal.objects.values('from_account').annotate(total=Sum('amount'))
    report = ledger & journal
    ...
If I try to make them exactly the same to test, e.g.
def report(request):
    ledger = Ledger.objects.values('memo').annotate(total=Sum('amount'))
    journal = Journal.objects.values('memo').annotate(total=Sum('amount'))
    report = ledger & journal
    ...
I get this error:
Cannot combine queries on two different base models.
Anyone know how this can be accomplished?
from itertools import chain
report = chain(ledger, journal)
Itertools for the win!
If you want to do a union, you should convert these querysets into Python set objects.
If it is possible to get the right result by filtering the queryset itself, you should really do that!
Use itertools.chain:
from itertools import chain
report = list(chain(ledger, journal))
Note: chain returns a one-shot iterator, so turn the result into a list if it needs to be iterated more than once or have its length taken (for example in a template).
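Keep in mind that chain only concatenates the two result sets: an account that appears in both still shows up as two separate rows. If the report needs one total per account, the rows can be merged in Python. A minimal sketch, assuming rows shaped like the two values().annotate() querysets from the question. The sample data and the combine_totals helper are hypothetical:

```python
from collections import defaultdict
from decimal import Decimal
from itertools import chain

# Hypothetical rows standing in for the two querysets in the question.
ledger_rows = [
    {"account": 1, "total": Decimal("100.00")},
    {"account": 2, "total": Decimal("40.50")},
]
journal_rows = [
    {"from_account": 1, "total": Decimal("25.00")},
    {"from_account": 3, "total": Decimal("10.00")},
]


def combine_totals(ledger, journal):
    """Sum the totals per account id across both row sets."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal('0')
    for row in chain(ledger, journal):
        # Ledger rows key the account as 'account', journal rows as 'from_account'.
        account_id = row.get("account", row.get("from_account"))
        totals[account_id] += row["total"]
    return dict(totals)


report = combine_totals(ledger_rows, journal_rows)
print(report)
```

Real querysets can be passed in place of the lists, since iterating a values() queryset yields dicts of the same shape.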
I had the same issue. I solved it using the union method: combined_queryset = qs1.union(qs2)
Using your example: report = ledger.union(journal)