Is it "better" to have an update field or COUNT query?

In a Django app I'm working on, I have the following:
from django.db import models
from django.db.models import Avg
class Parent(models.Model):
    name = models.CharField(...)
    def num_children(self):
        return Child.objects.filter(parent=self).count()
    def avg_child_rating(self):
        # aggregate() returns a dict such as {'rating__avg': 3.5}
        return Child.objects.filter(parent=self).aggregate(Avg('rating'))['rating__avg']
class Child(models.Model):
    name = models.CharField(...)
    parent = models.ForeignKey(Parent)
    rating = models.IntegerField(default=0)
I plan on accessing avg_child_rating often. Would it be an optimization if I did the following instead:
class Parent(models.Model):
    ...
    num_children = models.IntegerField(default=0)
    avg_child_rating = models.FloatField(default=0.0)
def update_parent_child_stats(sender, instance, **kwargs):
    parent = instance.parent
    children = Child.objects.filter(parent=parent)
    parent.num_children = children.count()
    # The average is None when the parent has no children left
    parent.avg_child_rating = children.aggregate(Avg('rating'))['rating__avg'] or 0.0
    parent.save()
post_save.connect(update_parent_child_stats, sender=Child)
post_delete.connect(update_parent_child_stats, sender=Child)
The difference now is that every time a child is created, rated, or deleted, the Parent object is updated. I know that creating and rating will happen often.
What's more expensive?

Depends on the scale of the problem.
If you anticipate a lot of write traffic, updating the parent on every save might be an issue. It's much harder to scale writes than reads (which can be handled with replication, caching, etc.). That said, you can probably go a long way without this extra query causing you any problems.
Depending on how up-to-date your stats must be, you could have some other process (outside a web session) come through and update these stats nightly.
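A minimal sketch of the "recompute in a separate process" idea, using the standard-library sqlite3 module instead of the Django ORM so that it runs on its own. The table and column names mirror the question's models, and refresh_parent_stats is a hypothetical helper name, not something from the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (
        id INTEGER PRIMARY KEY,
        name TEXT,
        num_children INTEGER DEFAULT 0,
        avg_child_rating REAL DEFAULT 0.0
    );
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id),
        rating INTEGER DEFAULT 0
    );
""")
conn.executemany("INSERT INTO parent (id, name) VALUES (?, ?)", [(1, "p1"), (2, "p2")])
conn.executemany(
    "INSERT INTO child (parent_id, rating) VALUES (?, ?)",
    [(1, 4), (1, 2), (1, 3), (2, 5)],
)


def refresh_parent_stats(conn):
    """Recompute the stored stats for every parent in a single statement."""
    conn.execute("""
        UPDATE parent SET
            num_children = (
                SELECT COUNT(*) FROM child WHERE child.parent_id = parent.id
            ),
            avg_child_rating = COALESCE(
                (SELECT AVG(rating) FROM child WHERE child.parent_id = parent.id),
                0.0
            )
    """)
    conn.commit()


# A scheduled job (for example, nightly) would call this instead of a signal.
refresh_parent_stats(conn)
stats = conn.execute(
    "SELECT id, num_children, avg_child_rating FROM parent ORDER BY id"
).fetchall()
```

Reads then hit only the stored columns, and the cost of the aggregate is paid once per refresh rather than once per page view or once per write.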

Related

Django, What is the advantage of Modifying a model manager’s initial QuerySet?

The model below has an EditorManager:
class EditorManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(role='E')
class Person(models.Model):
    first_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=[('A', _('Author')), ('E', _('Editor'))])
    people = models.Manager()
    editors = EditorManager()
If I query Person.people.filter(role='E') or Person.editors.all(), I get the same result.
So why would we bother writing EditorManager()?
The above code is from Django documentation (https://docs.djangoproject.com/en/3.0/topics/db/managers/).

As mentioned in the Documentation:
using multiple managers on the same model. You can attach as many Manager() instances to a model as you’d like. This is a non-repetitive way to define common “filters” for your models.
Since you just have one action, it may be hard for you to see the benefits. However, as your code gets larger, say:
good = Book.objects.filter(author="PersonA", stars=5).order_by("-date_created").exclude(outdated=True)
normal = Book.objects.filter(author="PersonA", stars=3).order_by("-date_created").exclude(outdated=True)
bad = Book.objects.filter(author="PersonA", stars=1).order_by("-date_created").exclude(outdated=True)
You can see that's an awful lot of code. With managers, you can do something like this:
class AuthorAManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(author="PersonA").order_by("-date_created").exclude(outdated=True)
class Book(models.Model):
    # ...
    author_a = AuthorAManager()
good = Book.author_a.filter(stars=5)
normal = Book.author_a.filter(stars=3)
bad = Book.author_a.filter(stars=1)
Overall, it can make your code a lot cleaner and easier to understand. As you said, you can't see the difference right now because you haven't run into complex or repeated queries yet, but as your project expands, I'd say it's a worthwhile investment.

Django - Where should I place calculation method to design a proper and maintainable project?

I have some classes like these:
class RawMaterial(models.Model):
    name = models.CharField(max_length=100)
class Product(models.Model):
    name = models.CharField(max_length=100)
    amount = models.IntegerField()
    raw_materials = models.ManyToManyField(RawMaterial, through='MaterialProduct', related_name='products')
class MaterialProduct(models.Model):
    raw_material = models.ForeignKey(RawMaterial, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    material_price = models.FloatField()
    material_rate = models.FloatField()
I want to write a method named calculate_total_price. It will use Product's amount and MaterialProduct's material_price and material_rate.
To design a proper/beautiful/maintainable project, where should I write my method: in models.py or views.py?
Thanks in advance.

Following the "fat models, thin views" approach, I'd recommend putting that calculation in models.py.
It could look like this:
class MaterialProduct(models.Model):
    # attributes
    def calculate_total_price(self):
        # perform calculation with
        # self.product.amount
        # self.material_price
        # self.material_rate
        return result
You can call this method also from your templates ({{ object.calculate_total_price }}) to display the total price.
Now, if you need this value more than once, the question arises: why run the calculation again if the result isn't changing?
As a first step, I'd make it a property, so it reads like any other attribute (a plain property still recomputes on every access):
class MaterialProduct(models.Model):
    # attributes
    @property
    def total_price(self):
        # perform calculation
        return result
Or, if you don't expect the total price to change every few seconds, you could go with a cached_property:
from django.utils.functional import cached_property
class MaterialProduct(models.Model):
    # attributes
    @cached_property
    def total_price(self):
        # perform calculation
        return result
The total price is now available like any other field in the templates ({{ object.total_price }}). With cached_property, the calculation is performed only once per model instance and the result is cached on that instance. Accessing the property again returns the cached value, which saves database hits and CPU time.
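To make the caching behaviour concrete, here is a small sketch that uses functools.cached_property from the standard library, which behaves much like Django's cached_property for this purpose. MaterialLine is a hypothetical plain-Python stand-in for the model, and the price formula is an assumption for illustration only.

```python
from functools import cached_property


class MaterialLine:
    """Plain-Python stand-in for the MaterialProduct model (hypothetical)."""

    calls = 0  # counts how often the calculation actually runs

    def __init__(self, amount, material_price, material_rate):
        self.amount = amount
        self.material_price = material_price
        self.material_rate = material_rate

    @cached_property
    def total_price(self):
        MaterialLine.calls += 1
        # Assumed formula, for illustration only.
        return self.amount * self.material_price * self.material_rate


line = MaterialLine(amount=10, material_price=2.5, material_rate=0.5)
first = line.total_price
second = line.total_price  # served from the per-instance cache
line.amount = 20           # the cached value is NOT recomputed automatically
stale = line.total_price
del line.total_price       # drop the cached value explicitly
fresh = line.total_price   # recomputed with the new amount
```

The trade-off is visible in the stale value: the cache lives as long as the instance does, so it suits request-scoped objects better than long-lived ones whose fields change.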

Django - Displaying result information while optimizing database queries with models that have multiple foreign key relationships

I'm building a web application and am having trouble putting together a results page for each user.
Here is what my models look like:
class Fault(models.Model):
    name = models.CharField(max_length=255)
    severity = models.PositiveSmallIntegerField(default=0)
    description = models.CharField(max_length=1024, null=False, blank=False)
    recommendation = models.CharField(max_length=1024, null=False, blank=False)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    ...
class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto)
    fault = models.ForeignKey(Fault)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    objects = FaultInstanceManager()
    ...
class Auto(models.Model):
    label = models.CharField(max_length=255)
    model = models.CharField(max_length=255)
    make = models.CharField(max_length=255)
    year = models.IntegerField(max_length=4)
    user = models.ForeignKey(AUTH_USER_MODEL)
    ...
I don't know if my model relationships are ideal, but they made sense in my head. Each user can have multiple Auto objects associated with them, and each Auto can have multiple FaultInstance objects associated with it.
On the results page, I want to list all the FaultInstances that a user has across their Autos. Under each listed FaultInstance I will have a list of all the autos the user owns that have the fault, with their information (here is roughly what I had in mind):
All FaultInstance Listing Ordered by Severity (large number to low number)
    FaultInstance:
        FaultDescription:
        FaultRecommendation:
        ListofAutosWithFault:
            AutoLabel AutoModel AutoYear ...
            AutoLabel AutoModel AutoYear ...
Obviously, doing things the correct way means that I want to do as much of the list creation as possible on the Python/Django side and avoid any logic or processing in the template. I am able to create a list per severity with a model manager, as seen here:
from collections import defaultdict
class FaultInstanceManager(models.Manager):
    def get_faults_by_user_severity(self, user, severity):
        faults = defaultdict(list)
        qs_faultinst = self.model.objects.select_related().filter(
            auto__user=user, fault__severity=severity
        ).order_by('auto__make')
        for result in qs_faultinst:
            faults[result.fault].append(result)
        # Turn off the default factory so template lookups such as faults.items don't create new keys
        faults.default_factory = None
        return faults
I still need to specify each severity, but since I only have 5 severity levels, I can create a list for each level and pass each one to the template. Any suggestions for this are appreciated. However, that's not my main problem. My sticking point right now is that I want to create a summary table at the top of the report that gives the user a breakdown of fault instances per make|model|year. I can't think of the proper query or data structure to pass on to the template.
Summary (table of all the FaultInstances with the following column headers):
    FaultInstance    Make|Model|Year    NumberOfAutosAffected
This will give me metrics for a make, a model, or a year (in the example below, it separates faults based on model). I'm listing FaultInstances because I only list Faults that are connected to a user.
For example:
    Bad Starter      Nissan    1
    Bad Taillight    Honda     2
    Bad Taillight    Nissan    1
I am enough of a perfectionist that I want to do this while optimizing database queries. If I can build a data structure from my original query that is easy to parse in the template and still covers both sections of my report (maybe a defaultdict of a defaultdict(list)), that's what I want to do. Thanks for the help; hopefully my question is thorough and makes sense.

It makes sense to use related names because it simplifies your query. Like this:
class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto, related_name='fault_instances')
    fault = models.ForeignKey(Fault, related_name='fault_instances')
    ...
class Auto(models.Model):
    user = models.ForeignKey(AUTH_USER_MODEL, related_name='autos')
In this case you can use:
qs_faultinst = user.fault_instances.filter(fault__severity=severity).order_by('auto__make')
instead of:
qs_faultinst = self.model.objects.select_related().filter(
    auto__user=user, fault__severity=severity
).order_by('auto__make')
I can't figure out your summary table; maybe you meant:
    Fault    Make|Model|Year    NumberOfAutosAffected
In this case you can use aggregation. But grouping would still be slow if you have a lot of data. One easy solution is to denormalize the data by creating an extra model plus a few signals to keep it updated, or you can use a cache.
If you have a predefined set of severities then think about this:
class Fault(models.Model):
    SEVERITY_LOW = 0
    SEVERITY_MIDDLE = 1
    SEVERITY_HIGH = 2
    ...
    SEVERITY_CHOICES = (
        (SEVERITY_LOW, 'Low'),
        (SEVERITY_MIDDLE, 'Middle'),
        (SEVERITY_HIGH, 'High'),
        ...
    )
    ...
    severity = models.PositiveSmallIntegerField(default=SEVERITY_LOW,
                                                choices=SEVERITY_CHOICES)
    ...
In your templates you can just iterate through Fault.SEVERITY_CHOICES.
Update:
Change your models:
Move the model field into a separate model:
class AutoModel(models.Model):
    name = models.CharField(max_length=255)
Change the model field of Auto:
class Auto(models.Model):
    ...
    auto_model = models.ForeignKey(AutoModel, related_name='cars')
    ...
Add a model:
class MyDenormalizedModelForReport(models.Model):
    fault = models.ForeignKey(Fault, related_name='reports')
    auto_model = models.ForeignKey(AutoModel, related_name='reports')
    year = models.IntegerField(max_length=4)
    number_of_auto_affected = models.IntegerField(default=0)
Add a signal:
def update_denormalized_model(sender, instance, created, **kwargs):
    # Only count newly created fault instances, not later edits
    if created:
        rep, dummy_created = MyDenormalizedModelForReport.objects.get_or_create(
            fault=instance.fault,
            auto_model=instance.auto.auto_model,
            year=instance.auto.year,
        )
        rep.number_of_auto_affected += 1
        rep.save()
post_save.connect(update_denormalized_model, sender=FaultInstance)
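For smaller data sets, the summary table from the question can also be built in Python after a single fetch. The sketch below works on plain tuples so that it runs without Django; the rows are made-up sample data, and with the ORM they could plausibly come from something like values_list('fault__name', 'auto_id', 'auto__make') on a queryset filtered by user (an assumption, not part of the original answer). The summarize helper is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical sample rows as (fault_name, auto_id, make) tuples.
rows = [
    ("Bad Starter", 1, "Nissan"),
    ("Bad Taillight", 2, "Honda"),
    ("Bad Taillight", 3, "Honda"),
    ("Bad Taillight", 1, "Nissan"),
    ("Bad Taillight", 1, "Nissan"),  # same fault logged twice on one auto
]


def summarize(rows):
    """Return sorted (fault, make, number_of_autos_affected) tuples."""
    autos_by_group = defaultdict(set)
    for fault_name, auto_id, make in rows:
        # A set ignores repeated fault instances on the same auto.
        autos_by_group[(fault_name, make)].add(auto_id)
    return sorted(
        (fault, make, len(autos))
        for (fault, make), autos in autos_by_group.items()
    )


summary = summarize(rows)
```

The result is a flat list of tuples, which a template can loop over directly without any further logic. Grouping by model or year instead of make only changes which column goes into the key.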

Figuring out how to design my model and using "through"

I'm trying to figure out how to design my model. I've been going over the documentation, and it ultimately seems like I should be using the "through" attribute, but I just can't figure out how to get it to work how I want.
If someone could take a look and point out what I'm missing, that would be really helpful. I have pasted my model below.
This is what I am trying to do:
1) Have a list of server types
2) Each server type will need to have different parts available to that specific server type
3) The asset has a FK to the servermodel, which has a M2M to the parts specific to that server type.
My question is, how can each "Asset" store metadata for each "Part" specific to that "Asset"? For example, each "Asset" should have its own last_used data for the part that's assigned to it.
Thanks! :)
class Part(models.Model):
    part_description = models.CharField(max_length=30, unique=True)
    last_used = models.CharField(max_length=30)
    def __unicode__(self):
        return self.part_description
class ServerModel(models.Model):
    server_model = models.CharField(max_length=30, unique=True)
    parts = models.ManyToManyField(Part)
    def __unicode__(self):
        return self.server_model
class Asset(models.Model):
    server_model = models.ForeignKey(ServerModel)
    serial_number = models.CharField(max_length=10, unique=True)
    def __unicode__(self):
        return self.server_model.server_model
EDIT:
Thank you for the help!
I may not have explained myself clearly, though. It's probably my confusing model names.
Example:
ServerModel stores the type of server being used, say "Dell Server 2000".
The "Dell Server 2000" should be assigned specific parts:
"RAM"
"HARD DISK"
"CDROM"
Then, I should be able to create 10x Assets with a FK to the ServerModel. Now, each of these assets should be able to mark when the "RAM" part was last used for this specific asset.

I'm not sure I understand exactly what you want to do, but basically you can solve that with a "through" model, as you expected:
import datetime
from django.db import models
class Part(models.Model):
    name = models.CharField(max_length=30, unique=True)
class ServerModel(models.Model):
    server_model = models.CharField(max_length=30, unique=True)
    parts = models.ManyToManyField(Part, through='Asset')
class Asset(models.Model):
    server_model = models.ForeignKey(ServerModel)
    part = models.ForeignKey(Part)
    serial_number = models.CharField(max_length=10, unique=True)
    # Pass the callable itself (no parentheses) so it runs for each new row
    used = models.DateTimeField(default=datetime.datetime.now)
The first thing to notice is the relation of the parts to the server model using the "through" model: for each Part instance assigned to the "parts" property of a ServerModel instance, a new Asset instance is created (phew, I hope that doesn't sound too complicated). At the time of creation, the "used" property of the Asset instance is set to the current date and time (that's what default=datetime.datetime.now does; the callable is passed without parentheses so that it is evaluated for every new row, not once when the module is imported).
If you do that, you can then just query the database for the last asset containing your part. That queryset can then be sorted by the "used" property of the Asset model, which is the date when the Asset instance has been created.
ServerModel.objects.filter(parts__name='ThePartYouAreLookingFor').order_by('asset__used')
I'm not absolutely sure if the queryset is correct, so if someone finds huge nonsense in it, feel free to edit ;)
edit:
The models above do not do exactly that. But you do not even need a through model for what you want:
class ServerModel(models.Model):
    server_model = models.CharField(max_length=30, unique=True)
    parts = models.ManyToManyField(Part)
class Asset(models.Model):
    server_model = models.ForeignKey(ServerModel)
    parts = models.ForeignKey(Part)
    serial_number = models.CharField(max_length=10, unique=True)
    used = models.DateTimeField(default=datetime.datetime.now)
Basically you can just add assets and then query all assets that have RAM in parts:
Asset.objects.filter(parts__name__contains='RAM').order_by('used')
Take the date of the last result of that queryset (the most recent one, since the ordering is ascending) and you have the date of the last usage of your 'RAM' part.
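A note on defaults like the one on the "used" field: Django accepts either a fixed value or a callable as a field default, and a callable is invoked each time a default is needed. Writing datetime.datetime.now() with parentheses produces a fixed value, computed once when the class body runs. The sketch below imitates that distinction in plain Python; now and resolve_default are hypothetical stand-ins (a counter instead of a real clock, so the result is deterministic), not Django APIs.

```python
import itertools

# Deterministic stand-in for a clock (hypothetical helper).
_ticks = itertools.count(1)


def now():
    return next(_ticks)


def resolve_default(default):
    """Imitate default resolution: call the default if it is callable."""
    return default() if callable(default) else default


frozen_default = now()   # evaluated once, like default=datetime.datetime.now()
callable_default = now   # evaluated on demand, like default=datetime.datetime.now

frozen_values = [resolve_default(frozen_default) for _ in range(3)]
fresh_values = [resolve_default(callable_default) for _ in range(3)]
```

With the frozen default every row gets the same timestamp, namely the moment the process started, which is rarely what a "last used" field should record.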

Caching of querysets and re-evaluation

I'm going to post some incomplete code to make the example simple. I'm running a recursive function to compute some metrics on a hierarchical structure.
class Category(models.Model):
    parent = models.ForeignKey('self', null=True, blank=True, related_name='children', default=1)
    def compute_metrics(self, shop_object, metric_queryset=None, rating_queryset=None):
        if metric_queryset == None:
            metric_queryset = Metric.objects.all()
        if rating_queryset == None:
            rating_queryset = Rating.objects.filter(shop_object=shop_object)
        for child in self.children.all():
            # do stuff
            child_score = child.compute_metrics(shop_object, metric_queryset, rating_queryset)
        metrics_in_cat = metric_queryset.filter(category=self)
        for metric in metrics_in_cat:
            # do stuff
            pass
I hope that's enough code to see what's going on. What I'm after here is a recursive function that is only going to run those queries once each, then pass the results down. That doesn't seem to be happening right now and it's killing performance. Were this PHP/MySQL (as much as I dislike them after working with Django!) I could just run the queries once and pass them down.
From what I understand of Django's querysets, they aren't going to be evaluated in my if queryset == None then queryset=stuff part. How can I force this? Will it be re-evaluated when I do things like metric_queryset.filter(category=self)?
I don't care about data freshness. I just want to read from the DB once for each of metrics and rating, then filter on them later without hitting the DB again. It's a frustrating problem that feels like it should have a very simple answer. Pickling looks like it could work but it's not very well explained in the Django documentation.

I think the problem here is you are not evaluating the queryset until after your recursive call. If you use list() to force the evaluation of the queryset then it should only hit the database once. Note you will have to change the metrics_in_cat line to a python level filter rather than using queryset filters.
class Category(models.Model):
    parent = models.ForeignKey('self', null=True, blank=True, related_name='children', default=1)
    def compute_metrics(self, shop_object, metric_queryset=None, rating_queryset=None):
        if metric_queryset is None:
            # list() forces evaluation, so the query runs only once
            metric_queryset = list(Metric.objects.all())
        if rating_queryset is None:
            rating_queryset = list(Rating.objects.filter(shop_object=shop_object))
        for child in self.children.all():
            # do stuff
            child_score = child.compute_metrics(shop_object, metric_queryset, rating_queryset)
        # metrics_in_cat = metric_queryset.filter(category=self)
        # Compare ids: accessing m.category would trigger a query per metric
        metrics_in_cat = [m for m in metric_queryset if m.category_id == self.pk]
        for metric in metrics_in_cat:
            # do stuff
            pass
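The same "read once, filter in Python" idea can be taken one step further by indexing the rows by category before recursing, so each level does a dictionary lookup instead of scanning the whole list. This is a sketch with plain objects so that it runs without Django; Metric, fetch_all_metrics, CHILDREN, and count_metrics are hypothetical stand-ins, and the "metric" being computed is simply a count.

```python
from collections import defaultdict


class Metric:
    """Plain stand-in for a model row (hypothetical)."""

    def __init__(self, name, category_id):
        self.name = name
        self.category_id = category_id


fetch_count = 0


def fetch_all_metrics():
    """Pretend database read; counts how often it is called."""
    global fetch_count
    fetch_count += 1
    return [Metric("price", 1), Metric("speed", 2), Metric("noise", 2), Metric("size", 3)]


# category id -> child category ids (hypothetical tree)
CHILDREN = {1: [2, 3], 2: [], 3: []}


def count_metrics(category_id, metrics_by_category=None):
    """Count metrics in a category and all of its descendants."""
    if metrics_by_category is None:
        # Read once, then index by category so later lookups need no queries.
        metrics_by_category = defaultdict(list)
        for metric in fetch_all_metrics():
            metrics_by_category[metric.category_id].append(metric)
    total = len(metrics_by_category[category_id])
    for child_id in CHILDREN[category_id]:
        # Pass the already-built index down instead of re-reading.
        total += count_metrics(child_id, metrics_by_category)
    return total


total = count_metrics(1)
```

Only the top-level call performs the read; every recursive call receives the prepared index as an argument.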
