Some background
I am considering rebuilding an existing Laravel website with Django. It's a website that allows sharing benchmark data from drone/UAV propulsion components. Some benchmarks are done while testing multiple motors and propellers at the same time, which means a battery would be under the load of multiple motors, but it also means the airflow from one propeller has an impact on the data captured on the other propeller. This means the data is physically coupled. Here is an example. Right now I am trying to structure this project to allow upcoming features, and to see if Django ORM is a good fit.
Simplified Django models
class Benchmark(models.Model):
    title = models.CharField()
    private = models.BooleanField(default=False)
    hide_torque = models.BooleanField(default=False)

class AbstractComponent(models.Model):
    brand = models.CharField()
    name = models.CharField()

    class Meta:
        abstract = True

class Motor(AbstractComponent):
    shaft_diameter_mm = models.FloatField()

class Propeller(AbstractComponent):
    diameter_in = models.FloatField()

class Battery(AbstractComponent):
    capacity_kwh = models.FloatField()

class Powertrain(models.Model):
    benchmark = models.ForeignKey(Benchmark, on_delete=models.CASCADE, related_name='powertrains')
    motor = models.ForeignKey(Motor, on_delete=models.CASCADE)
    propeller = models.ForeignKey(Propeller, on_delete=models.CASCADE, blank=True, null=True)
    battery = models.ForeignKey(Battery, on_delete=models.CASCADE, blank=True, null=True)

class DerivedDataManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset()\
            .annotate(electrical_power_w=F('voltage_v') * F('current_a'))\
            .annotate(mechanical_power_w=F('torque_nm') * F('rotation_speed_rad_per_s'))\
            .annotate(motor_efficiency=F('mechanical_power_w') / F('electrical_power_w'))

class DataSample(models.Model):
    powertrain = models.ForeignKey(Powertrain, on_delete=models.CASCADE, related_name='data')
    time_s = models.FloatField()
    voltage_v = models.FloatField(blank=True, null=True)
    current_a = models.FloatField(blank=True, null=True)
    rotation_speed_rad_per_s = models.FloatField(blank=True, null=True)
    torque_nm = models.FloatField(blank=True, null=True)
    thrust_n = models.FloatField(blank=True, null=True)
    objects = models.Manager()
    derived = DerivedDataManager()

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['powertrain', 'time_s'], name='unique temporal sample')
        ]
Question
I was able to add "derived" measurements, like electrical_power_w, to each row of the data, but I have no idea how to add derived measurements that combine the data of multiple powertrains within the same benchmark.
Assuming 3 powertrains, each with their own voltage and current data, how can I do:
Total_power = powertrain1.power + powertrain2.power + powertrain3.power
for each individual timestamp (time_s)? A total power is only meaningful if the sum is made over samples taken simultaneously.
Goal
Without loading all the database data in Django, I would eventually want to get the 5 top benchmarks in terms of maximum total power, taking into account some business logic:
benchmarks marked as private are automatically excluded (until auth comes in)
benchmarks that opt to hide the torque automatically have the torque data, along with all the derived mechanical power and motor efficiency values, set to None.
I would like to recreate this table, but with extra columns appended, such as 'maximum thrust'. This table is paginated from within the database itself.
Hmm, this is quite a tricky one to navigate. I would start by importing the Sum aggregate:
from django.db.models import Sum
I guess then it would be a case of adding:
.annotate(total_power=Sum('electrical_power_w'))
But I think the issue is that each row in your DerivedDataManager queryset represents one DataSample which in turn links to one Powertrain via the ForeignKey field.
It would be better to do this in the business logic layer, grouping by the powertrain's UUID (you need to add this to your Powertrain model; see https://docs.djangoproject.com/en/3.0/ref/models/fields/#uuidfield for details of how to use it). Because the rows are then grouped, you can apply the Sum annotation to the queryset.
So, I think you want to navigate down this path:
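# Use the `derived` manager, since it is the one that defines electrical_power_w.
# values() before annotate() groups the rows by powertrain.
DataSample.derived.values(
    'powertrain'
).annotate(
    total_power=Sum('electrical_power_w')
)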
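For the goal stated in the question (total power per timestamp, then the top benchmarks by their peak total), the grouping has to happen twice: first per benchmark and time_s, then per benchmark. Below is a minimal sketch of that two-level aggregation written as plain SQL against an in-memory SQLite database, so it runs without a Django project. The table and column names (benchmark, powertrain, datasample) are assumptions that loosely mirror the models above, not the names Django would generate, and the function name is hypothetical. It also assumes that simultaneous samples share exactly the same time_s value.

```python
import sqlite3


def top_benchmarks_by_peak_total_power(conn, limit=5):
    """Return (benchmark_id, peak_total_power_w) rows, highest peak first."""
    query = """
        SELECT per_time.benchmark_id,
               MAX(per_time.total_power_w) AS peak_total_power_w
        FROM (
            -- Inner level: one row per benchmark and timestamp, summing the
            -- electrical power of every powertrain sampled at that instant.
            SELECT p.benchmark_id AS benchmark_id,
                   d.time_s AS time_s,
                   SUM(d.voltage_v * d.current_a) AS total_power_w
            FROM datasample AS d
            JOIN powertrain AS p ON p.id = d.powertrain_id
            JOIN benchmark AS b ON b.id = p.benchmark_id
            WHERE b.private = 0  -- private benchmarks are excluded
            GROUP BY p.benchmark_id, d.time_s
        ) AS per_time
        GROUP BY per_time.benchmark_id
        ORDER BY peak_total_power_w DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with made-up sample values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE benchmark (id INTEGER PRIMARY KEY, private INTEGER NOT NULL);
        CREATE TABLE powertrain (id INTEGER PRIMARY KEY, benchmark_id INTEGER NOT NULL);
        CREATE TABLE datasample (
            powertrain_id INTEGER NOT NULL,
            time_s REAL NOT NULL,
            voltage_v REAL,
            current_a REAL,
            UNIQUE (powertrain_id, time_s)
        );
        INSERT INTO benchmark VALUES (1, 0), (2, 0), (3, 1);
        INSERT INTO powertrain VALUES (1, 1), (2, 1), (3, 2), (4, 3);
        INSERT INTO datasample VALUES
            (1, 0.0, 10.0, 1.0), (1, 1.0, 10.0, 2.0),
            (2, 0.0, 10.0, 3.0), (2, 1.0, 10.0, 1.0),
            (3, 0.0, 10.0, 5.0),
            (4, 0.0, 100.0, 100.0);
    """)
    return conn


if __name__ == "__main__":
    for benchmark_id, peak in top_benchmarks_by_peak_total_power(build_demo_db()):
        print(benchmark_id, peak)
```

Because the sorting and the LIMIT happen in the database, only the requested rows are loaded into Python. Whether the ORM can express the nested aggregate directly depends on the Django version and backend, so treat the SQL as a description of the target query rather than a drop-in solution.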
Related
I have some models in Django:
# models.py, simplified here
class Category(models.Model):
    """The category an inventory item belongs to. Examples: car, truck, airplane"""
    name = models.CharField(max_length=255)

class UserInterestCategory(models.Model):
    """
    How interested is a user in a given category. `interest` can be set by any method, maybe a neural network or something like that
    """
    user = models.ForeignKey(User, on_delete=models.CASCADE)  # user is the stock Django user
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    interest = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])

class Item(models.Model):
    """This is a product that we have in stock, which we are trying to get a User to buy"""
    model_number = models.CharField(max_length=40, default="New inventory item")
    product_category = models.ForeignKey(Category, null=True, blank=True, on_delete=models.SET_NULL, verbose_name="Category")
I have a list view showing items, and I'm trying to sort by user_interest_category for the currently logged in user.
I have tried a couple different querysets and I'm not thrilled with them:
primary_queryset = Item.objects.all()

# this one works, and it's fast, but only finds items the user ALREADY has an interest in
primary_queryset = primary_queryset.filter(product_category__userinterestcategory__user=self.request.user).annotate(
    recommended=F('product_category__userinterestcategory__interest')
)

# this one works great but the baby jesus weeps at its slowness
# probably because we are iterating through every user, item, and userinterestcategory in the db
primary_queryset = primary_queryset.annotate(
    recommended=Case(
        When(product_category__userinterestcategory__user=self.request.user, then=F('product_category__userinterestcategory__interest')),
        default=Value(0),
        output_field=IntegerField(),
    )
)

# this one works, but it's still a bit slow -- 2-3 seconds per query:
interest = Subquery(UserInterestCategory.objects.filter(category=OuterRef('product_category'), user=self.request.user).values('interest'))
# a Subquery has no default alias, so the annotation needs an explicit name
primary_queryset = primary_queryset.annotate(recommended=interest)
The third method is workable, but it doesn't seem like the most efficient way to do things. Isn't there a better method than this?
I want to know the most efficient way of structuring and designing a database with several relations. I will explain my problem with a toy example; my actual situation is a scaled-up version of it.
Here are the Models in the Django database
1.) Employee Master (biggest table with several columns and rows)
class Emp_Mast(models.Model):
    emp_mast_id = models.AutoField(primary_key=True)
    first_name = models.CharField(max_length=50)
    middle_name = models.CharField(max_length=50, blank=True)
    last_name = models.CharField(max_length=50, blank=True)
    desgn_mast = models.ForeignKey("hr.Desgn_Mast", on_delete=models.SET_NULL, null=True)
    qual_mast = models.ForeignKey("hr.Qualification_Mast", on_delete=models.SET_NULL, null=True)
    office_mast = models.ManyToManyField("company_setup.Office_Mast")
    ref_mast = models.ForeignKey("hr.Reference_Mast", on_delete=models.SET_NULL, null=True)
    refernce_mast = models.ForeignKey("hr.Refernce_Mast", on_delete=models.SET_NULL, null=True)
This is how the data is displayed in frontend
2.) All the relational fields in the Employee Master have their corresponding models
3.) Crw_Movement_Transaction
Now I need to create a table for transaction data that stores each and every movement of the employees. We have several offshore sites that the employees need to travel to, and about 50 rows a day would be added to this transaction table, called Crw_Movement_Transaction.
The Crw_Movement table will have a few additional calculated columns of its own; the rest of the columns will be static (their data would not be changed from here) and will come from the employee master, such as desgn_mast and souring_mast (so not all the fields from Emp_Mast either).
One way to do this is to define a nested relation for Emp_Mast in the serializer for Crw_Movement and optimize it using select_related and prefetch_related to reduce the number of database queries. However, that is still very slow, as any queries to Emp_Mast are unnecessary. Would it be a better design to just store the fields from Emp_Mast in Crw_Movement and update them when Emp_Mast is updated as well? If yes, what is a good way of doing that? Or should I stick to using a nested serializer?
I'm new to Django, so I apologize ahead of time if my verbiage is off. But I'll try my best!
I have two models :
PlayerProfile - this is updated once a day.
PlayerListing - this is updated every 5 minutes.
Here are simplified versions of those models.
class PlayerProfile(models.Model):
    listings_id = models.CharField(max_length=120)
    card_id = models.CharField(max_length=120)
    first_name = models.CharField(max_length=120)
    last_name = models.CharField(max_length=120)
    overall = models.IntegerField()

class PlayerListing(models.Model):
    listings_id = models.CharField(max_length=120, unique=True)
    buy = models.IntegerField()
    sell = models.IntegerField()
Currently, we just make queries based on the matching listings_id - but I'd like to have a more traditional relationship setup if possible.
How do you relate two models that have the same value for a specific field (in this case, the listings_id)?
Some potentially relevant information:
Data for both models is brought in from an external API, processed and then saved to the database.
Each PlayerListing relates to a single PlayerProfile. But not every PlayerProfile will have a PlayerListing.
When we create PlayerListings (every 5 minutes), we don't necessarily have access to the correct PlayerProfile model. listings_id's are generated last (as we have to do some extra logic to make sure they're correct).
A friend recommended that I read the book Two Scoops of Django, and I was amazed at the recommendations it makes for a robust and well-designed Django project. Reading it left me with a doubt about where to put the business logic. Here is an example. Suppose I have two models:
models.py
class Sparks(models.Model):
    flavor = models.CharField(max_length=100)
    quantity = models.IntegerField(default=0)

class Frozen(models.Model):
    flavor = models.CharField(max_length=100)
    has_cone = models.BooleanField()
    quantity_sparks = models.IntegerField(default=0)
Let's suppose that every time I add a Frozen, if it has sparks, I have to subtract them from the Sparks model and check that there is an available quantity. In the book they recommend putting this logic in models.py or forms.py. If creating one model requires modifying data from another model, where should I do it?
Your data model is lacking, that's the likely source of uneasiness.
class Flavor(models.Model):
    name = models.CharField(max_length=100)

class Sparks(models.Model):
    flavor = models.ForeignKey(Flavor, on_delete=models.CASCADE)
    quantity = models.IntegerField(default=0)

class Frozen(models.Model):
    # This maybe should be a OneToOne, can't tell from your description.
    sparks = models.ForeignKey(Sparks, on_delete=models.CASCADE)
    has_cone = models.BooleanField()
Then you'd do
frozen_instance = Frozen.objects.get()
frozen_instance.sparks.quantity  # This has replaced frozen_instance.quantity_sparks
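The stock check described in the question (subtract the sparks and make sure enough are available) can be done as a single guarded update, so the check and the subtraction cannot be separated by another request. The sketch below shows the idea with the standard library's sqlite3 module so that it runs on its own; the table layout, the function take_sparks and the exception InsufficientSparks are all hypothetical names for illustration. In Django the same pattern is commonly written in a model or manager method as a filter on quantity__gte followed by update() with an F() expression, which returns the number of rows changed.

```python
import sqlite3


class InsufficientSparks(Exception):
    """Raised when the requested amount exceeds the stock on hand."""


def take_sparks(conn, sparks_id, amount):
    """Subtract `amount` from one sparks row, but only if enough is in stock."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            # The WHERE clause performs the availability check, so the check
            # and the subtraction happen in one statement.
            "UPDATE sparks SET quantity = quantity - ? "
            "WHERE id = ? AND quantity >= ?",
            (amount, sparks_id, amount),
        )
        if cursor.rowcount == 0:
            # No row matched: unknown id or not enough stock.
            raise InsufficientSparks(
                f"cannot take {amount} from sparks {sparks_id}"
            )


def current_quantity(conn, sparks_id):
    """Read back the stock level for one sparks row."""
    row = conn.execute(
        "SELECT quantity FROM sparks WHERE id = ?", (sparks_id,)
    ).fetchone()
    return row[0]


def build_demo_db():
    """Create an in-memory table holding one sparks row with 3 units."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sparks (id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO sparks VALUES (1, 3)")
    conn.commit()
    return conn
```

Keeping this in one function that the view or form calls means the rule lives in a single place, whichever layer ends up creating the Frozen row.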
I'd like to create a filter-sort mixin for following values and models:
class Course(models.Model):
    title = models.CharField(max_length=70)
    description = models.TextField()
    max_students = models.IntegerField()
    min_students = models.IntegerField()
    is_live = models.BooleanField(default=False)
    is_deleted = models.BooleanField(default=False)
    teacher = models.ForeignKey(User)

class Session(models.Model):
    course = models.ForeignKey(Course)
    title = models.CharField(max_length=50)
    description = models.TextField(max_length=1000, default='')
    date_from = models.DateField()
    date_to = models.DateField()
    time_from = models.TimeField()
    time_to = models.TimeField()

class CourseSignup(models.Model):
    course = models.ForeignKey(Course)
    student = models.ForeignKey(User)
    enrollment_date = models.DateTimeField(auto_now=True)

class TeacherRating(models.Model):
    course = models.ForeignKey(Course)
    teacher = models.ForeignKey(User)
    rated_by = models.ForeignKey(User)
    rating = models.IntegerField(default=0)
    comment = models.CharField(max_length=300, default='')
A Course could be 'Discrete mathematics 1'
Sessions are individual classes related to a Course (e.g. 1. Introduction, 2. Chapter I, 3. Final Exam etc.) combined with a date/time
CourseSignup is the "enrollment" of a student
TeacherRating keeps track of a student's rating for a teacher (after course completion)
I'd like to implement following functions
Sort (asc, desc) by Date (earliest Session.date_from), Course.Name
Filter by: Date (earliest Session.date_from and last Session.date_to), Average TeacherRating (e.g. minimum value = 3), CourseSignups (e.g. minimum 5 users signed up)
(these options are passed via a GET parameters, e.g. sort=date_ascending&f_min_date=10.10.12&...)
How would you create a function for that?
I've tried using
denormalization (I just added a field to Course for each required filter/sort criterion and updated it whenever changes happened), but I'm not very satisfied with it (e.g. it needs lots of updates after each TeacherRating).
ForeignKey queries (Course.objects.filter(session__date_from=xxx)), but I might run into performance issues later on.
Thanks for any tips!
In addition to using the Q object for advanced AND/OR queries, get familiar with reverse lookups.
Django creates reverse lookups for foreign key relationships. In your case you can get all Sessions belonging to a Course in one of two ways, each of which can be filtered.
c = Course.objects.get(id=1)
sessions = Session.objects.filter(course__id=c.id) # First way, forward lookup.
sessions = c.session_set.all() # Second way using the reverse lookup session_set added to Course object.
You'll also want to familiarize yourself with annotate() and aggregate(); these allow you to calculate fields and order/filter on the results. For example, Count, Sum, Avg, Min, Max, etc.
from django.db.models import Count

# Aggregates follow the reverse relation by its lowercased model name.
courses_with_at_least_five_students = Course.objects.annotate(
    num_students=Count('coursesignup')
).order_by(
    '-num_students'
).filter(
    num_students__gte=5
)
import datetime
from django.db.models import Avg, Min

course_earliest_session_within_last_240_days_with_avg_teacher_rating_below_4 = Course.objects.annotate(
    min_session_date_from=Min('session__date_from')
).annotate(
    avg_teacher_rating=Avg('teacherrating__rating')
).order_by(
    'min_session_date_from',
    '-avg_teacher_rating'
).filter(
    min_session_date_from__gte=datetime.date.today() - datetime.timedelta(days=240),
    avg_teacher_rating__lte=4
)
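To connect annotations like these to the GET parameters from the question, one option is a small helper that turns the query string into filter keyword arguments and an ordering. This is only a sketch under stated assumptions: sort=date_ascending and f_min_date come from the question, while f_max_date, f_min_rating, f_min_signups, the other sort values, the max_session_date_to annotation name and the helper itself are hypothetical. It also assumes that a date such as 10.10.12 means day.month.two-digit-year.

```python
from datetime import datetime

# Hypothetical mapping from the `sort` parameter to an order_by() argument.
SORT_OPTIONS = {
    "date_ascending": "min_session_date_from",
    "date_descending": "-min_session_date_from",
    "name_ascending": "title",
    "name_descending": "-title",
}


def parse_date(text):
    """Parse a date such as '10.10.12' (assumed to be day.month.year)."""
    return datetime.strptime(text, "%d.%m.%y").date()


def parse_course_query(params):
    """Translate GET parameters into (filter_kwargs, ordering).

    The keys of filter_kwargs refer to annotation names, so the queryset
    must be annotated with them before the filters are applied.
    """
    filters = {}
    if params.get("f_min_date"):
        filters["min_session_date_from__gte"] = parse_date(params["f_min_date"])
    if params.get("f_max_date"):
        filters["max_session_date_to__lte"] = parse_date(params["f_max_date"])
    if params.get("f_min_rating"):
        filters["avg_teacher_rating__gte"] = float(params["f_min_rating"])
    if params.get("f_min_signups"):
        filters["num_students__gte"] = int(params["f_min_signups"])
    # Unknown or missing sort values fall back to sorting by title.
    ordering = SORT_OPTIONS.get(params.get("sort"), "title")
    return filters, ordering
```

In a view, the result could be applied to an already annotated queryset with .filter(**filters).order_by(ordering), passing request.GET as params. Note that when Count is combined with other annotations that join multi-valued relations, it usually needs distinct=True to avoid inflated counts.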
The Q is used to allow you to make logical AND and logical OR in the queries.
I recommend you take a look at complex lookups: https://docs.djangoproject.com/en/1.5/topics/db/queries/#complex-lookups-with-q-objects
The following query might not work in your case (what does the teacher model look like?), but I hope it serves as an indication of how to use the complex lookup.
from django.db.models import Q
Course.objects.filter(Q(session__date__range=(start,end)) &
Q(teacher__rating__gt=3))
Unless absolutely necessary I'd indeed steer away from denormalization.
Your sort question wasn't entirely clear to me. Would you like to display Courses, filtered by date_from, and sort it by Date, Name?