Django | PostgreSQL | ManyToManyField vs ForeignKey performance - django

I am going to create a Vehicle table, which has two damage types: primary and secondary (optional). Not more, only these two. I expect the Vehicle table will have a lot of records in the near future, about 1M+, and I am not sure whether I should use a ManyToManyField or two ForeignKeys with the related_name disabled.
class Damage(models.Model):
    name = models.CharField(max_length=32, unique=True)
Should I use this solution:
class Vehicle(models.Model):
    damage_primary = models.ForeignKey(Damage, on_delete=models.CASCADE, related_name='+')
    # optional, so the column also needs to be nullable at the DB level
    damage_secondary = models.ForeignKey(Damage, on_delete=models.CASCADE, null=True, blank=True, related_name='+')
OR this one:
class Vehicle(models.Model):
    damage = models.ManyToManyField(Damage)  # ManyToManyField takes no on_delete argument
What is the best practice here, please? The internal table that Django creates for a ManyToManyField will itself hold a lot of records once the Vehicle table reaches 1,000,000+ rows.
Thank you very much for any advice!

If you have only two damage types, you want two foreign keys.
The related_name has no influence on database performance.
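For what it's worth, with the two-ForeignKey layout both damage rows come back in a single query (two plain JOINs, no intermediate table). A small sketch, assuming the models from the question:

# Both damages are fetched via JOINs in the same query; no extra queries
# and no M2M join table are involved.
vehicles = Vehicle.objects.select_related('damage_primary', 'damage_secondary')

for v in vehicles:
    primary = v.damage_primary.name
    secondary = v.damage_secondary.name if v.damage_secondary_id else None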

Related

Efficient Design of DB with several relations - Django

I want to know the most efficient way of structuring and designing a database with several relations. I will explain my problem with a toy example; my actual situation is the same design scaled up.
Here are the Models in the Django database
1.) Employee Master (biggest table with several columns and rows)
class Emp_Mast(models.Model):
    emp_mast_id = models.AutoField(primary_key=True)
    first_name = models.CharField(max_length=50)
    middle_name = models.CharField(max_length=50, blank=True)
    last_name = models.CharField(max_length=50, blank=True)
    desgn_mast = models.ForeignKey("hr.Desgn_Mast", on_delete=models.SET_NULL, null=True)
    qual_mast = models.ForeignKey("hr.Qualification_Mast", on_delete=models.SET_NULL, null=True)
    office_mast = models.ManyToManyField("company_setup.Office_Mast")
    ref_mast = models.ForeignKey("hr.Reference_Mast", on_delete=models.SET_NULL, null=True)
This is how the data is displayed in frontend
2.) All the relational fields in the Employee Master have their corresponding models
3.) Crw_Movement_Transaction
Now I need to create a table for transaction data that stores each and every movement of the employees. We have several offshore sites that the employees travel to, and about 50 rows would be added daily to this transaction table, called Crw_Movement_Transaction.
The Crw_Movement table will have a few additional columns with calculations of its own; the rest of the columns will be static (the data would not be changed from here) and will come from the employee master, such as desgn_mast and souring_mast (so not all the fields from emp_mast either).
One way to do this is to just define a nested relation for Emp_Mast in the serializer for Crw_Movement and optimize it using select_related and prefetch_related to reduce the queries to the database. However, that is still very slow, as any number of queries to Emp_Mast is unnecessary. Would it be better design to just store the fields from Emp_Mast in Crw_Movement and update them when Emp_Mast is updated as well? If yes, what is a good way of doing that? Or should I stick to using a nested serializer?
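A rough sketch of the select_related / prefetch_related optimization the question refers to, assuming a DRF viewset and a Crw_Movement_Transaction model with an emp_mast ForeignKey; the view and serializer names are illustrative, not from the question:

from rest_framework import viewsets

from .models import Crw_Movement_Transaction
from .serializers import CrwMovementSerializer

class CrwMovementViewSet(viewsets.ModelViewSet):
    serializer_class = CrwMovementSerializer
    queryset = (
        Crw_Movement_Transaction.objects
        # Follow ForeignKey chains with JOINs (still one query)...
        .select_related('emp_mast__desgn_mast', 'emp_mast__ref_mast')
        # ...and fetch the ManyToMany office data in one extra query.
        .prefetch_related('emp_mast__office_mast')
    )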

Many foreign keys in one lookup table. Bad idea?

I am using Django, and my tables look like
class Product(models.Model):
    category = models.CharField(max_length=50)
    title = models.CharField(max_length=200)

class Value(models.Model):
    name = models.CharField(max_length=200, unique=True)

class Attribute(models.Model):
    name = models.CharField(max_length=200)
    parent = models.ForeignKey('self', related_name='children')
    values = models.ManyToManyField(Value, through='ProductAttributeRelationship', related_name='values')

    class Meta:
        unique_together = ('name', 'parent')

class ProductAttributeRelationship(models.Model):
    product = models.ForeignKey(Product, related_name='products')
    value = models.ForeignKey(Value, related_name='values')
    attribute = models.ForeignKey(Attribute, related_name='attributes')

    class Meta:
        unique_together = ('product', 'value', 'attribute', 'price')

class Price(models.Model):
    regular = models.IntegerField(blank=True, null=True)
    sale = models.IntegerField(blank=True, null=True)
    on_sale = models.NullBooleanField(blank=True)
    created = models.DateTimeField(auto_now=True)
    relation = models.ForeignKey(ProductAttributeRelationship)

    class Meta:
        unique_together = ('regular', 'sale', 'on_sale', 'sale_percentage')
Is it a bad idea to have the 3 ForeignKeys in ProductAttributeRelationship and the ForeignKey to that in Price, since a ProductAttributeRelationship may have many prices? I don't have much knowledge in this area, and have been reading up about the five normal forms, but am not sure where I should, or could, fit into the recommended third normal form.
We declare a foreign key when a value for a subrow in one table has to appear as a value of a subrow in another table. That's what you have, so declare them.
Foreign keys have nothing to do with normalization per se. A normal form is something that a table is or isn't in. Normalization is about replacing a table by multiple tables that always join to it. A foreign key constraint holds when two tables have to agree per the above. It can happen that new foreign keys hold between the new tables produced by normalizing, but if so you would just declare them. They don't affect what normal forms a table is in, or normalization.
(Although ProductAttributeRelationship product, value, attribute and relationship are unique, presumably it is because product and price are unique, and product has just one price and an attribute has just one value. So you should say that product and price are unique; then all four have to be. Similarly, although Price regular, sale, on_sale and sale_percentage are unique, if regular, sale and on_sale are unique with sale_percentage a function of them then you should declare the three unique.)
(PS:
1. The main issue is integrity: if there is no constraint on the subset then invalid updates are allowed.
2. If the subset is unique then the superset is unique. So if the DBMS is enforcing subset uniqueness then it is enforcing superset uniqueness.
3. Moreover, every superset of a CK is unique, so there's nothing special about the particular extra columns you chose.
4. SQL DBMS UNIQUE/PK declarations usually come with an index taking space and time to manage. For integrity and basic efficiency/optimization that's wasted on non-CK columns. But there can always be other special-case reasons for indexing.
5a. One reason to declare a non-CK superkey is that SQL forces you to do so to use it as a FK target. (You can either consider this redundancy as a helpful check or a tedious obtuseness.)
5b. Another reason is that sometimes this allows declarative (vs procedural/triggered) expression of integrity constraints via FK checking.)
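To translate the candidate-key point into Django terms: if, say, (product, attribute) is the real candidate key of ProductAttributeRelationship (an assumption about your data, not something stated in the question), you would declare only that, and every wider column set is then automatically unique as well:

class ProductAttributeRelationship(models.Model):
    # on_delete arguments added only so the sketch runs on current Django.
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    value = models.ForeignKey(Value, on_delete=models.CASCADE)
    attribute = models.ForeignKey(Attribute, on_delete=models.CASCADE)

    class Meta:
        # Declare the assumed candidate key only; any superset of it
        # (product, value, attribute, ...) is then unique as a consequence.
        unique_together = ('product', 'attribute')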

Django: sub-fields within a model field

In my primary model Deals, I have fields such as description, price, date_created, etc. I now have to add some fields that have sub-fields. For example, I'm trying to add an age field to Deals. This age field has further sub-fields (like score_for_kid, score_for_baby, score_for_old, etc.), and I want to edit these scores from the admin.
Here is my models.py:
class Deals(models.Model):
    description = models.TextField()
    price = models.DecimalField(max_digits=7, decimal_places=2)
    url = models.URLField(verify_exists=False)
    currency = models.CharField(max_length=3)
    created_date = models.DateField(auto_now_add=True)
    kid_score = models.IntegerField(max_length=2, default=0)
    teenager_score = models.IntegerField(max_length=2, default=0)
    youth_score = models.IntegerField(max_length=2, default=0)
    old_score = models.IntegerField(max_length=2, default=0)
I don't want to store all these sub-fields (around 20-25, across 4 different fields) directly on the model; instead I want an age field connected to these sub-fields. Would a ManyToManyField work for this?
The underlying requirement is that when a user selects a subfield (say kids) on the browser, all the objects having higher kid scores are displayed.
I'm very new to Django and any help on this would be great. Thanks.
If I understand your question properly, you need to use ForeignKey fields.
class Deals(models.Model):
    description = models.TextField()
    price = models.DecimalField(max_digits=7, decimal_places=2)
    #...
    age = models.ForeignKey('Age')  # string reference, since Age is defined below

class Age(models.Model):
    kid_score = models.IntegerField(max_length=2, default=0)
    teenager_score = models.IntegerField(max_length=2, default=0)
    #...
Have a good read of the docs on Models. You might also find it useful to do some reading on relational databases / basic sql.
When you come to edit your objects in the django admin, you'll probably want to use an InlineModelAdmin class.
UPDATE
Re-reading your question, it sounds like you might simply want to show/hide these additional fields on the main Deal model. If this is the case then you want to use fieldsets in the admin, with the class 'collapse'. There's an example in the docs.
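A minimal admin sketch of that idea, assuming the score fields stay on Deals; the grouping of fields is illustrative:

from django.contrib import admin
from .models import Deals

@admin.register(Deals)
class DealsAdmin(admin.ModelAdmin):
    fieldsets = (
        (None, {
            'fields': ('description', 'price', 'currency', 'url'),
        }),
        ('Age scores', {
            'classes': ('collapse',),  # hidden until expanded in the admin
            'fields': ('kid_score', 'teenager_score', 'youth_score', 'old_score'),
        }),
    )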
If you want each Deal record to have multiple kid_scores associated with it then you want a foreign key. If each Deal can only have one kid_score then you need to keep the kid_score (and other) fields on the main model (if this is confusing then definitely do some reading on SQL / relational databases).
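And for the browsing requirement in the question (a user picks "kids" and sees the deals with higher kid scores), keeping the score on the main model makes the query a one-liner; the cut-off value here is just an assumption:

# Deals with a kid_score of at least 7 (illustrative threshold), best first.
top_kid_deals = Deals.objects.filter(kid_score__gte=7).order_by('-kid_score')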

Django many to many recursive relationship

I'm not so great with databases so sorry if I don't describe this very well...
I have an existing Oracle database which describes an algorithm catalogue.
There are two tables, algorithms and alg_xref.
Algorithms can have parent and child algorithms. Alg_Xref contains these relationships with two foreign keys - xref_alg and xref_parent.
These are the Django models I have so far from the inspectdb command
class Algorithms(models.Model):
    alg_id = models.AutoField(primary_key=True)
    alg_name = models.CharField(max_length=100, blank=True)
    alg_description = models.CharField(max_length=1000, blank=True)
    alg_tags = models.CharField(max_length=100, blank=True)
    alg_status = models.CharField(max_length=1, blank=True)
    ...

    class Meta:
        db_table = u'algorithms'

class AlgXref(models.Model):
    xref_alg = models.ForeignKey(Algorithms, related_name='algxref_alg', null=True, blank=True)
    xref_parent = models.ForeignKey(Algorithms, related_name='algxref_parent', null=True, blank=True)

    class Meta:
        db_table = u'alg_xref'
On trying to query AlgXref I encounter this:
DatabaseError: ORA-00904: "ALG_XREF"."ID": invalid identifier
So the error seems to be that it looks for a primary key column ID which isn't in the table. I could create one, but it seems a bit pointless. Is there any way to get around this? Or to change my models?
EDIT: After a bit of searching it seems that Django requires a model to have a primary key. Life is too short, so I have just added one. Will this have any impact on performance?
This is currently a limitation of the ORM provided by Django. Each model has to have one field marked as primary_key=True; if there isn't one, the framework automatically creates an AutoField named id.
However, this is being worked on as we speak as part of this year's Google Summer of Code and hopefully will be in Django by the end of this year. For now you can try to use the fork of Django available at https://github.com/koniiiik/django which contains an implementation (which is not yet complete but should be sufficient for your purposes).
As for whether there is any benefit or not, that depends. It certainly makes the database more reusable and causes less headaches if you just add an auto incrementing id column to each table. The performance impact shouldn't be too high, the only thing you might notice is that if you have a many-to-many table like this, containing only two ForeignKey columns, adding a third one will increase its size by one half. That should, however, be irrelevant as long as you don't store billions of rows in that table.
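If it helps, here is a rough sketch of the models with the surrogate key added and the recursive relationship exposed through the ORM; the through_fields direction, the on_delete arguments, and the related_name 'parents' are assumptions, not something from the question:

from django.db import models

class Algorithms(models.Model):
    alg_id = models.AutoField(primary_key=True)
    alg_name = models.CharField(max_length=100, blank=True)
    # Recursive relationship routed through the existing xref table; the
    # through_fields order (parent -> child) is an assumption.
    children = models.ManyToManyField(
        'self',
        symmetrical=False,
        through='AlgXref',
        through_fields=('xref_parent', 'xref_alg'),
        related_name='parents',
    )

    class Meta:
        db_table = u'algorithms'

class AlgXref(models.Model):
    # Surrogate key added only to satisfy the ORM, as in the question's edit.
    id = models.AutoField(primary_key=True)
    xref_alg = models.ForeignKey(
        Algorithms, on_delete=models.CASCADE,
        related_name='algxref_alg', null=True, blank=True)
    xref_parent = models.ForeignKey(
        Algorithms, on_delete=models.CASCADE,
        related_name='algxref_parent', null=True, blank=True)

    class Meta:
        db_table = u'alg_xref'

# Usage: algorithm.children.all() and algorithm.parents.all() both resolve
# through alg_xref without ever touching the surrogate id directly.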

Is it ok to use a dictionary on the database for an app with lots of records?

I am working on an app which will handle lots of information and am looking for the best way of creating my models. Since I have never worked with apps that deal with so many records, database optimization is not a topic I know much about, but it seems to me that a good design is a good place to start.
Right now, I have a table for customers, a table for products and a table for product-customer (since we assign a code for each product a customer buys). Since I want to track the balances, there is also a balance table. My models look like this at the moment:
class Customer(models.Model):
    first_name = models.CharField(max_length=35)
    last_name = models.CharField(max_length=35)
    customer_ID = models.IntegerField(primary_key=True)
    phone = models.CharField(max_length=10, blank=True, null=True)

class Product(models.Model):
    product_ID = models.IntegerField(primary_key=True)
    product_code = models.CharField(max_length=25)
    invoice_date = models.DateField()
    employee = models.ForeignKey(Employee, null=True, blank=True)
    product_active = models.BooleanField()

class ProductCustomer(models.Model):
    prod = models.ForeignKey(Product, db_index=True)
    cust = models.ForeignKey(Customer, db_index=True)
    product_customer_ID = models.IntegerField(primary_key=True)
    [...]

class Balance(models.Model):
    product_customer = models.ForeignKey(ProductCustomer, db_index=True)
    balance = models.DecimalField(max_digits=10, decimal_places=2)
    batch = models.ForeignKey(Batch)
    [...]
The app will return the 'history' of the customer: whether the pax was overdue at some point, then paid, then was due a refund, etc.
I was wondering whether I should add a CharField on the Pax table which would hold a dictionary of date:status (the status could be calculated and added to the dictionary when I upload the information), whether it is more efficient to query the Balance table, or whether there is a better solution to be implemented.
Since there are thousands of products and even more customers, we are talking about around 400K records for the balances on a weekly basis... I am concerned about what can be done to ensure the app runs smoothly.
If I understand your question, you seem to be asking whether the join conditions will impose an unreasonable burden on your lookup query. To some extent this depends on your RDBMS. My recommendation is that you go with PostgreSQL over MySQL, because MySQL's InnoDB tables are heavily optimized for primary key lookups, which means two B-trees have to be traversed in order to find the records in a join. PostgreSQL, on the other hand, allows physical scans of tables, meaning foreign key lookups are usually a bit faster.
In general, yes, the dictionary approach is workable for an app with lots of records. The questions typically come down to how you are querying and how many records you are pulling in a given query. That is a much larger factor than how many records are stored, at least for a database like PostgreSQL.
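For comparison, a small sketch of the query-on-Balance alternative mentioned in the question, assuming the models above; the app import path and the batch ordering are assumptions:

from myapp.models import Balance  # assumed app label

def balance_history(customer):
    """All balance rows for one customer, newest batch first, in one query."""
    return (
        Balance.objects
        .filter(product_customer__cust=customer)
        .select_related('product_customer__prod', 'batch')
        .order_by('-batch_id')  # assumes Batch has an auto id to order on
    )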