Querying an object with pk vs custom object_id - django

I have a model as shown below,
class Person(models.Model):
    first_name = models.CharField(max_length=30)
    middle_name = models.CharField(max_length=30, blank=True)
    last_name = models.CharField(max_length=30)
    person_id = models.CharField(max_length=32, blank=True)
where person_id is populated on save with a random hex string generated by uuid, which looks something like 'E4DC6C20BECA49E6817DB2365924B1EF'.
So my question is: in a database with a very large number of objects, how do these two queries compare?
By pk:
Person.objects.get(pk=10024)
By person_id:
Person.objects.get(person_id='E4DC6C20BECA49E6817DB2365924B1EF')
Does either lookup have a performance advantage at large scale?
I don't know much about database internals.
My database is PostgreSQL.

To get good performance when querying on a column, that column needs to be indexed. The primary key column is indexed automatically (by definition), but your person_id column won't be; you should add db_index=True to the field declaration, then make and run migrations. Once person_id is indexed, both lookups are index searches and should perform comparably.
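To see what the index changes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the snippet runs anywhere; PostgreSQL's planner output looks different, but the scan-versus-index-search distinction is the same). The table layout mirrors the Person model above; the index name idx_person_person_id and the sample rows are made up for illustration.

```python
import sqlite3
import uuid


def query_plan(conn, sql, params):
    """Return the readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"  # plays the role of Django's automatic pk
    " first_name TEXT,"
    " person_id TEXT)"          # no index yet
)
rows = [(f"name{i}", uuid.uuid4().hex.upper()) for i in range(1000)]
conn.executemany("INSERT INTO person (first_name, person_id) VALUES (?, ?)", rows)

lookup = "SELECT first_name FROM person WHERE person_id = ?"
target = rows[500][1]

# Without an index the only option is to read every row.
plan_before = query_plan(conn, lookup, (target,))

# Roughly what db_index=True asks the database to do for this column.
conn.execute("CREATE INDEX idx_person_person_id ON person (person_id)")
plan_after = query_plan(conn, lookup, (target,))

# The primary key lookup never needed an extra index.
plan_pk = query_plan(conn, "SELECT first_name FROM person WHERE id = ?", (501,))

print("person_id, no index:  ", plan_before)
print("person_id, with index:", plan_after)
print("pk:                   ", plan_pk)
```

The first plan is a full table scan, while the other two are index searches. If person_id is meant to be unique, unique=True on the field would also give you an index.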

Related

Efficient Design of DB with several relations - Django

I want to know the most efficient way of structuring and designing a database with several relations. I will explain my problem with a toy example, which is scaled up in my actual situation.
Here are the Models in the Django database
1.) Employee Master (biggest table with several columns and rows)
class Emp_Mast(models.Model):
    emp_mast_id = models.AutoField(primary_key=True)
    first_name = models.CharField(max_length=50)
    middle_name = models.CharField(max_length=50, blank=True)
    last_name = models.CharField(max_length=50, blank=True)
    desgn_mast = models.ForeignKey("hr.Desgn_Mast", on_delete=models.SET_NULL, null=True)
    qual_mast = models.ForeignKey("hr.Qualification_Mast", on_delete=models.SET_NULL, null=True)
    office_mast = models.ManyToManyField("company_setup.Office_Mast")
    ref_mast = models.ForeignKey("hr.Reference_Mast", on_delete=models.SET_NULL, null=True)
    refernce_mast = models.ForeignKey("hr.Refernce_Mast", on_delete=models.SET_NULL, null=True)
This is how the data is displayed in frontend
2.) All the relational field in the Employee Master have their corresponding models
3.) Crw_Movement_Transaction
Now I need to create a table for transaction data that stores each and every movement of the employees. We have several offshore sites that the employees need to travel to, and about 50 rows a day would be added to this transaction table, called Crw_Movement_Transaction.
The Crw_Movement table will have a few calculated columns of its own; the rest of its columns will be static (the data will not be changed from here) and will come from the Employee Master, such as desgn_mast and souring_mast (so not all of the Emp_Mast fields either).
One way to do this is to define a nested relation for Emp_Mast in the serializer for Crw_Movement and optimize it using select_related and prefetch_related to reduce the queries to the database. However, that is still very slow, and any queries to Emp_Mast are unnecessary overhead. Would it be a better design to store the needed Emp_Mast fields directly in Crw_Movement and update them whenever Emp_Mast is updated? If so, what is a good way of doing that? Or should I stick with the nested serializer?

Many foreign keys in one lookup table. Bad idea?

I am using Django, and my tables look like
class Product(models.Model):
    category = models.CharField(max_length=50)
    title = models.CharField(max_length=200)

class Value(models.Model):
    name = models.CharField(max_length=200, unique=True)

class Attribute(models.Model):
    name = models.CharField(max_length=200)
    parent = models.ForeignKey('self', related_name='children')
    values = models.ManyToManyField(Value, through='ProductAttributeRelationship', related_name='values')

    class Meta:
        unique_together = ('name', 'parent')

class ProductAttributeRelationship(models.Model):
    product = models.ForeignKey(Product, related_name='products')
    value = models.ForeignKey(Value, related_name='values')
    attribute = models.ForeignKey(Attribute, related_name='attributes')

    class Meta:
        unique_together = ('product', 'value', 'attribute', 'price')

class Price(models.Model):
    regular = models.IntegerField(blank=True, null=True)
    sale = models.IntegerField(blank=True, null=True)
    on_sale = models.NullBooleanField(blank=True)
    created = models.DateTimeField(auto_now=True)
    relation = models.ForeignKey(ProductAttributeRelationship)

    class Meta:
        unique_together = ('regular', 'sale', 'on_sale', 'sale_percentage')
Is it a bad idea to have the 3 ForeignKeys in ProductAttributeRelationship and the ForeignKey to that in Price since a ProductAttributeRelationship may have many prices? I don't have much knowledge in this area, and have been reading up about the 5 normalized forms, but am not sure where I should, or could, fit into the recommended 3rd form.
We declare a foreign key when a value for a subrow in one table has to appear as a value of a subrow in another table. That's what you have, so declare them.
Foreign keys have nothing to do with normalization per se. A normal form is something that a table is or isn't in. Normalization is about replacing a table by multiple tables that always join back to it. A foreign key constraint holds when two tables have to agree as described above. It can happen that new foreign keys hold between the new tables produced by normalizing, but if so you would just declare them. They don't affect normalization or which normal forms a table is in.
(Although ProductAttributeRelationship product, value, attribute and price are unique, presumably it is because product and price are unique, and product has just one price and an attribute has just one value. So you should say that product and price are unique; then all four have to be. Similarly, although Price regular, sale, on_sale and sale_percentage are unique, if regular, sale and on_sale are unique with sale_percentage a function of them, then you should declare the three unique.)
PS:
1. The main issue is integrity: if there is no constraint on the subset then invalid updates are allowed.
2. If the subset is unique then the superset is unique. So if the DBMS is enforcing subset uniqueness then it is enforcing superset uniqueness.
3. Moreover, every superset of a CK is unique, so there's nothing special about the particular extra columns you chose.
4. SQL DBMS UNIQUE/PK constraints usually come with an index that takes space and time to manage. For integrity and basic efficiency/optimization that's wasted on non-CK columns. But there can always be other special-case reasons for indexing.
5a. One reason to declare a non-CK superkey is that SQL forces you to do so to use it as an FK target. (You can consider this redundancy either a helpful check or a tedious obtuseness.)
5b. Another reason is that sometimes this allows declarative (vs procedural/triggered) expression of integrity constraints via FK checking.
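Points 1 and 2 of the PS can be checked directly. The following is a small sketch using Python's built-in sqlite3 module; the two table names and the choice of (product, attribute) as the real candidate key are assumptions made purely for illustration, not something taken from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: same columns, different UNIQUE declarations.
# rel_subset declares the assumed candidate key (product, attribute).
conn.execute(
    "CREATE TABLE rel_subset (product INT, value INT, attribute INT,"
    " UNIQUE (product, attribute))"
)
# rel_superset only declares the wider column set unique.
conn.execute(
    "CREATE TABLE rel_superset (product INT, value INT, attribute INT,"
    " UNIQUE (product, value, attribute))"
)


def accepts(table, row):
    """Return True if the row can be inserted, False on a constraint violation."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {}
for table in ("rel_subset", "rel_superset"):
    accepts(table, (1, 10, 100))  # seed row
    results[table] = {
        # Same row again: violates uniqueness of the superset.
        "exact_duplicate": accepts(table, (1, 10, 100)),
        # Same (product, attribute) with a second value: the invalid update.
        "second_value_for_same_key": accepts(table, (1, 11, 100)),
    }

for table, outcome in results.items():
    print(table, outcome)
```

With the subset constraint both inserts are rejected, so superset uniqueness comes for free. With only the superset constraint, the second value for the same key slips through.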

How to create classes in Django model

Since Django maps each model to a table, I am not able to create packages in my code where I can wrap sections of the class to make it more logical and increase the cohesion.
For example
class Employee(models.Model):
    # personal information
    first_name = models.CharField(max_length=20)
    middle_initial = models.CharField(max_length=1)
    last_name = models.CharField(max_length=20)
    dob = models.DateField()
    # job contract information
    full_time = models.BooleanField()
    hours_per_day = models.PositiveSmallIntegerField()
    # .. and more
What I am trying to do is this
employee.job.is_full_time
employee.job.get_hours_per_day
# and more
However, this means that I have to create a new model and connect it with the Employee model using a OneToOneField. I don't want to do that, since joins in the database are expensive. Is there any way to create something like this?
class Employee(models.Model):
    class PersonalInformation:
        first_name = models.CharField(max_length=20)
        middle_initial = models.CharField(max_length=1)
        last_name = models.CharField(max_length=20)
        dob = models.DateField()
    class Job:
        full_time = models.BooleanField()
        hours_per_day = models.PositiveSmallIntegerField()
    # .. and more
The key to the answer is to create a model that contains multiple classes which are all mapped to one table in the database.
There is no way to do that with just Django. Django models represent flat, relational databases. Extending the model framework to provide functionality beyond what its backend is capable of is unnecessary, and therefore not implemented in the default framework.
There are third-party packages that provide what you want, but as far as I know these packages use database backends that are capable of using such data structures.
Personally, I would go with your first example: a flat, single model that represents all of the Employee data. Avoid any ambiguity in your field names, and there will be no cost to using a flat model over a nested one.
And remember: premature optimization is a lot more expensive than a join statement.
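For what it's worth, a flat model does not rule out the employee.job.… access style at the Python level. The sketch below is not from the answer above and is not a Django feature: it uses a plain dataclass as a stand-in for the flat model, and JobView is a hypothetical helper name. The same property could, as far as I can tell, be added to a real model class without touching the table.

```python
from dataclasses import dataclass


class JobView:
    """Read-only grouping over the job-related fields of one employee."""

    def __init__(self, employee):
        self._employee = employee

    @property
    def is_full_time(self):
        return self._employee.full_time

    @property
    def hours_per_day(self):
        return self._employee.hours_per_day


@dataclass
class Employee:
    # Stand-in for the flat Django model: every field lives on one object
    # (and, in Django, in one table).
    first_name: str
    last_name: str
    full_time: bool
    hours_per_day: int

    @property
    def job(self):
        # No second table and no join: just a view over the same instance.
        return JobView(self)


employee = Employee("Ada", "Lovelace", True, 8)
print(employee.job.is_full_time, employee.job.hours_per_day)
```

The storage stays flat, so queries and filters still use the plain field names; only attribute access is grouped.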

Is it ok to use a dictionary on the database for an app with lots of records?

I am working on an app which will handle lots of information, and I am looking for the best way of creating my models. Since I have never worked with apps that deal with so many records, database optimization is not a topic I know much about, but it seems to me that a good design is a good place to start.
Right now, I have a table for customers, a table for products and a table for product-customer (since we assign a code for each product a customer buys). Since I want to track the balances, there is also a balance table. My models look like this at the moment:
class Customer(models.Model):
    first_name = models.CharField(max_length=35)
    last_name = models.CharField(max_length=35)
    customer_ID = models.IntegerField(primary_key=True)
    phone = models.CharField(max_length=10, blank=True, null=True)

class Product(models.Model):
    product_ID = models.IntegerField(primary_key=True)
    product_code = models.CharField(max_length=25)
    invoice_date = models.DateField()
    employee = models.ForeignKey(Employee, null=True, blank=True)
    product_active = models.BooleanField()

class ProductCustomer(models.Model):
    prod = models.ForeignKey(Product, db_index=True)
    cust = models.ForeignKey(Customer, db_index=True)
    product_customer_ID = models.IntegerField(primary_key=True)
    [...]

class Balance(models.Model):
    product_customer = models.ForeignKey(ProductCustomer, db_index=True)
    balance = models.DecimalField(max_digits=10, decimal_places=2)
    batch = models.ForeignKey(Batch)
    [...]
The app will return the 'history' of the customer: for example, whether the pax was overdue at some point, then paid, then was due for a refund, etc.
I am wondering whether I should add a CharField on the Pax table holding a dictionary of date:status (the status could be calculated and added to the dictionary when I upload the information), whether it is more efficient to query the Balance table, or whether there is a better solution.
Since there are thousands of products and even more customers, we are talking about around 400K records for the balances on a weekly basis... I am concerned about what can be done to ensure the app runs smoothly.
If I understand your question, you are asking whether the join conditions will impose an unreasonable burden on your lookup query. To some extent this depends on your RDBMS. My recommendation is PostgreSQL over MySQL: MySQL's InnoDB tables are clustered on the primary key, so a lookup through any other index has to traverse two B-trees (the secondary index, then the primary key) to find the records on a join. PostgreSQL, on the other hand, stores rows in a heap and its indexes point at the physical row location, so foreign key lookups are usually a bit faster.
In general yes, the dictionary approach is fine for an app with lots of records. The questions typically come out of how you are querying and how many records you are pulling in a given query. That is a much larger factor than how many records are stored, at least for a db like PostgreSQL.
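To illustrate the point about how many records a query pulls, here is a rough sketch of the alternative to the dictionary: reading one customer's history straight from the balance table through an index. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The batch_date column, the index name, the status labels and the generated data are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance ("
    " id INTEGER PRIMARY KEY,"
    " product_customer_id INTEGER,"
    " batch_date TEXT,"  # assumed: one balance row per weekly batch
    " balance NUMERIC)"
)
# Composite index so one customer's history is a single ordered index range.
conn.execute("CREATE INDEX idx_balance_pc ON balance (product_customer_id, batch_date)")

# 500 product-customer pairs x 4 weekly batches of made-up balances.
rows = [
    (pc, f"2024-01-{week * 7 + 1:02d}", (pc * week) % 50 - 20)
    for pc in range(1, 501)
    for week in range(4)
]
conn.executemany(
    "INSERT INTO balance (product_customer_id, batch_date, balance) VALUES (?, ?, ?)",
    rows,
)


def status(amount):
    """Hypothetical status rule; the real one depends on the business logic."""
    if amount > 0:
        return "overdue"
    if amount < 0:
        return "refund due"
    return "paid"


history_sql = (
    "SELECT batch_date, balance FROM balance"
    " WHERE product_customer_id = ? ORDER BY batch_date"
)
history = conn.execute(history_sql, (42,)).fetchall()
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + history_sql, (42,))
)

for batch_date, amount in history:
    print(batch_date, amount, status(amount))
print(plan)
```

The query touches only the four rows for that product-customer pair, however many rows the table holds, which is why the size of each query matters more than the size of the table.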

Querying across database tables

I've got Django tables like the following (I've removed non-essential fields):
class Person(models.Model):
    nameidx = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=300, verbose_name="Name")

class Owner(models.Model):
    id = models.IntegerField(primary_key=True)
    nameidx = models.IntegerField(null=True, blank=True)  # is Person.nameidx
    structidx = models.IntegerField()  # is PlaceRef.structidx

class PlaceRef(models.Model):
    id = models.IntegerField(primary_key=True)
    structidx = models.IntegerField()  # used for many things and not equivalent to placeidx
    placeidx = models.IntegerField(null=True, blank=True)  # is Place.placeidx

class Place(models.Model):
    placeidx = models.IntegerField(primary_key=True)
    county = models.CharField(max_length=36, null=True, blank=True)
    name = models.CharField(max_length=300)
My question is as follows. If in my views.py file, I have a Person referenced by name and I want to find out all the Places they own as a QuerySet, what should I do?
I can get this far:
person = Person.objects.get(name=name)
owned_relations = Owner.objects.filter(nameidx=person.nameidx)
How do I get from here to Place? Should I use database methods?
I'm also not sure if I should be using ForeignKey for e.g. Owner.nameidx.
Thanks and apologies for this extremely basic question. I'm not sure how to learn the basics of database queries except by trying, failing, asking SO, trying again... :)
The whole point of foreign keys is for uses like yours. If you already know that Owner.nameidx refers to a Person, why not make it a foreign key (or a OneToOne field) to the Person table? Not only do you get the advantage of referential integrity - it makes it impossible to enter a value for nameidx that isn't a valid Person - the Django ORM will give you the ability to 'follow' the relationships easily:
owned_places = Place.objects.filter(placeref__owner__person=my_person)
will give you all the places owned by my_person.
Incidentally, you don't need to define the separate primary key fields - Django will do it for you, and make them autoincrement fields, which is almost always what you want.
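Under the hood, a filter that spans relationships like the one above becomes a chain of joins. As a rough sketch of what that traversal amounts to, here is the same question ("which places does this person own?") asked in plain SQL through Python's built-in sqlite3 module. The column names follow the models in the question; the sample rows and the places_owned_by helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (nameidx INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE place    (placeidx INTEGER PRIMARY KEY, county TEXT, name TEXT);
CREATE TABLE placeref (id INTEGER PRIMARY KEY, structidx INTEGER,
                       placeidx INTEGER REFERENCES place(placeidx));
CREATE TABLE owner    (id INTEGER PRIMARY KEY,
                       nameidx INTEGER REFERENCES person(nameidx),
                       structidx INTEGER);

-- Invented sample data.
INSERT INTO person   VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO place    VALUES (10, 'Kent', 'Mill House'),
                            (11, 'Essex', 'Old Forge'),
                            (12, 'Kent', 'The Grange');
INSERT INTO placeref VALUES (1, 100, 10), (2, 101, 11), (3, 102, 12);
INSERT INTO owner    VALUES (1, 1, 100), (2, 1, 102), (3, 2, 101);
""")


def places_owned_by(name):
    """Follow person -> owner -> placeref -> place and return the place names."""
    sql = """
        SELECT DISTINCT place.name
        FROM place
        JOIN placeref ON placeref.placeidx = place.placeidx
        JOIN owner    ON owner.structidx = placeref.structidx
        JOIN person   ON person.nameidx = owner.nameidx
        WHERE person.name = ?
        ORDER BY place.name
    """
    return [row[0] for row in conn.execute(sql, (name,))]


print(places_owned_by("Alice"))
print(places_owned_by("Bob"))
```

Once the integer columns are declared as foreign keys in the models, the ORM can generate this kind of join for you from the double-underscore lookup.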
If you can redesign the schema, then:
In Owner, nameidx can be a foreign key to Person(nameidx),
PlaceRef(structidx) could be a foreign key to Owner(structidx), and
Place(placeidx) could be a foreign key to PlaceRef(placeidx).
Then you could deduce the place value easily.