Do Django models really need a single unique key field - django

Some of my models are only unique in a combination of keys. I don't want to use an auto-numbering id as the identifier as subsets of the data will be exported to other systems (such as spreadsheets), modified and then used to update the master database.
Here's an example:
class Statement(models.Model):
    supplier = models.ForeignKey(Supplier)
    total = models.DecimalField("statement total", max_digits=10, decimal_places=2)
    statement_date = models.DateField("statement date")
    ....
class Invoice(models.Model):
    supplier = models.ForeignKey(Supplier)
    amount = models.DecimalField("invoice total", max_digits=10, decimal_places=2)
    invoice_date = models.DateField("date of invoice")
    statement = models.ForeignKey(Statement, blank=True, null=True)
    ....
Invoice records are only unique for a combination of supplier, amount and invoice_date.
I'm wondering if I should create a slug for Invoice based on supplier, amount and invoice_date so that it is easy to identify the correct record.
An example of the problem of having multiple related fields to identify the right record is django-csvimport which assumes there is only one related field and will not discriminate on two when building the foreign key links.
Yet the slug seems a clumsy option and needs some kind of management to rebuild the slugs after adding records in bulk.
I'm thinking this must be a common problem and maybe there's a best practice design pattern out there somewhere.
I am using PostgreSQL in case anyone has a database solution. Although I'd prefer to avoid that if possible, I can see that it might be the way to build my slug if that's the way to go, perhaps with trigger functions. That just feels a bit like hidden functionality though, and may cause a headache for setting up on a different server.
UPDATE - after reading initial replies
My application requires that data may be exported, modified remotely, and merged back into the master database after review and approval. Hidden autonumber keys don't easily survive that consistently. The relation invoices[2417] is part of statements[265] is not persistent if the statement table was emptied and reloaded from a CSV.
If I use the numeric autonumber pk, then any process that updates the database would need to refresh the related key numbers, or else use multiple WITH clauses.
If I create a slug that is based on my 3 keys but easy to reproduce then I can use it as the key - albeit clumsily. I'm thinking of a slug along the lines:
u'%s %s %s' % (self.supplier,
               self.statement_date.strftime("%Y-%m-%d"),
               self.total)
This seems quite clumsy and not very DRY as I expect I may have to recreate the slug elsewhere duplicating the algorithm (maybe in an Excel formula, or an Access query)
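One way to limit the duplication, at least on the Python side, is to keep the key format in a single helper. This is only a minimal sketch: the function name statement_key and the two-decimal normalisation are assumptions for illustration, not part of the models above.

```python
from datetime import date
from decimal import Decimal


def statement_key(supplier_name, statement_date, total):
    """Build a reproducible natural key from the three identifying values."""
    # Normalise the amount to two decimal places so 100.5 and 100.50 match.
    amount = Decimal(total).quantize(Decimal("0.01"))
    return "%s %s %s" % (supplier_name, statement_date.strftime("%Y-%m-%d"), amount)


# Hypothetical values, chosen only to show the format.
key = statement_key("Acme Ltd", date(2017, 1, 31), "100.5")
print(key)
```

An Excel formula or Access query would still have to reproduce the same format, so the sketch narrows the duplication rather than removing it.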
I thought there must be a better way I'm missing but it looks like yuvi's reply means there should be, and there will be, but not yet :-(

What you're talking about is a multi-column primary key, otherwise known as a "composite" or "compound" key. Support for composite keys in Django is still in the works; you can read about it here:
Currently Django models only support a single column in this set,
denying many designs where the natural primary key of a table is
multiple columns [...] Current state is that the issue is
accepted/assigned and being worked on [...]
The link also mentions a partial implementation which is django-compositekeys. It's only partial and will cause you trouble with navigating between relationships:
support for composite keys is missing in ForeignKey and
RelatedManager. As a consequence, it isn't possible to navigate
relationships from models that have a composite primary key.
So currently it isn't entirely supported, but will be in the future. Regarding your own project, you can make of that what you will, though my own suggestion is to stick with the fully supported default of a hidden auto-incremented field that you don't even need to think about (and use unique_together to enforce the uniqueness of the described fields instead of making them your primary key).
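To illustrate what such a constraint does at the database level, here is a minimal sketch using Python's built-in sqlite3 rather than Django or PostgreSQL. The table and column names are assumptions modelled on the Invoice model in the question; a multi-column UNIQUE constraint is roughly what unique_together asks the database to create.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,          -- hidden surrogate key
        supplier_id INTEGER NOT NULL,
        amount TEXT NOT NULL,            -- kept as text to avoid float rounding
        invoice_date TEXT NOT NULL,
        UNIQUE (supplier_id, amount, invoice_date)
    )
    """
)
insert = "INSERT INTO invoice (supplier_id, amount, invoice_date) VALUES (?, ?, ?)"
row = (1, "100.50", "2017-01-31")  # hypothetical invoice
conn.execute(insert, row)

try:
    conn.execute(insert, row)  # same supplier, amount and date again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(duplicate_rejected, row_count)
```

The surrogate id stays available for joins, while the three natural columns remain the thing an import can match on.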
I hope this helps!

No.
A model needs to have one field with primary_key=True. By default this is the (hidden) AutoField that stores the object id, but you can set primary_key to True on any other field.
I've done this in cases where I was building a Django project on top of tables that had previously been created manually or through some other framework or system.
In reality, you can use whatever means you can think of for joining objects together in queries. As long as the query returns data that can be associated with the models you have, it does not really matter which field you use for joins. Just keep in mind that the solution you use should be as efficient as possible.
Alan

Related

Django, is filtering by string faster than SQL relationships?

Is it a major flaw if I'm querying my user's information by their user_id (string) rather than creating a Profile model and linking them to other models using SQL relationships?
Example 1: (user_id is stored in django sessions.)
class Information(models.Model):
    user_id = models.CharField(...)
    ...
# also applies for .filter() operations.
information = Information.objects.get(user_id=request.getUser['user_id'])
note: I am storing the user's profile informations on Auth0.
Example 2: (user_id is stored in Profile.)
class Profile(models.Model):
    user_id = models.CharField(...)
class Information(models.Model):
    profile = models.ForeignKey(Profile, ...)
    ...
information = Information.objects.get(profile=request.getProfile)
note: With this method Profile will only have one field, user_id.
On Django, will using a string instead of a query object affect performances to retrieve items?
Performance is not an issue here as noted by Dirk; as soon as a column is indexed, the performance difference between data types should be negligible when compared to other factors. Here's a related SO question for more perspective.
What you should take care of is to prevent the duplication of data whose integrity you then would have to take care of on your own instead of relying on well-tested integrity checks in the database.
Another aspect is that if you do have relations between your data, you absolutely should make sure that they are accurately represented in your models using Django's relationships. Otherwise there's really not much point in using Django's ORM at all. Good luck!
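As a small illustration of the integrity checks mentioned above, the sketch below uses Python's built-in sqlite3 instead of Django. The table layout and the sample user_id are assumptions loosely based on Example 2; the point is only that a declared relation lets the database refuse a row pointing at a profile that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, user_id TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE information (id INTEGER PRIMARY KEY, "
    "profile_id INTEGER NOT NULL REFERENCES profile (id))"
)
conn.execute("INSERT INTO profile (id, user_id) VALUES (1, 'auth0|abc')")  # hypothetical id
conn.execute("INSERT INTO information (profile_id) VALUES (1)")  # valid reference

try:
    conn.execute("INSERT INTO information (profile_id) VALUES (999)")  # no such profile
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

With a bare string column there is no such check, so a mistyped or stale user_id would be stored silently.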

Derived variables in Django Models

I am building a Django app that has a central Projects model:
class Project(models.Model):
    fundamental_attrib1 = models.IntegerField()
    fundamental_attrib2 = models.DecimalField(max_digits=10, decimal_places=2)
    derived_attrib1 = models.DecimalField(null=True, max_digits=10, decimal_places=2)
    derived_attrib1_start = models.DateField(null=True)
    derived_attrib1_end = models.DateField(null=True)
    derived_attrib2 = models.DecimalField(null=True, max_digits=10, decimal_places=2)
    derived_attrib2_start = models.DateField(null=True)
    derived_attrib2_end = models.DateField(null=True)
    derived_attrib3 = models.IntegerField(null=True)
    derived_attrib3_start = models.DateField(null=True)
    derived_attrib3_end = models.DateField(null=True)
The goal is to allow users to instantiate new projects where they see (and only need to see) the 'fundamental' variables in the form used to create or update a Project. Once they have submitted the form, I want to calculate all the optional parameters before saving the project to the database.
In addition, most of my derived variables come in groups of three as above (value, start date, end date). Is there a better way (that makes sense) to store them in the database? Naive string example:
{'derived_attrib1':[1000,date(2017,1,1),date(2017,2,2)]}
{'derived_attrib2':[2000,date(2017,2,1),date(2017,3,2)]}
{'derived_attrib3':[ 500,date(2017,3,1),date(2017,4,2)]}
My eventual 'end goal' is, for each group:
create numpy arrays (or bring into one DataFrame?) to interpolate days between the start and end dates
linearly distribute the value across the days
plot each group as a timeseries (probably with D3.js/Bokeh or similar)
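The interpolation and distribution steps could look roughly like this in plain Python, before moving to numpy or a DataFrame. It is a sketch under assumptions: the helper name distribute is made up, and the end date is treated as inclusive, which the description above does not specify.

```python
from datetime import date, timedelta


def distribute(value, start, end):
    """Spread value evenly over every day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    if n_days < 1:
        raise ValueError("end must not be before start")
    share = value / n_days
    # One (day, share) pair per calendar day in the range.
    return [(start + timedelta(days=i), share) for i in range(n_days)]


# Hypothetical group: 1000 spread over the first four days of January 2017.
series = distribute(1000.0, date(2017, 1, 1), date(2017, 1, 4))
for day, share in series:
    print(day.isoformat(), share)
```

The resulting list of pairs can be handed to a plotting layer or converted to an array afterwards.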
If these come in groups and can be seen as an entity, consider using ArrayField. For other backends, one could use a json representation in a TextField (or one of the json fields that work in the same way). Field choice depends on what comes in from the calculation and what your processing is most comfy with.
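For the JSON-in-a-TextField variant, a minimal sketch of the round trip might look like this. The helper names dump_group and load_group and the ISO date format are assumptions; the string returned by dump_group is what would be stored in the text column.

```python
import json
from datetime import date


def dump_group(value, start, end):
    """Serialise one (value, start, end) group to a JSON string."""
    return json.dumps(
        {"value": value, "start": start.isoformat(), "end": end.isoformat()}
    )


def load_group(text):
    """Parse the JSON string back into a (value, start, end) tuple."""
    data = json.loads(text)
    return (
        data["value"],
        date.fromisoformat(data["start"]),
        date.fromisoformat(data["end"]),
    )


stored = dump_group(1000, date(2017, 1, 1), date(2017, 2, 2))
restored = load_group(stored)
print(stored)
```

Decimal values are not JSON-serialisable by default, so they would need to be stored as strings or converted first.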
The reason not to do this would be if you frequently filter querysets on attributes of these entities: while that's possible, it's not as fast as querying plain fields. Relations will be impossible.
A totally different approach is to use OneToOne fields to models. This creates a ton of joins for your approach, so I'm not recommending it, but it has some advantages in terms of handling each derived entity: its fields and calculation method are independent of the model that uses them.
I would say ForeignKey in a related model would be the best bet. Then you can query Project.objects.filter(derived__start__gt=start, derived__end__lt=something). prefetch_related would only require two queries to get any amount of data from these two tables. This allows the number of properties to be infinite and you can query them any way you want.
class Derived(models.Model):
    project = models.ForeignKey(Project, related_name='derived')
    value = models.DecimalField(null=True, max_digits=10, decimal_places=2)
    start = models.DateField(null=True)
    end = models.DateField(null=True)

Can I make a dynamic number of foreign keys to a single (self) django model?

I'm currently creating an equipment management database and need to allow equipment to be associated with other equipment.
Thanks to this stackoverflow question I currently have something akin to the following (vastly simplified):
class Equipment(models.Model):
    equipment_title = models.CharField()
    relates_to = models.ForeignKey('self')
However, to relate a dynamic number of equipment to other equipment I think I need something like a one-to-many field that doesn't exist natively within Django, e.g. a filter housing may be associated with many filter units, and several filter housings may be associated with a machine tool.
How can I get around this? I'm not sure that it's the right place for a many-to-many field...
A ForeignKey is a one-to-many relationship, defined on the "many" side. Since your relationship is pointing to self anyway, it already does what you want.
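To make that concrete, here is a sketch of the table a self-referential key produces, using Python's built-in sqlite3 rather than Django. The column name relates_to_id follows Django's usual naming, and making it nullable (so the top-level machine tool can exist) is an assumption that the simplified model above does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment (id INTEGER PRIMARY KEY, equipment_title TEXT NOT NULL, "
    "relates_to_id INTEGER REFERENCES equipment (id))"
)
conn.executemany(
    "INSERT INTO equipment (id, equipment_title, relates_to_id) VALUES (?, ?, ?)",
    [
        (1, "machine tool", None),   # top of the hierarchy
        (2, "filter housing", 1),    # belongs to the machine tool
        (3, "filter unit A", 2),     # both units belong to the housing
        (4, "filter unit B", 2),
    ],
)
# Everything that points at the filter housing: the "many" side of the relation.
children = [
    title
    for (title,) in conn.execute(
        "SELECT equipment_title FROM equipment WHERE relates_to_id = ? ORDER BY id", (2,)
    )
]
print(children)
```

In Django the same lookup is available from the parent object through the reverse accessor, which is equipment_set unless related_name is set.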

Tastypie, Django, joining unrelated resources

My question is: how do I join unrelated resources that share a similar variable?
Both resources have an 8 length VARCHAR variable, both named code.
Because of how the data is constructed, I cannot make any assumption that would make this a foreign key relation. I do, however, need to join these two tables together where they have matching code values. How do I join these resources together to be displayed in tastypie/django?
class CodeDescription(models.Model):
    code = models.CharField(db_column='Code', max_length=10)
    description = models.CharField(db_column='Description', max_length=255)
class TechnicalDif(models.Model):
    code = models.CharField(db_column='Code', max_length=10)
As you can see, these tables hold the same sort of value. CodeDescription holds the details of what each code means but doesn't necessarily have a definition for every code, so a ForeignKey relation cannot be applied. How would I join these two tables for display as a tastypie resource?
The only way I think you can do this is programmatically, in the dehydrate method of the relevant TastyPie ResourceClass. So for each loaded object you could do a query and set an appropriate value in the bundle.
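The per-object lookup itself can be sketched without Tastypie. Everything below is hypothetical (the describe helper, the sample codes and descriptions); it only shows the lookup tolerating codes that have no definition. Inside a real dehydrate method the result would be written into bundle.data instead of a plain dict.

```python
def describe(code, descriptions):
    """Return the description for code, or None when no definition exists."""
    return descriptions.get(code)


# Hypothetical rows standing in for the CodeDescription and TechnicalDif tables.
descriptions = {"A1000001": "Pump failure", "A1000002": "Valve leak"}
technical_difs = [{"code": "A1000001"}, {"code": "Z9999999"}]

for row in technical_difs:
    # A missing definition becomes None rather than raising an error.
    row["description"] = describe(row["code"], descriptions)

print(technical_difs)
```

Loading the descriptions into a dict once avoids one query per object, which matters if the resource list is long.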
Alternatively, I suppose you could create a join table that links related CodeDescriptions to TechnicalDifs, and populate it programmatically with a query that you run over the tables intermittently (e.g. when something changes).

How to retrieve values from Django ForeignKey -> ManyToMany fields?

I have a model (Realtor) with a ForeignKey field (BillingTier), which has a ManyToManyField (BillingPlan). For each logged in realtor, I want to check if they have a billing plan that offers automatic feedback on their listings. Here's what the models look like, briefly:
class Realtor(models.Model):
    user = models.OneToOneField(User)
    billing_tier = models.ForeignKey(BillingTier, blank=True, null=True, default=None)
class BillingTier(models.Model):
    plans = models.ManyToManyField(BillingPlan)
class BillingPlan(models.Model):
    automatic_feedback = models.BooleanField(default=False)
I have a permissions helper that checks the user permissions on each page load, and denies access to certain pages. I want to deny the feedback page if they don't have the automatic feedback feature in their billing plan. However, I'm not really sure the best way to get this information. Here's what I've researched and found so far, but it seems inefficient to be querying on each page load:
def isPermitted(user, url):
    premium = [t[0] for t in user.realtor.billing_tier.plans.values_list('automatic_feedback') if t[0]]
I saw some solutions which involved using filter (ManyToMany field values from queryset), but I'm equally unsure of using the query for each page load. I would have to get the billing tier id from the realtor: bt_id = user.realtor.billing_tier.id and then query the model like so:
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct()
I think the second option reads nicer, but I think the first would perform better because I wouldn't have to import and query the BillingTier model.
Is there a better option, or are these two the best I can hope for? Also, which would be more efficient for every page load?
As per the OP's invitation, here's an answer.
The core question is how to define an efficient permission check based on a highly relational data model.
The first variant involves building a Python list from evaluating a Django query set. The suspicion must certainly be that it imposes unnecessary computations on the Python interpreter. Although it's not clear whether that's tolerable if at the same time it allows for a less complex database query (a tradeoff which is hard to assess), the underlying DB query is not exactly simple.
The second approach involves fetching additional 1:1 data through relational lookups and then checking if there is any record fulfilling access criteria in a different, 1:n relation.
Let's have a look at them.
bt_id = user.realtor.billing_tier.id: This is required to get the hook for the following 1:n query. It is indeed highly inefficient in itself. It can be optimized in two ways.
As per Django: Access Foreign Keys Directly, it can be written as bt_id = user.realtor.billing_tier_id, because the id is already stored on the realtor row and need not be fetched via a relational operation.
Assuming that the page in itself would only load a user object, Django can be told to fetch and cache relational data along with that via select_related. So if the page does not only fetch the user object but the required billing_tier_id as well, we have saved one additional DB hit.
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct() can be optimized using Django's exists because that will reduce effort both in the database and in data traffic between the database and Python.
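What an existence check buys can be sketched with Python's built-in sqlite3: the database stops at the first matching row and only a single value travels back. The table and column names below are assumptions loosely following Django's default naming for a many-to-many link table, and has_automatic_feedback is a hypothetical helper, not Django's own SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE billingplan (id INTEGER PRIMARY KEY, automatic_feedback INTEGER NOT NULL);
    CREATE TABLE billingtier_plans (billingtier_id INTEGER, billingplan_id INTEGER);
    INSERT INTO billingplan VALUES (1, 0), (2, 1);
    INSERT INTO billingtier_plans VALUES (10, 1), (10, 2), (11, 1);
    """
)


def has_automatic_feedback(tier_id):
    """True if any plan linked to the tier offers automatic feedback."""
    row = conn.execute(
        "SELECT 1 FROM billingtier_plans AS tp "
        "JOIN billingplan AS p ON p.id = tp.billingplan_id "
        "WHERE tp.billingtier_id = ? AND p.automatic_feedback = 1 LIMIT 1",
        (tier_id,),
    ).fetchone()
    return row is not None


print(has_automatic_feedback(10), has_automatic_feedback(11))
```

Tier 10 is linked to a plan with automatic feedback and tier 11 is not, so the permission helper would allow the first and deny the second.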
Maybe even Django's prefetch_related can be used to load the 1:n data up front alongside the 1:1 lookup, although it issues a separate query per relation rather than a single combined one, so it's much more difficult to judge whether that pays. Could be worth a try.
In any case, it's worth installing a package called Django Debug Toolbar, which will allow you to analyze how much time your implementation spends on database queries.