Django has some very clear conventions for naming models:
do so in the singular
describe the object the model represents
use the CapWords convention, etc.
When you are using a 'through' model to describe a many-to-many relationship, however, you are no longer describing an object, but a relationship between objects.
Having googled a bit, I can't find any guidance on naming 'through' models. The example used in the django docs (Musician and Band) is named Membership. This is kind of perfect because membership exactly describes the relationship between the musician and the band. But in so many other cases there doesn't seem to be a word (or even phrase) to describe such a relationship.
Take for example, the other situation used in the django docs (for a 'normal' many-to-many field) of Pizza and Topping. There doesn't seem to be a good word to describe how a pizza relates to a topping; and if I therefore needed a through field to add additional information (e.g. maybe I have a primary topping and a secondary topping) I end up with a naming difficulty.
In practice there are two (maybe more, but two that I can think of) ways to proceed:
Call the through model something along the lines of ThingOneThingTwoRelationship e.g. PizzaToppingRelationship. I guess this works, but it's kind of ugly and verbose.
Try to name the model after the additional info it stores, e.g. ToppingSignificance. Less ugly than option 1, but it has other drawbacks. For one, if the model grows to contain additional information, the name is no longer particularly descriptive. Take the 'band' example from the django docs: imagine we had started with just the joining_date and called the Membership model MemberJoiningDate. As that model grew to include reason and leaving_date, the name would no longer be apt.
So what am I actually asking...
Are there any known conventions (not opinions) for naming through models? I'm guessing there's nothing official or I would have found it on the django site, but are there any conventions that are generally accepted by folk who have been doing this a while?
Failing that, are there any django style guides that discuss this (two-scoops etc. - if I had my copy available I'd look it up)?
Failing either of the above, are there any conventions that could be borrowed from general relational database parlance?
... yes, I know - naming things is hard.
Related
This question is fundamentally similar to these previous questions:
Django access to subclasses items from abstract class
How to query abstract-class-based objects in Django?
I am posting this as a new, separate question because:
I have minor additional considerations that aren't addressed in the previous questions
The previous questions are relatively old, and if it's the case that the correct answer has changed in recent times, I wonder if maybe those questions haven't been visible enough (given that they have accepted answers) to get the attention of those who might know about such potential changes.
With that in mind, I'll take the question from the top and define it fully and concretely - and I leave it to the hive to determine if a close-duplicate is in order.
Background
Say I am constructing models to represent parts of a building. I split the building into logical classes:
Floor
BuildingSpace(ABC)
Office(BuildingSpace)
CommonArea(BuildingSpace)
Goal
Under Floor, I want methods that can retrieve all buildingspaces - or, either of its two subclasses separately:
from typing import Type
from django.db import models

class Floor(models.Model):
    def getAllSpaces(self):
        # return all objects that satisfy Type[BuildingSpace]
        ...

    def getAllOffices(self):
        # return all objects that satisfy strictly and only Type[Office]
        ...

    def getAllCommonAreas(self):
        # return all objects that satisfy strictly and only Type[CommonArea]
        ...
Possible solutions
django-model-utils looks like it can support this kind of query out-of-the-box with its InheritanceManager and the .select_subclasses() method, but, crucially, it requires BuildingSpace to be concrete, which leaves this solution with multi-table inheritance. As I understand it, that amplifies database load for each query, so I looked into making the subclasses proxies to mitigate that, but InheritanceManager doesn't support proxies. When all is said and done, django-model-utils looks to me like it unavoidably exposes me to multi-table inheritance penalties at query time.
django-polymorphic also supports this out-of-the-box as far as I have been able to glean, using .instance_of(subclass). Purely from a coding point-of-view, this approach looks very clean and easy to use. But it also looks to come with database performance considerations, and making it admin-panel compliant looks non-trivial at a first, superficial glance.
Natively, it looks like django can do this in some roundabout way, but I've seen claims that achieving the same functionality as described above with a native QuerySet.filter() approach is worse performance-wise than both of the above extensions.
A final alternative solution I've briefly considered, which I assume will work natively without creating database considerations (but does require a slight redesign), is to access the subclass managers directly, and then have the desired outcome of getAllSpaces() implemented via a QuerySet.union()-type of approach.
Almost-MRE
Naïve setup of how I had imagined to be able to use the code:
class BuildingSpace(models.Model):
    floor = models.ForeignKey('Floor',
                              on_delete=models.CASCADE,
                              related_name="interiors")

    class Meta:
        abstract = True

class Floor(models.Model):
    def _InteriorManager(self):  # get the default manager of BuildingSpace
        return self.interiors

    def GetAllInteriors(self):
        # get the full Type[BuildingSpace] queryset, but this isn't supported in native django
        return self._InteriorManager().all()

    def GetOffices(self):
        return self._InteriorManager().instance_of(Office)  # django-polymorphic

    def GetCommonAreas(self):
        return self._InteriorManager().select_subclasses(CommonArea).all()  # django-model-utils
Question
I'm hoping to get answers that can weigh in on the following factors:
is there any significant difference in performance between django-model-utils, django-polymorphic and some other best-case QuerySet.filter()-based approach for the cases described here (and potentially, the linked questions at the top)
does either extension involve any other consideration that is worth noting (ease of use, extensibility, how additional filtering is done, etc)
would my "final alternative solution" in the end maybe work better on all accounts (performance, ease of use, extensibility) if it is the case that the use-cases I need solved are never more complex than the concrete code examples I've provided
I have no insights as to the database performance topic as of yet, but I will say this:
django-polymorphic is really smooth to use. Minimal code adaptation required, and the syntax is both short and intuitive. The perceived difficulty of making it compliant with the admin-panel was a smokescreen, at least as long as you do basic, straightforward subclassing.
For anyone coming this way with similar troubles, don't hesitate to try it. You can't use it on abstract classes, as mentioned, but unless you have really particular needs it does look like having a concrete superclass and just using this library is a whole lot easier than jerry-rigging a manual solution similar to the one I described in the question.
I recently came across this paragraph in the Django docs on the related_name attribute of the ForeignKey field:
If you’d prefer Django not to create a backwards relation, set related_name to '+' or end it with '+'. For example, this will ensure that the User model won’t have a backwards relation to this model:
user = models.ForeignKey(
    User,
    on_delete=models.CASCADE,
    related_name='+',
)
Under what circumstances would you want to do this?
I recall being puzzled by that as well. However, in the years since I've found myself occasionally doing it. If you know you'll never need the related manager, there are a few minor advantages to telling Django not to create it.
It serves as documentation that you're not using that relation. That could be helpful to future readers. That's the main reason I do it, to say to myself or others in the future: "This design does not envision using this relation, so pause to think about it if you think you need it."
It avoids cluttering the namespace of the target instance. Some people care about this.
It avoids the overhead of creating the unneeded manager, which should theoretically improve performance. In the absence of any benchmarks I would expect any such improvement to be very minor.
To put it differently, normal good coding practice is to not add code that you don't need. This option allows you to keep Django from adding code that you don't need.
Now, I still don't know why you'd want to "end it with +" instead of just using "+"!
When talking about relational databases, it seems that most people refer to the primary and foreign key 'relations' as the reason for the 'relational database' terminology.
This is causing me considerable confusion because the textbook linked below states explicitly "A common misconception is that the name "relational" has to do with relationships between tables (that is, foreign keys). Actually, the true source for the model's name is the mathematical concept relation. A relation in the relational model is what SQL calls a table."
http://www.valorebooks.com/textbooks/training-kit-exam-70-461-querying-microsoft-sql-server-2012-microsoft-press-training-kit-1st-edition/9780735666054#default=buy&utm_source=Froogle&utm_medium=referral&utm_campaign=Froogle&date=11/12/15
Furthermore the next source explicitly refers to the tables as the relations and not the primary/foreign keys.
https://docs.oracle.com/javase/tutorial/jdbc/overview/database.html
However it seems common knowledge almost anywhere else I look or read that the primary and foreign keys are the relations.
Does anyone have a reason for the inconsistency?
Foreign key constraints are a kind of relation - a subset relation - but these aren't the relations from which the model derives its name. Rather, the relations of the relational model refer to finitary relations. Ted Codd wrote in his 1970 paper A Relational Model of Data for Large Shared Data Banks that "The term relation is used here in its accepted mathematical sense. Given sets S1, S2, ... Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on." Thus, he was describing a structure which can be represented by a table, if we follow some rules like ignoring duplicate rows and the order of rows (it's a set, after all).
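Codd's definition can be illustrated in a few lines of Python. This is only a toy sketch: the two domains and the relation name "plays" are made-up examples, and Python sets stand in for mathematical sets.

```python
from itertools import product

# Two hypothetical domains (the sets S1 and S2 in Codd's wording).
S1 = {"alice", "bob"}
S2 = {"guitar", "drums"}

# Every possible pairing of an element of S1 with an element of S2.
cartesian = set(product(S1, S2))

# A relation on S1 and S2 is any set of tuples drawn from that product.
# The repeated tuple collapses because a set holds no duplicates and has no order.
plays = {("alice", "guitar"), ("bob", "drums"), ("alice", "guitar")}

print(plays <= cartesian)  # the relation is a subset of the product
print(len(plays))          # the duplicate row was not kept
```

Printed as rows, "plays" looks like a two-column table, which is the sense in which a table represents a relation.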
Another common misunderstanding is that foreign key constraints represent relationships between entities. They don't. Relationships are represented as sets/tables of rows of associated values. The keys of two or more entities will be recorded together in a row, whether it's in an "entity table" or a "relationship table". Foreign key constraints only enforce integrity; they don't link entities or tables. Tables can be joined on any predicate function; foreign key constraints play no role here.
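A small sketch with Python's built-in sqlite3 module shows both points. The tables and rows are invented for illustration: musician and band have no foreign key between them yet can be joined on an arbitrary predicate, while the foreign keys on membership do nothing except reject rows that would break integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE band (name TEXT PRIMARY KEY, founded INTEGER);
    CREATE TABLE musician (name TEXT PRIMARY KEY, born INTEGER);
    CREATE TABLE membership (
        musician TEXT REFERENCES musician(name),
        band TEXT REFERENCES band(name)
    );
    INSERT INTO band VALUES ('Alpha', 1990), ('Beta', 1970);
    INSERT INTO musician VALUES ('ann', 1980), ('ben', 1960);
    """
)

# Join on a predicate that has nothing to do with any key:
# musicians who were born before a band was founded.
rows = conn.execute(
    """
    SELECT m.name, b.name
    FROM musician AS m JOIN band AS b ON m.born < b.founded
    ORDER BY m.name, b.name
    """
).fetchall()
print(rows)

# The foreign key only matters here: it refuses a row that names
# a musician who does not exist.
try:
    conn.execute("INSERT INTO membership VALUES ('nobody', 'Alpha')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The relationship itself is whatever rows end up in membership; the constraint just guards what may be recorded there.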
Most people learn database concepts from blogs, tutorials and answers ranked by popularity. Most people have never read a decent database book, let alone papers by the inventors and students of the relational model of data. Most programmers and corporations want to get the product released and have little time or appreciation for logic, theory and philosophy. It's an inherently complicated field - see Bill Kent's book Data and Reality for an exploration of this complexity. Thus, most of what you'll find on the internet are half-truths at best as people try to make sense of a difficult topic.
People are familiar with records and pointers, due to their prevalence in mainstream programming languages, and they certainly look and sound a lot like entities and relationships. If entities are represented by tables/records, attributes by fields/columns, then 1-to-1 / 1-to-many relationships between entities must be an association between records/tables, right? It's a simple idea, and that makes it difficult to correct. The popularity of object/relational mapping and object-oriented domain models derives from this simple idea (and from well-spoken and sociable authors, unlike the surly attitudes of some relational proponents) but also further entrenches it.
Peter Chen (author of The Entity-Relationship Model - Toward a Unified View of Data) made some effort to be rigorous, distinguishing "entity relations" and "relationship relations". In his view, entities were real-world concepts which were represented in a database as values, and described via association of values in rows. Relationships between entities were similarly represented by association of values in rows. The E-R model's distinction between relationships and attributes is somewhat redundant (attributes are just binary relationships) and there's little benefit in distinguishing entity tuples from relationship tuples. In fact, I believe it serves to reinforce the confusion. Its superficial similarity to the older network model helped its adoption but also served to maintain the latter, as developers adopted new terminology while maintaining old practices.
Object-role modeling (aka NIAM, by Sjir Nijssen and Terry Halpin) does away with attributes and focuses on domains, roles and relations. It's more elegant than E-R and much closer to a true relational model, but its strengths (logical, comprehensive, move away from the network model) are also its weaknesses (learning curve, more complicated diagrams, less amenable as a vehicle for familiar techniques).
Ted Codd remarked in the paper mentioned above that "The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations." This is as true today as it was then. The relational model which he described has since been built on by many others, including Chris Date whose book An Introduction to Database Systems is one of the most comprehensive sources on the topic.
I'm naming all these authors because one more opinion on either side isn't going to clear up your confusion. Rather, go to the sources and study them for yourself. Yes, it's hard work, but your efforts will be repaid in the quality of understanding you'll gain.
I have a Django project. It was always intended to have two separate forward facing URLs.
One was for teachers, and one was for students.
A teacher can post assignments, wait for students to do it, and then review the work.
Both sites have very different functionality.
Currently, having the code be in a single project is becoming increasingly hairy. Students can sign up in a lazy way (i.e. after doing work) but teachers cannot. I have complicated logic to make sure that the user is the correct role when signing up and showing views. Teachers and students each have a different kind of Profile (so I currently can't use AUTH_PROFILE_MODULE). I don't care if I have separate tables for the two kinds of users. In fact, I prefer that.
If I were to split this into two projects, I believe a lot of things would be conceptually simpler. The only problem is that, when a teacher submits an assignment, I would need to post that assignment to the student site somehow. But synchronizing the content would be much simpler than keeping two types of users in the same code. (The synchronization only happens in two places, and besides that the two sites have very different functionality and models and apps.)
Should I break this into two projects? If so, what is a secure way to share data from one Django site to another?
I think splitting it into two projects is going to create far more problems than it solves and (from my limited knowledge of the app) I don't think that it would make sense to do so - the two users are a part of a single homework submission and marking system/application and therefore they should be developed as such. Just because it might make your job easier doesn't mean it is the correct move.
Are you using inheritance? Have you written or utilized extra permissions? It sounds like you could clean up your two conceptually different profiles using decorators, middleware and a custom AUTH_PROFILE_MODULE implementation.
Is there any naming convention for "created" and "last edit" dates in Django?
E.g. in the Symfony framework these fields are named by default:
created_at
updated_at
Indeed, as far as I can tell there's no canonical convention for Django, but I really like the Rails convention (which I suppose also inspired Symfony):
created_at for DateTime fields
created_on for Date fields
created works fine for creation dates, but as soon as you have more ambiguous fields like activated, it becomes a problem. Is it a boolean or a date/datetime? Naming conventions exist to help developers understand code faster and waste less time with unimportant decisions. That's the philosophy behind the Convention over Configuration paradigm, which is big in the Rails community but not as much in Django's unfortunately. This confusion I mentioned for example is typical and that's why I prefer to always be extra clear:
If it's a boolean: is_activated
If it's a datetime: activated_at
If it's just a date: activated_on
I've heard people say that "you shouldn't mix field names with data types" but it seems like a rather empty tip in my opinion and I've never heard any concrete argument behind it. If we want to optimize code readability and decision making then I really think explicit naming conventions are the way to go.
In Django's own models these fields are named according to the model type, e.g.:
auth.User: date_joined
comments.Comment: submit_date
So probably we should follow this convention.
I don't think there's something like a canonical way of naming such things in Django. Some parts are well covered by PEP8, mostly because this sort of thing is out of scope of Django, since it's much more a matter of style (and maybe house conventions).
That said, I think it's pretty common to name these fields created_at and updated_at, and I personally follow this convention when writing my own code. I advise avoiding names like created or updated since they're ambiguous (although some popular libs use that style): are they booleans or something else? is_created/is_updated, if you need those, are better options.
I prefer created and updated without the _at suffix. I don't know of any "canonical" preference for naming the fields.
For what it is worth, I think Rails uses created_at / created_on and updated_at / updated_on.
Django Model Utils has a model mixin for this, naming them created and modified -- I recommend using their model mixin to easily standardize across all of your models and projects.