I am working on a Django model and, not being a database expert, I could use some advice. I have a model with a many-to-many relationship to another model, but I need to store a distinct value on each individual relationship every time I add one.
For instance, in chemistry you may have many elements that include hydrogen, but each one contains its own amount of hydrogen. A water entry would be connected to hydrogen and oxygen, and the amounts would be two hydrogen atoms and one oxygen atom.
I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them.
What is the best way to model this?
Thanks!
Read the Django documentation on extra fields on many-to-many relationships and pay close attention to the Beatles example; it's exactly what you need. Map its models onto yours like this:
Person -> Element
Group -> Chemical_Compound
Membership -> Element_2_Chemical
Element_2_Chemical should have an integer field recording how many atoms of that element each chemical compound contains.
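A minimal sketch of the tables this pattern produces, written with Python's built-in sqlite3 module so it runs without a Django project. The table and column names (element, chemical_compound, element_2_chemical, atom_count) are illustrative assumptions; in Django they would correspond to Element and Chemical_Compound models plus an Element_2_Chemical model passed as the `through` model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE NOT NULL);
CREATE TABLE chemical_compound (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- The "through" table: one row per (element, compound) edge, plus the extra field.
CREATE TABLE element_2_chemical (
    id INTEGER PRIMARY KEY,
    element_id INTEGER NOT NULL REFERENCES element(id),
    compound_id INTEGER NOT NULL REFERENCES chemical_compound(id),
    atom_count INTEGER NOT NULL CHECK (atom_count > 0),
    UNIQUE (element_id, compound_id)
);
""")
conn.executemany("INSERT INTO element (id, symbol) VALUES (?, ?)",
                 [(1, "H"), (2, "O"), (3, "C")])
conn.executemany("INSERT INTO chemical_compound (id, name) VALUES (?, ?)",
                 [(1, "water"), (2, "methane"), (3, "carbon dioxide")])
conn.executemany(
    "INSERT INTO element_2_chemical (element_id, compound_id, atom_count) VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 1),   # water: H2O
     (3, 2, 1), (1, 2, 4),   # methane: CH4
     (3, 3, 1), (2, 3, 2)],  # carbon dioxide: CO2
)

def compounds_containing(symbol):
    """Return (compound name, atom count) pairs for every compound using `symbol`."""
    return conn.execute("""
        SELECT c.name, e2c.atom_count
        FROM element_2_chemical AS e2c
        JOIN element AS e ON e.id = e2c.element_id
        JOIN chemical_compound AS c ON c.id = e2c.compound_id
        WHERE e.symbol = ?
        ORDER BY c.name
    """, (symbol,)).fetchall()

print(compounds_containing("H"))  # [('methane', 4), ('water', 2)]
```

The UNIQUE pair keeps one element from being attached to the same compound twice, so the count always lives in exactly one row.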
In your metaphor, you say "I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them."
Does that mean "water" may appear on either side of the relationship you are modeling? Does "water" relate to "hydrogen" in (almost) the same way that "milk" relates to "water"?
If the answer is yes, then you should use a directed acyclic graph (DAG) model (hopefully you won't have cycles in your relationship, such as A->B->C->A). Look into the django-dag ( http://pypi.python.org/pypi/django-dag/ ) and django-treebeard-dag ( http://pypi.python.org/pypi/django-treebeard-dag/0.2 ) packages.
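To show what the DAG approach means at the table level, here is a hedged sketch using sqlite3 and a recursive CTE; it does not use the django-dag API, whose interface is not shown here. Every substance lives in one table, each edge carries a quantity, and the query finds everything that contains a substance directly or indirectly. Names and quantities are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE substance (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Directed edge: parent contains `quantity` units of child.
CREATE TABLE edge (
    parent_id INTEGER NOT NULL REFERENCES substance(id),
    child_id  INTEGER NOT NULL REFERENCES substance(id),
    quantity  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id),
    CHECK (parent_id <> child_id)
);
""")
conn.executemany("INSERT INTO substance (id, name) VALUES (?, ?)",
                 [(1, "hydrogen"), (2, "oxygen"), (3, "water"), (4, "milk")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                 [(3, 1, 2), (3, 2, 1),  # water contains hydrogen and oxygen
                  (4, 3, 1)])            # milk contains water (placeholder quantity)

def containers_of(name):
    """Names of every substance that contains `name`, at any depth."""
    rows = conn.execute("""
        WITH RECURSIVE ancestors(id) AS (
            SELECT e.parent_id FROM edge AS e
            JOIN substance AS s ON s.id = e.child_id
            WHERE s.name = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so a stray cycle cannot loop forever
            SELECT e.parent_id FROM edge AS e
            JOIN ancestors AS a ON e.child_id = a.id
        )
        SELECT s.name FROM substance AS s JOIN ancestors AS a ON s.id = a.id
        ORDER BY s.name
    """, (name,)).fetchall()
    return [r[0] for r in rows]

print(containers_of("hydrogen"))  # ['milk', 'water']
```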
If the answer is no, so you have a clear distinction between what's a "container" and what's a "containee", use a normal many-to-many relationship between two different models, like the "Membership" example in the Django documentation ( https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships ).
In either case, you'll have to store extra information on the "edge" of the relationship.
Following your chemical metaphor strictly, you may not be modeling enough information, because some molecules have the same composition but a different structure (they are called "isomers"). For instance, pentane, 2-methylbutane and 2,2-dimethylpropane all have five carbons and twelve hydrogens, but they are very different from one another.
My point is that an "enhanced" many-to-many is generally a complex model, so take care not to leave anything out of it.
Related
Situation: I have a set of books. Each book is one of three types: "Test", "Premium" or "Common". Share of the data: 2%, 15% and 83%. Share of queries per unit of time: 40%, 20% and 40%.
I see several ways to model this in the database:
1. Boolean fields: is_test, is_premium. If we need only "Test" books: Book.objects.filter(is_test=True). This could be wrapped in a proxy model, for example; likewise for premium books.
2. Separate tables: books_test, books_premium, books_common.
3. Choice field: a string in ['Test', 'Premium', 'Common'].
4. Combine 1 and 2: a books_test table and a books table with an 'is_premium' attribute.
We also need to query this data efficiently. All three book variants are needed on one page. The queryset combinations are: only test, only common, common + premium, and only premium.
With variant 1 or 3: one endpoint with a specific filter.
With variant 2: one of three endpoints without filters (the frontend has to know which endpoint to use), or one endpoint with conditions checked by the backend. Either way, extra logic is needed.
Which way is more correct and why?
If you need to mix different types on one page, separate models/tables would complicate things for no good reason. The same goes for mapping more than two exclusive states to a combination of boolean fields.
This leaves you with a choice field or a separate BookType model containing the choices.
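A minimal sketch of the choice-field option, using sqlite3 so it runs standalone: one book table with a constrained book_type column and an index on it, so every page combination from the question is just a filter on the same table. The table, column and value names are assumptions; in Django this would roughly be a CharField with choices on a hypothetical Book model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    book_type TEXT NOT NULL CHECK (book_type IN ('test', 'premium', 'common'))
);
CREATE INDEX book_type_idx ON book (book_type);
""")
conn.executemany("INSERT INTO book (title, book_type) VALUES (?, ?)", [
    ("Sample quiz", "test"),
    ("Deluxe edition", "premium"),
    ("Paperback A", "common"),
    ("Paperback B", "common"),
])

# Each page variant from the question is a different set of allowed types.
COMBINATIONS = {
    "only_test": ("test",),
    "only_common": ("common",),
    "common_and_premium": ("common", "premium"),
    "only_premium": ("premium",),
}

def books_for(combination):
    """Titles for one combination, all served by the same single-table query."""
    types = COMBINATIONS[combination]
    placeholders = ", ".join("?" for _ in types)
    return [row[0] for row in conn.execute(
        f"SELECT title FROM book WHERE book_type IN ({placeholders}) ORDER BY id",
        types,
    )]

print(books_for("common_and_premium"))  # ['Deluxe edition', 'Paperback A', 'Paperback B']
```

In the ORM the same filter would be along the lines of Book.objects.filter(book_type__in=[...]), so a single endpoint can accept the combination as a parameter.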
I'm currently designing my database using PostgreSQL with Django, and I was wondering: what is best practice, having several instances of the same model with the same value, or a many-to-many relationship?
Let me elaborate. Let's say I'm designing a store. The store sells items. Items can have one or many statuses (e.g. ordered, shipped, delivered, paid, pre-ordered etc.).
What would be a better practice:
Relating the items to their statuses via a many-to-many relationship, which will lead to one status having hundreds of thousands and later millions of relations? Will so many relations become problematic?
Or is it better to give each item its own status rows via a foreign key, so that each status row belongs to only one item? If I then wanted to query all the items that have the same status (e.g. shipped), I would have to iterate over all statuses with a common name.
What would be better, especially for the long term?
I would recommend going with a many-to-many relationship.
Hundreds of thousands or even millions of relations should not be a problem. The many-to-many relationship is stored as a table with id, item_id and status_id columns. The database can query that table efficiently by either status_id or item_id, even as it grows, because Django indexes the foreign key columns of the table it creates. This is exactly the kind of workload relational databases were built to handle.
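A small sqlite3 sketch of that join table, to show the lookup by status going through an index rather than scanning. The table and index names are assumptions, not what Django generates; the composite UNIQUE index also covers lookups by item_id, since item_id is its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- One row per (item, status) pair, the shape of a ManyToManyField's table.
CREATE TABLE item_status (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),
    status_id INTEGER NOT NULL REFERENCES status(id),
    UNIQUE (item_id, status_id)  -- also serves lookups by item_id
);
CREATE INDEX item_status_status_idx ON item_status (status_id);
""")
conn.executemany("INSERT INTO status (id, name) VALUES (?, ?)",
                 [(1, "ordered"), (2, "paid"), (3, "shipped")])
conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 1001)])
# Every item is ordered; every third item has also shipped.
conn.executemany("INSERT INTO item_status (item_id, status_id) VALUES (?, ?)",
                 [(i, 1) for i in range(1, 1001)] +
                 [(i, 3) for i in range(3, 1001, 3)])

SHIPPED_SQL = """
    SELECT item_id FROM item_status
    WHERE status_id = (SELECT id FROM status WHERE name = ?)
"""
shipped = conn.execute(SHIPPED_SQL, ("shipped",)).fetchall()
print(len(shipped))  # 333

# Ask SQLite how it would run the lookup; the last column is the plan detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + SHIPPED_SQL, ("shipped",)).fetchall()
print([row[-1] for row in plan])
```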
Let me elaborate. Let's say I'm designing a store. The store sells items. Items can have one or many statuses (e.g. ordered, shipped, delivered, paid, pre-ordered etc.).
If many users will each have many items, you should use a many-to-many relationship and let Django handle the "third table". Since that table only holds IDs, you can iterate over it using reverse lookups. I prefer a many-to-many over simple foreign keys here.
In your case, how will you handle a User who holds many items? What if my User buys one potato and two bananas? Will you duplicate the row in your User table to say "here he has the potato, and in this second row he has the banana"? You would then depend on DISTINCT while cluttering your main User table.
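from django.db import models

class Item(models.Model):
    ...  # item fields

class User(models.Model):
    # Django creates and manages the intermediate "third table" for this field.
    items = models.ManyToManyField(Item)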
So when I query Item or User, each query only brings back the attributes that belong to that model, whereas if you put an item field inside the User model, you will have multiple rows for the same user.
So instead of user.items.all(), you would need something like items = [user.item for user in User.objects.filter(id=id)].
Look how complex this gets, and how much it clutters your database.
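To make the contrast concrete, a small sqlite3 sketch with hypothetical tables: in design A each user row carries one item, so the user's own data is repeated and counting users needs DISTINCT; in design B each user is stored once and a join table holds the pairs. Table names and data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design A: the item is a column on the user row,
-- so a user with two items becomes two user rows.
CREATE TABLE user_flat (user_id INTEGER NOT NULL, email TEXT NOT NULL, item TEXT NOT NULL);

-- Design B: each user and item stored once, plus a join table
-- (the shape a ManyToManyField produces).
CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE app_user_items (
    user_id INTEGER NOT NULL REFERENCES app_user(id),
    item_id INTEGER NOT NULL REFERENCES item(id),
    UNIQUE (user_id, item_id)
);
""")
conn.executemany("INSERT INTO user_flat VALUES (?, ?, ?)", [
    (1, "ann@example.com", "potato"),
    (1, "ann@example.com", "banana"),  # ann's own data is duplicated
    (2, "bob@example.com", "banana"),
])
conn.executemany("INSERT INTO app_user VALUES (?, ?)",
                 [(1, "ann@example.com"), (2, "bob@example.com")])
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "potato"), (2, "banana")])
conn.executemany("INSERT INTO app_user_items VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])

# Design A: the row count overstates the users, so DISTINCT is needed.
flat_rows = conn.execute("SELECT COUNT(*) FROM user_flat").fetchone()[0]
flat_users = conn.execute("SELECT COUNT(DISTINCT user_id) FROM user_flat").fetchone()[0]
# Design B: the user table already holds exactly one row per user.
m2m_users = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()[0]

def items_of(user_id):
    """Design B counterpart of user.items.all(): follow the join table."""
    return [r[0] for r in conn.execute("""
        SELECT i.name FROM item AS i
        JOIN app_user_items AS ui ON ui.item_id = i.id
        WHERE ui.user_id = ? ORDER BY i.name
    """, (user_id,))]

print(flat_rows, flat_users, m2m_users)  # 3 2 2
print(items_of(1))  # ['banana', 'potato']
```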
I would like to model a relationship between a Story class and a Series class (e.g. a trilogy of novels).
The relationship is one-to-many: a series can contain many stories, but a story can only be part of one series.
Model-wise, this could simply be solved with a foreign key on Story:
part_of = models.ForeignKey(
    Series,
    on_delete=models.CASCADE,
    related_name='contains_story',
)
But I would like a sequence number as an attribute of this relationship.
e.g. (1: The Long Earth, 2: The Long War, 3: The Long Mars, ...).
I could make it an attribute of Story, but that's not clean: a Story that is not part of a Series should not have a sequence number.
In a many-to-many relationship, this can be solved with the "through" option, by specifying a class and adding attributes to that class:
part_of = models.ManyToManyField(Series, through='SeriesPart')
But "part of a series" is not a "many-to-many" relationship and I want to avoid modelling it like this and having to restrict it in code, so how should I solve this best?
I'm not sure your objection makes much sense. The reason why you might store an extra attribute on the through table of a many-to-many relationship is precisely because each side of the relationship can have multiple items, and the attribute value is only relevant to one specific combination. (In the case of the example in the Django docs, John was in the Quarrymen before he was in the Beatles, so there are separate joined_date values for John<->Quarrymen, John<->Beatles, and Paul<->Beatles.)
In your case, a story can only be part of one series. There is no other position for The Long Earth other than as part 1 of the Pratchett/Baxter series; it can't simultaneously be part 1 of that but also part 2 of something else. So there's no reason not to store the series number on the story model itself. Stories that are not part of a series can simply leave that blank, just like they leave the FK to Series blank.
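A hedged sketch of that suggestion at the table level (sqlite3, hypothetical names): the story carries a nullable series_id and a nullable series_number, a CHECK keeps the two in step, and a UNIQUE constraint stops two stories from claiming the same position in one series. In Django, the last two would roughly correspond to a CheckConstraint and a UniqueConstraint in Meta.constraints (available since Django 2.2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE story (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    series_id INTEGER REFERENCES series(id),
    series_number INTEGER,
    -- Either both are set (the story is in a series) or both are empty.
    CHECK ((series_id IS NULL) = (series_number IS NULL)),
    -- No two stories share a position within the same series.
    UNIQUE (series_id, series_number)
);
""")
conn.execute("INSERT INTO series (id, title) VALUES (1, 'The Long Earth series')")
conn.executemany(
    "INSERT INTO story (title, series_id, series_number) VALUES (?, ?, ?)",
    [("The Long Earth", 1, 1), ("The Long War", 1, 2), ("The Long Mars", 1, 3),
     ("A standalone novel", None, None)],  # hypothetical story outside any series
)

def series_in_order(series_id):
    """Titles of one series, ordered by their position in it."""
    return [r[0] for r in conn.execute(
        "SELECT title FROM story WHERE series_id = ? ORDER BY series_number",
        (series_id,))]

print(series_in_order(1))  # ['The Long Earth', 'The Long War', 'The Long Mars']
```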
I have 4 tables in my database. The image below shows their rows and columns, with the name of each table enclosed in a red box. Am I going about the relationship design correctly? This is a test project, and I am assuming I will use a JOIN to get the entire set of data in one table. I want to start this off correctly.
A beginner question, but is it normal that the publisher table, for example, has 4 rows with Nintendo?
I am using Django 1.7 along with PostgreSQL 9.3. I aim to keep it simple with room to grow.
Basically you've got the relations back-to-front here...
You have game_id (i.e. a ForeignKey relation) on each of publisher, developer and platform models... but that means each of those entities can only be related to a single game. I'm pretty sure that's not what you want.
You need it the other way around... instead put three foreign keys onto the game model, one each for publisher, developer and platform.
A ForeignKey is what's called a many-to-one relation. In this example I think what you want is for 'many' games to be related to 'one' publisher. Same for developer and platform.
is it normal that the publisher table, for example, has 4 rows with Nintendo?
No, that is an example of why you have it backwards. You should only have a single row for each publisher.
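A minimal sqlite3 sketch of the corrected direction: the three foreign keys live on game, so four games share one Nintendo row, and a single JOIN rebuilds the combined view the question asks for. The developer name and game titles are placeholders; in Django this would be a Game model with three ForeignKey fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE developer (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE platform  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- The foreign keys live on game: many games -> one publisher/developer/platform.
CREATE TABLE game (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher(id),
    developer_id INTEGER NOT NULL REFERENCES developer(id),
    platform_id  INTEGER NOT NULL REFERENCES platform(id)
);
""")
conn.execute("INSERT INTO publisher (id, name) VALUES (1, 'Nintendo')")
conn.execute("INSERT INTO developer (id, name) VALUES (1, 'Example Studio')")
conn.execute("INSERT INTO platform (id, name) VALUES (1, 'Nintendo 64')")
conn.executemany(
    "INSERT INTO game (title, publisher_id, developer_id, platform_id) VALUES (?, 1, 1, 1)",
    [("Game A",), ("Game B",), ("Game C",), ("Game D",)],
)

# One JOIN gives the combined, one-row-per-game view.
rows = conn.execute("""
    SELECT g.title, p.name, d.name, pl.name
    FROM game AS g
    JOIN publisher AS p ON p.id = g.publisher_id
    JOIN developer AS d ON d.id = g.developer_id
    JOIN platform AS pl ON pl.id = g.platform_id
    ORDER BY g.id
""").fetchall()
publisher_rows = conn.execute("SELECT COUNT(*) FROM publisher").fetchone()[0]

print(rows[0])         # ('Game A', 'Nintendo', 'Example Studio', 'Nintendo 64')
print(publisher_rows)  # 1
```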
Yes, you are correct that something is wrong.
First of all, those screenshots are hard to follow. For this simple example they could work, but they are not the right tool: pick up pen and paper, sketch some relational diagrams, and think about which entities are involved in the schema and how they relate. For example, you know you have publishers, and they publish games, so in this restricted example you have two entities, game and publisher, and a "publish" relation between them (you can place an FK on game if each game has a single publisher, or create an intermediary table for the many-to-many case). The same point applies to platforms and games: why are you placing an FK to game there? What will happen if the game with id 2 is also published for the Nintendo 64? You are making the exact same mistake in all the entities.
Pick up any book on database design basics; it may help you reason about your context and future problems.
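For the many-to-many case this answer mentions, such as one game released on several platforms, here is a minimal sqlite3 sketch of the intermediary table; the names and data are illustrative assumptions. Each platform still appears once, and each (game, platform) release is one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE platform (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Intermediary table: one row per (game, platform) release.
CREATE TABLE game_platform (
    game_id INTEGER NOT NULL REFERENCES game(id),
    platform_id INTEGER NOT NULL REFERENCES platform(id),
    PRIMARY KEY (game_id, platform_id)
);
""")
conn.executemany("INSERT INTO game (id, title) VALUES (?, ?)",
                 [(1, "Game A"), (2, "Game B")])
conn.executemany("INSERT INTO platform (id, name) VALUES (?, ?)",
                 [(1, "GameCube"), (2, "Nintendo 64")])
# Game B (id 2) is released on both platforms.
conn.executemany("INSERT INTO game_platform VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])

def platforms_of(game_id):
    return [r[0] for r in conn.execute("""
        SELECT pl.name FROM platform AS pl
        JOIN game_platform AS gp ON gp.platform_id = pl.id
        WHERE gp.game_id = ? ORDER BY pl.name
    """, (game_id,))]

def games_on(platform_name):
    return [r[0] for r in conn.execute("""
        SELECT g.title FROM game AS g
        JOIN game_platform AS gp ON gp.game_id = g.id
        JOIN platform AS pl ON pl.id = gp.platform_id
        WHERE pl.name = ? ORDER BY g.title
    """, (platform_name,))]

print(platforms_of(2))       # ['GameCube', 'Nintendo 64']
print(games_on("GameCube"))  # ['Game A', 'Game B']
```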
In Django I have three models:
SimpleProduct
ConfigurableProduct - Instead of showing several variations of a SimpleProduct, the user sees one product with options like color.
GroupProduct - Several SimpleProducts that are sold together.
First I create all the SimpleProducts, then I create ConfigurableProducts from several products that are variations of the same product, and last GroupProducts, which are combinations of several SimpleProducts.
When a user navigates to a category, I need to show all three types. If a SimpleProduct is part of a ConfigurableProduct, I don't want to show it twice.
How do I make the query? Do I have to make three separate queries?
How do I use pagination on three models at the same time?
Can I somehow use inheritance?
Thanks
I think this question is tough to answer without understanding your business logic a little more clearly. Here are my assumptions:
Configurable options are ad hoc, i.e., you sell balls in red, blue, and yellow, shirts in small, medium, and large, etc. There is no way to represent these options abstractly because they don't transcend categories. (If they did, your database design is all wrong. If everything had custom color options, you would just make that a column in your database table.)
Each configuration option has a pre-existing business identity at your company. There's some SKU associated with red balls or something like that. For whatever reason, it is necessary to have a database row for each possible configuration option. (If it isn't, then again, you're doing it all wrong.)
If this is the case, my simplest recommendation would be to have a base class that all products inherit from, with a field representative_id. The idea is that for every product, there is a representative version that gets shown on the category page, or anywhere else in your catalog. In your database, this will look like:
Name          id  representative_id
red_ball      1   1
blue_ball     2   1
green_ball    3   1
small_shirt   4   4
medium_shirt  5   4
large_shirt   6   4
unique_thing  7   7
As for Django queries, I would use F() expressions, which are available in version 1.1 and later:
from django.db.models import F
SimpleProduct.objects.filter(representative_id=F('id'))
That returns a queryset of the products whose representative_id matches their own id.
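For readers who want to see what that filter does in SQL, a minimal sqlite3 sketch over the example rows above; the single product table is an assumption standing in for whatever table the base class maps to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        representative_id INTEGER NOT NULL REFERENCES product(id)
    )
""")
conn.executemany("INSERT INTO product (name, id, representative_id) VALUES (?, ?, ?)", [
    ("red_ball", 1, 1), ("blue_ball", 2, 1), ("green_ball", 3, 1),
    ("small_shirt", 4, 4), ("medium_shirt", 5, 4), ("large_shirt", 6, 4),
    ("unique_thing", 7, 7),
])

# Compares two columns of the same row, which is what F('id') expresses in the ORM.
catalog = [r[0] for r in conn.execute(
    "SELECT name FROM product WHERE representative_id = id ORDER BY id")]
print(catalog)  # ['red_ball', 'small_shirt', 'unique_thing']
```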
At this point, someone will clamor for data integrity. The main condition is that representative_id must in all cases point to an object whose representative_id matches its id. There are ways to enforce this directly, such as with a pre_save validator or something like that. You could also do effectively the same thing by factoring out a ProductType table that contains a representative_id column. I.e.:
Products
Name          id  product_type
______________________________
red_ball      1   ball
blue_ball     2   ball
green_ball    3   ball
small_shirt   4   shirt
medium_shirt  5   shirt
large_shirt   6   shirt
unique_thing  7   thing
Types
Name   representative_id
________________________
ball   1
shirt  4
thing  7
This doesn't replace the need to enforce integrity with some validator, but it makes it a little more abstract.
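As one possible shape for that validator, here is a hedged sqlite3 sketch of the check it could run against the two tables above: every type's representative must exist and must belong to that type. Where you would call it from (for example a model's clean() method or a pre_save handler) is left open; the function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_type (name TEXT PRIMARY KEY, representative_id INTEGER NOT NULL);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    product_type TEXT NOT NULL REFERENCES product_type(name)
);
""")
conn.executemany("INSERT INTO product (name, id, product_type) VALUES (?, ?, ?)", [
    ("red_ball", 1, "ball"), ("blue_ball", 2, "ball"), ("green_ball", 3, "ball"),
    ("small_shirt", 4, "shirt"), ("medium_shirt", 5, "shirt"), ("large_shirt", 6, "shirt"),
    ("unique_thing", 7, "thing"),
])
conn.executemany("INSERT INTO product_type VALUES (?, ?)",
                 [("ball", 1), ("shirt", 4), ("thing", 7)])

def invalid_types():
    """Types whose representative is missing or belongs to a different type."""
    return [r[0] for r in conn.execute("""
        SELECT t.name FROM product_type AS t
        LEFT JOIN product AS p ON p.id = t.representative_id
        WHERE p.id IS NULL OR p.product_type <> t.name
        ORDER BY t.name
    """)]

# Catalog page: one representative product per type.
catalog = [r[0] for r in conn.execute("""
    SELECT p.name FROM product_type AS t
    JOIN product AS p ON p.id = t.representative_id
    ORDER BY p.id
""")]

print(invalid_types())  # []
print(catalog)          # ['red_ball', 'small_shirt', 'unique_thing']
```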
Go with Django's multi-table inheritance, with a base class you won't instantiate directly. The base class still has a manager you can run queries against, and its results contain the base attributes of every subclass instance.
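A rough sqlite3 sketch of the table layout multi-table inheritance produces, to show why one query over the base table can be paginated across all product kinds. The product_ptr_id column follows Django's parent-link naming for a base model called Product; the other names and rows are assumptions. In Django, the base model's queryset could be handed to django.core.paginator.Paginator. This sketch does not handle hiding SimpleProducts that belong to a ConfigurableProduct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: one row per product of any kind (what a query on the base model reads).
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, category TEXT NOT NULL);
-- Child tables reuse the parent's primary key, as multi-table inheritance does.
CREATE TABLE simple_product (
    product_ptr_id INTEGER PRIMARY KEY REFERENCES product(id),
    sku TEXT NOT NULL
);
CREATE TABLE group_product (
    product_ptr_id INTEGER PRIMARY KEY REFERENCES product(id)
);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "Red ball", "toys"), (2, "Ball gift set", "toys"), (3, "Kite", "toys"),
    (4, "Shirt", "clothes"), (5, "Yo-yo", "toys"),
])
conn.executemany("INSERT INTO simple_product VALUES (?, ?)",
                 [(1, "BALL-RED"), (3, "KITE"), (4, "SHIRT"), (5, "YOYO")])
conn.execute("INSERT INTO group_product VALUES (2)")

def category_page(category, page_number, page_size):
    """One query over the base table, paginated with LIMIT/OFFSET."""
    return conn.execute("""
        SELECT p.id, p.name,
               CASE WHEN s.product_ptr_id IS NOT NULL THEN 'simple'
                    WHEN g.product_ptr_id IS NOT NULL THEN 'group'
                    ELSE 'other' END AS kind
        FROM product AS p
        LEFT JOIN simple_product AS s ON s.product_ptr_id = p.id
        LEFT JOIN group_product AS g ON g.product_ptr_id = p.id
        WHERE p.category = ?
        ORDER BY p.id
        LIMIT ? OFFSET ?
    """, (category, page_size, (page_number - 1) * page_size)).fetchall()

print(category_page("toys", 1, 2))  # [(1, 'Red ball', 'simple'), (2, 'Ball gift set', 'group')]
print(category_page("toys", 2, 2))  # [(3, 'Kite', 'simple'), (5, 'Yo-yo', 'simple')]
```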
To tackle your question about configurable products that must not be displayed redundantly, I think you have two options:
Make configurable products a multiple choice of ConfigurableProductChoice (unrelated to SimpleProduct). Have the ConfigurableProductChoice extend the ConfigurableProduct. That way you'll have a single ConfigurableProduct in your results and no redundancy.
Make configurable products be associated to various options, and design a rule to compute the price from what options are selected. A simple addition would be fine. Your product IDs will need to encode what options are selected. You still have no redundancy, because you didn't involve SimpleProduct.
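A minimal pure-Python sketch of the second option, assuming hypothetical option groups and prices in cents: the price is the base price plus the sum of the selected options (the "simple addition"), and the product ID encodes the selected options in a fixed order so the same selection always yields the same ID.

```python
# Hypothetical catalog data: base price and per-option price changes, in cents.
BASE_PRICE_CENTS = {"SHIRT": 1500}
OPTION_DELTAS_CENTS = {
    "SHIRT": {
        "size": {"S": 0, "M": 0, "L": 200},
        "color": {"RED": 0, "BLUE": 100},
    },
}

def price_cents(product, selection):
    """Base price plus the sum of the chosen options' deltas."""
    groups = OPTION_DELTAS_CENTS[product]
    if set(selection) != set(groups):
        raise ValueError(f"expected one choice for each of {sorted(groups)}")
    return BASE_PRICE_CENTS[product] + sum(
        groups[group][choice] for group, choice in selection.items())

def encode_id(product, selection):
    """Encode the selected options into a stable ID, sorted by option group."""
    return "-".join([product] + [f"{g}:{selection[g]}" for g in sorted(selection)])

choice = {"size": "L", "color": "BLUE"}
print(price_cents("SHIRT", choice))  # 1800
print(encode_id("SHIRT", choice))    # SHIRT-color:BLUE-size:L
```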