I have a group of related companies that share items they own with one another. Each item has a company that owns it and a company that has possession of it. Obviously, the company that owns an item can also have possession of it. Also, companies sometimes permanently transfer ownership of items instead of just lending them, so I have to allow for that as well.
I'm trying to decide how to model ownership and possession of the items. I have a Company table and an Item table.
Here are the options as I see them:
1. An Inventory table with an entry for each Item-Company relationship. It has a company field pointing to a Company, plus Boolean fields is_owner and has_possession.
2. An Inventory table with one entry per Item. It has an owner_company field and a possessing_company field that each point to a Company.
3. Two separate tables: ItemOwner and ItemHolder.
So far I'm leaning towards option three, but the tables are so similar it feels like duplication. Option two would have only one row per item (cleaner than option one in this regard), but having two fields on one table that both reference the Company table doesn't smell right (and it's messy to draw in an ER diagram!).
Database design is not my specialty (I've mostly used non-relational databases), so I don't know what the best practice would be in this situation. Additionally, I'm brand new to Python and Django, so there might be an obvious idiom or pattern I'm missing out on.
What is the best way to model this without Company and Item being polluted by knowledge of ownership and possession? Or am I missing the point by wanting to keep my models so segregated? What is the Pythonic way?
Update
I've realized I'm focusing too much on database design. Would it be wise to just write good OO code and let Django's ORM do its thing?
Is there a reason why you don't want your item to contain the relationship information? It feels like the owner and possessor are attributes of the item.
class Company(models.Model):
    pass

class Item(models.Model):
    ...
    owner = models.ForeignKey(Company, related_name='owned_items', on_delete=models.CASCADE)
    holder = models.ForeignKey(Company, related_name='held_items', on_delete=models.CASCADE)
Some examples:
company_a = Company.objects.get(pk=1)
company_a.owned_items.all()
company_a.held_items.all()
items_owned_and_held_by_a = Item.objects.filter(owner=company_a, holder=company_a)

items_on_loan_by_a = Item.objects.filter(owner=company_a).exclude(holder=company_a)
# or
items_on_loan_by_a = company_a.owned_items.exclude(holder=company_a)

items_a_is_borrowing = Item.objects.exclude(owner=company_a).filter(holder=company_a)
# or
items_a_is_borrowing = company_a.held_items.exclude(owner=company_a)

company_b = Company.objects.get(pk=2)
items_owned_by_a_held_by_b = Item.objects.filter(owner=company_a, holder=company_b)
# or
items_owned_by_a_held_by_b = company_a.owned_items.filter(holder=company_b)
# or
items_owned_by_a_held_by_b = company_b.held_items.filter(owner=company_a)
I think that if your items are only ever owned by a single company and held by a single company, a separate table shouldn't be needed. If items can have multiple owners or multiple holders, an m2m relationship through an Inventory table would make more sense.
class Inventory(models.Model):
    REL = (('O', 'Owns'), ('P', 'Possesses'))
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    company = models.ForeignKey(Company, on_delete=models.CASCADE)
    relation = models.CharField(max_length=1, choices=REL)
This could be one implementation, using a choice field instead of Booleans, so I'd go for the first option. It could even serve as an intermediate table if you ever decide to relate items to companies with a 'through' relationship, like this:
class Company(models.Model):
    items = models.ManyToManyField(Item, through='Inventory')
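A quick usage sketch under that setup (the variable names, and the assumption that Inventory keeps its default reverse lookup name, are mine rather than the answer's):

company_a = Company.objects.get(pk=1)

# Items for which company_a has an Inventory row marked as owner / possessor
owned_by_a = Item.objects.filter(inventory__company=company_a, inventory__relation='O')
held_by_a = Item.objects.filter(inventory__company=company_a, inventory__relation='P')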
Option #1 is probably the cleanest choice. An Item has only one owner company and is possessed by only one possessing company.
Put two FKs to Company in Item, and remember to explicitly define the related_name of the two inverses so that they differ from each other.
As you want to avoid touching the Item model, either add the FKs from outside, like in field.contribute_to_class(), or put a new model with a one-to-one rel to Item, plus the foreign keys.
The second method is easier to implement but the first will be more natural to use once implemented.
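A minimal sketch of the second approach (the ItemStatus model name and its related_names are illustrative, not prescribed by the answer):

class ItemStatus(models.Model):
    item = models.OneToOneField(Item, related_name='status', on_delete=models.CASCADE)
    owner = models.ForeignKey(Company, related_name='owned_items', on_delete=models.CASCADE)
    holder = models.ForeignKey(Company, related_name='held_items', on_delete=models.CASCADE)

# Item itself stays untouched: ownership is reached via item.status.owner,
# and a company's related ItemStatus rows via company.owned_items / company.held_items.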
Related
I've been searching the web for a couple of hours now looking for a solution, but nothing quite fits what I am looking for.
I have one model (simplified):
class SimpleModel(Model):
    name = CharField('Name', max_length=100, unique=True)  # max_length is required; 100 is arbitrary
    date = DateField()
    amount = FloatField()
I have two dates; date_one and date_two.
I would like a single queryset with a row for each name in the Model, with each row showing:
{'name': name, 'date_one': date_one, 'date_two': date_two, 'amount_one': amount_one, 'amount_two': amount_two, 'change': amount_two - amount_one}
Reason being I would like to be able to find the rank of amount_one, amount_two, and change, using sort or filters on that single queryset.
I know I could create a list of dictionaries from two separate querysets, then sort on that and get the ranks from the index values ...
but perhaps naively I feel like there should be a DB solution using one queryset that would be faster.
union() seemed promising, but you cannot perform some simple operations like filter() after it.
I think I could perhaps split name into its own Model and generate queryset with related fields, but I'd prefer not to change the schema at this stage. Also, I only have access to sqlite.
I'd appreciate any help!
Your current model forces you to have ONE name associated with ONE date and ONE amount. Because name is unique=True, you literally cannot have two dates associated with the same name.
So if you want to be able to have several dates/amounts associated with a name, there are several ways to proceed:
Idea 1: If there will only ever be 2 dates and 2 amounts, simply add a second date field and a second amount field.
Idea 2: If there can be any number of dates and amounts, you'll have to change your model to reflect that (sketched below), by having:
A model for your names
A model for your dates and amounts, with a foreign key to your names
Idea 3: You could keep the same model and simply remove the unique constraint, but that's a recipe for mistakes.
Based on your choice, you'll then have several ways of querying what you need; it depends on your final model structure. The best way to go would be to create custom model methods that query the two dates/amounts, build the structure you need and return it.
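A minimal sketch of Idea 2 (the model and field names are illustrative):

from django.db import models

class Name(models.Model):
    name = models.CharField('Name', max_length=100, unique=True)

class Entry(models.Model):
    name = models.ForeignKey(Name, related_name='entries', on_delete=models.CASCADE)
    date = models.DateField()
    amount = models.FloatField()

# e.g. the amounts for one name on the two dates of interest:
# Entry.objects.filter(name__name='foo', date__in=[date_one, date_two])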
I have three models, let's take an imaginary example:
class Entity(models.Model):
    name = models.CharField(max_length=100)

class EntityAssociation(models.Model):
    buddy1 = models.ForeignKey(Entity, related_name='+', on_delete=models.CASCADE)
    buddy2 = models.ForeignKey(Entity, related_name='+', on_delete=models.CASCADE)

class EntityPhoto(models.Model):
    entity = models.ForeignKey(Entity, null=True, on_delete=models.CASCADE)
    association = models.ForeignKey(EntityAssociation, null=True, on_delete=models.CASCADE)
    title = ...
We have some people (Entity), that can share personal photos of themselves. We also have some relations between entities (represented by EntityAssociation) that can also share photos of them together.
For a single entity, I can retrieve all the photos associated with it, either directly or through an association, like so:
obj = Entity.objects.last()
EntityPhoto.objects.filter(
    Q(entity=obj) | Q(association__buddy1=obj) | Q(association__buddy2=obj)
)
What I want is being able to prefetch all the photos of a set of entities selected. A typical use-case would be:
for entity in Entity.objects.all().prefetch_related(???):
    print(entity.name, 'has', len(entity.photos_prefetched), 'photos')
    print([x.title for x in entity.photos_prefetched])
And this should return all the photos. A solution with three queries (the Entity listing, a prefetch through the entity field, a prefetch through the association field; two would be perfect) would satisfy me, but the most important thing is being able to iterate through a single list on each entity.
I tried to look at the internal code of Prefetch, but it seems a prefetch is tied to a single lookup, and I don't know how to build the Q query in this case (what should the right operand of Q(entity__in=...) be?).
Notice: The point here is not about refactoring the database structure (EntityAssociation is used for plenty of other things, so it can't be reduced to an M2M of EntityPhoto, for example) but about optimizing this specific use-case, if possible.
I'm currently playing a bit with Prefetch. In my opinion it's in pretty good shape and quite powerful.
My approach to your problem would probably be something like:
entities = Entity.objects.all().prefetch_related(Prefetch(
    'entityphoto_set',
    queryset=EntityPhoto.objects.filter(
        Q(entity=obj) | Q(association__buddy1=obj) | Q(association__buddy2=obj)
    ),
    to_attr="photos_prefetched",
))
You can then access those photos with for entity in entities: entity.photos_prefetched.
If this is not working, the problem might be that you're referencing back (Q(entity=obj)) to the actual entity. I'm not sure whether that is properly possible; I'm having trouble with references back to the object myself, which might be a bug in Django.
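If the Prefetch route turns out not to work, a two-query fallback (a sketch, not part of the original answer; attribute and variable names are illustrative) is to fetch all relevant photos in one query and group them per entity in Python:

from collections import defaultdict
from django.db.models import Q

entities = list(Entity.objects.all())
photos = EntityPhoto.objects.filter(
    Q(entity__in=entities)
    | Q(association__buddy1__in=entities)
    | Q(association__buddy2__in=entities)
).select_related('association')

photos_by_entity = defaultdict(list)
for photo in photos:
    if photo.entity_id is not None:
        photos_by_entity[photo.entity_id].append(photo)
    if photo.association_id is not None:
        # a photo attached both directly and via an association of the same
        # entity would appear twice; dedupe if that can happen in your data
        photos_by_entity[photo.association.buddy1_id].append(photo)
        photos_by_entity[photo.association.buddy2_id].append(photo)

for entity in entities:
    entity.photos_prefetched = photos_by_entity[entity.pk]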
For example:
class Contact(models.Model):
    contacts = models.ManyToManyField('self', through='ContactRelationship', symmetrical=False)
What does the symmetrical=False parameter do?
When should it be left as True, and when should it be set as False?
How does this setting affect the database (does it create extra columns, etc.)?
Let's say you have two instances of Contact, John and Judy. You may decide to make John a contact of Judy. Should this action also make Judy a contact of John? If so, symmetrical=True. If not, symmetrical=False.
Here is what it says in the documentation:
Only used in the definition of ManyToManyFields on self. Consider the following model:
from django.db import models

class Person(models.Model):
    friends = models.ManyToManyField("self")
When Django processes this model, it identifies that it has a ManyToManyField on itself, and as a result, it doesn’t add a person_set attribute to the Person class. Instead, the ManyToManyField is assumed to be symmetrical – that is, if I am your friend, then you are my friend.
By default, the value of symmetrical is True for a ManyToManyField on self, which gives a bi-directional relationship.
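A quick illustration of that default (the instance names are illustrative):

john = Person.objects.create()
judy = Person.objects.create()

john.friends.add(judy)

# Because the relation is symmetrical, both directions exist:
john.friends.all()   # contains judy
judy.friends.all()   # contains john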
Using a through table (symmetrical=False):
But you can also imagine a situation where you don't need this type of relationship, so you can add symmetrical=False. This can be achieved by using a through table, because symmetrical is False by default when you use a through table:
Recursive relationships using an intermediary model are always defined as non-symmetrical – that is, with symmetrical=False – therefore, there is the concept of a “source” and a “target”. In that case 'field1' will be treated as the “source” of the relationship and 'field2' as the “target”.
So you can imagine a situation where you do need the direction. Let's say there is a Node model that has a relationship with itself through an intermediate table. If we didn't need a direction here, we could go with the example shown earlier. But now we also need a direction from one node to another, where one is the source and the other the target, and due to the nature of this relationship it cannot be symmetrical.
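A minimal sketch of that directed case (the Node/Edge names and fields are illustrative):

from django.db import models

class Node(models.Model):
    name = models.CharField(max_length=50)
    linked_to = models.ManyToManyField(
        'self',
        through='Edge',
        symmetrical=False,   # the relation has a direction: source -> target
        related_name='linked_from',
    )

class Edge(models.Model):
    # per the docs quoted above, the first FK is treated as the "source"
    # and the second as the "target" of the relationship
    source = models.ForeignKey(Node, related_name='outgoing_edges', on_delete=models.CASCADE)
    target = models.ForeignKey(Node, related_name='incoming_edges', on_delete=models.CASCADE)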
These are simplified models to demonstrate my problem:
class User(models.Model):
    username = models.CharField(max_length=30)
    total_readers = models.IntegerField(default=0)

class Book(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)

class Reader(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
So, we have Users, Books and Readers (Users, who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say the current user reads a book. I'd like to update the number of total readers for all books by this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I could do is create another field in the Reader model called book_author_id. That way I denormalize the data and can count the Reader objects without PostgreSQL having to do the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That will generate a query with a sub-query, which can sometimes perform better than a query with a join. You can even go further and end up with 2 separate queries:
book.author.total_readers = Reader.objects.filter(book_id__in=Book.objects.filter(author=book.author).values_list('id', flat=True)).count()
That will generate 2 queries: the first will retrieve the list of all book IDs for that author, and the second will retrieve the count of reads for books whose ID is in that list.
Another good solution may be to create a batch task that runs, for example, once per hour and counts up all reads, but that way you end up with a count that is not refreshed live.
You can also create a Celery task that runs just after a read is created to generate the new value for the author. That way you won't have a long response time, and the delay between creating a read and counting it won't be long.
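A minimal sketch of that Celery approach (the task name and the update_fields choice are mine, assuming the models above):

from celery import shared_task

@shared_task
def refresh_total_readers(author_id):
    author = User.objects.get(pk=author_id)
    author.total_readers = Reader.objects.filter(book__author=author).count()
    author.save(update_fields=['total_readers'])

# queued right after the Reader row is saved:
# refresh_total_readers.delay(book.author_id)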
It's always way better to solve bottlenecks of this sort with good design and maybe a little bit of caching rather than duplicating data in the way you suggest. The total_readers field is data you should generate instead of recording.
from django.core.cache import cache as caching_client  # any cache client with get/set would do

class User(models.Model):
    username = models.CharField(max_length=30)

    @property
    def total_readers(self):
        cached_value = caching_client.get("readers_" + self.username, None)
        if cached_value is None:
            cached_value = self.readers()
            caching_client.set("readers_" + self.username, cached_value)
        return cached_value

    def readers(self):
        return Reader.objects.filter(book__author=self).count()
There are libraries that do the caching via decorators, but I felt it was a pattern you would benefit from seeing spelled out. You can also attach a TTL to the cache so that you ensure the value can't be wrong for longer than the TTL. You can also regenerate the cache upon creation of a Reader object.
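A sketch of those two ideas together, a TTL plus regeneration when a Reader is created (the receiver name and the 15-minute timeout are arbitrary choices, not from the answer):

from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Reader)
def refresh_readers_cache(sender, instance, created, **kwargs):
    if created:
        author = instance.book.author
        count = Reader.objects.filter(book__author=author).count()
        # the timeout acts as the TTL, bounding how long a stale value can live
        cache.set("readers_" + author.username, count, timeout=60 * 15)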
You might actually get some mileage out of declaring an m2m and defining a through relationship, but I have no experience with it.
Say I have models like these:

class Bottles(models.Model):
    BottleCode = models.IntegerField()

class Labels(models.Model):
    LabelCode = models.IntegerField()
How do I get a queryset of Bottles where the BottleCode and LabelCode are equal? (i.e. Bottles and Labels with no common Code are excluded)
It can be achieved via extra():
Bottles.objects.extra(where=["Bottles.BottleCode in (select Labels.LabelCode from Labels)"])
You may also need to add an app name prefix to the table names, e.g. app_bottles instead of bottles.
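For comparison, the same result can usually be expressed without raw SQL through an __in lookup over a values_list (a sketch assuming the models above, not part of the original answer):

matching_bottles = Bottles.objects.filter(
    BottleCode__in=Labels.objects.values_list('LabelCode', flat=True)
)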
Though #danihp has a point here: if you often encounter queries like these, where you are trying to relate unrelated models, you should probably think about changing your model design.