Django ORM and SQL inner joins

I am trying to get all Horse objects which fall within a specific from_date and to_date range on a related listing object, e.g.:
Horse.objects.filter(listings__to_date__lt=to_date.datetime,
                     listings__from_date__gt=from_date.datetime)
As I understand it, this query creates an inner join, which then lets me find all my Horse objects based on the related listing dates.
My question is how exactly this works; it probably comes down to a major lack of understanding of how inner joins actually work. Would this query need to first 'check' each and every horse object to ascertain whether or not it has a related listing object? I'd imagine this could be quite inefficient, because you might have 5 million horse objects with no related listing object, yet you would still have to check each and every one first.
Alternatively, I could start with my listings and do something like this first:
listing_objs = Listing.objects.filter(to_date__lt=to_date.datetime,
                                      from_date__gt=from_date.datetime)
And then:
horses = []
for listing in listing_objs:
    if listing.horse:
        horses.append(listing.horse)
But this seems like a rather odd way of achieving my results too.
If anyone could help me understand how queries work in Django, and which is the most efficient way to go about such a query, it would be a great help!
This is my current model setup:
from django.db import models
class Listing(models.Model):
    to_date = models.DateTimeField(null=True, blank=True)
    from_date = models.DateTimeField(null=True, blank=True)
    promoted_to_date = models.DateTimeField(null=True, blank=True)
    promoted_from_date = models.DateTimeField(null=True, blank=True)
    # Relationships
    # on_delete is required from Django 2.0 onward; CASCADE was the implicit default before that.
    horse = models.ForeignKey('Horse', related_name='listings', null=True, blank=True,
                              on_delete=models.CASCADE)
class Horse(models.Model):
    created_date = models.DateTimeField(null=True, blank=True, auto_now=True)
    type = models.CharField(max_length=200, null=True, blank=True)
    name = models.CharField(max_length=200, null=True, blank=True)
    age = models.IntegerField(null=True, blank=True)
    colour = models.CharField(max_length=200, null=True, blank=True)
    height = models.IntegerField(null=True, blank=True)

The way you write your query really depends on what information you want back most of the time. If you are interested in the horses, then query from Horse. If you're interested in listings then you should query from Listing. That's generally the correct thing to do, especially when you're working with simple foreign keys.
Your first query is probably the better one with regard to Django. I've used slightly simpler models to illustrate the differences: I've created an active field rather than using datetimes.
In [18]: qs = Horse.objects.filter(listings__active=True)
In [19]: print(qs.query)
SELECT
    "scratch_horse"."id",
    "scratch_horse"."name"
FROM "scratch_horse"
INNER JOIN "scratch_listing"
    ON ( "scratch_horse"."id" = "scratch_listing"."horse_id" )
WHERE "scratch_listing"."active" = True
The inner join in the query above will ensure that you only get horses that have a listing. (Most) databases are very good at using joins and indexes to filter out unwanted rows.
If Listing were very small and Horse rather large, I would hope the database would look only at the Listing table and then use an index to fetch the matching rows of Horse without doing a full table scan (inspecting every horse). You will need to run the query and check what your database is actually doing, though. EXPLAIN (or your database's equivalent) is extremely useful. If you're guessing what the database is doing, you're probably wrong.
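To see what the inner join does on its own, here is a minimal sketch using Python's built-in sqlite3 module, with hypothetical tables mirroring the simplified scratch_horse/scratch_listing models above (an integer active column stands in for the boolean). Horses without a matching listing never appear in the result. One thing worth knowing: a horse with two matching listings comes back once per listing, which is why a .distinct() call (SELECT DISTINCT) is often needed with this kind of filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scratch_horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scratch_listing (
        id INTEGER PRIMARY KEY,
        active INTEGER NOT NULL,
        horse_id INTEGER REFERENCES scratch_horse (id)
    );
    INSERT INTO scratch_horse (id, name)
        VALUES (1, 'Shadowfax'), (2, 'Black Beauty'), (3, 'Binky');
    -- Horse 1 has two active listings, horse 2 one inactive listing, horse 3 none.
    INSERT INTO scratch_listing (active, horse_id) VALUES (1, 1), (1, 1), (0, 2);
""")

join_sql = """
    SELECT {distinct} scratch_horse.id, scratch_horse.name
    FROM scratch_horse
    INNER JOIN scratch_listing ON scratch_horse.id = scratch_listing.horse_id
    WHERE scratch_listing.active = 1
    ORDER BY scratch_horse.id
"""

# One row per matching listing, so horse 1 shows up twice.
rows = conn.execute(join_sql.format(distinct="")).fetchall()
# DISTINCT (what QuerySet.distinct() adds) collapses the duplicates.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
print(rows)
print(distinct_rows)
```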
Note that if you need to access the listings of each horse, you'll be executing another query each time you access horse.listings. prefetch_related can help here: it fetches the related listings for all the horses in one extra query and caches them.
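As a rough illustration of the idea behind prefetch_related (not Django's actual implementation), the sketch below issues one query for the horses and one more for all of their listings using an IN clause, then groups the listings in Python, so the query count stays at two however many horses there are. The tables and the fetch_horses_with_listings helper are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listing (id INTEGER PRIMARY KEY, active INTEGER, horse_id INTEGER);
    INSERT INTO horse (id, name) VALUES (1, 'Shadowfax'), (2, 'Binky');
    INSERT INTO listing (id, active, horse_id) VALUES (10, 1, 1), (11, 0, 1), (12, 1, 2);
""")


def fetch_horses_with_listings(conn):
    """Return {horse_id: (name, [listing ids])} using exactly two queries."""
    horses = conn.execute("SELECT id, name FROM horse ORDER BY id").fetchall()
    horse_ids = [horse_id for horse_id, _ in horses]
    if not horse_ids:
        return {}
    placeholders = ", ".join("?" for _ in horse_ids)
    # Second (and last) query: every listing for every horse fetched above.
    listing_rows = conn.execute(
        f"SELECT id, horse_id FROM listing WHERE horse_id IN ({placeholders}) ORDER BY id",
        horse_ids,
    ).fetchall()
    listings_by_horse = defaultdict(list)
    for listing_id, horse_id in listing_rows:
        listings_by_horse[horse_id].append(listing_id)
    return {horse_id: (name, listings_by_horse[horse_id]) for horse_id, name in horses}


result = fetch_horses_with_listings(conn)
print(result)
```

In Django you would simply add .prefetch_related('listings') to the queryset. Note that the prefetched listings are not restricted by any listings__ filter on the horses; to prefetch only matching listings you would pass a Prefetch object with a filtered queryset.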
Now, your second query:
In [20]: qs = Listing.objects.filter(active=True).select_related('horse')
In [21]: print(qs.query)
SELECT
    "scratch_listing"."id",
    "scratch_listing"."active",
    "scratch_listing"."horse_id",
    "scratch_horse"."id",
    "scratch_horse"."name"
FROM "scratch_listing"
LEFT OUTER JOIN "scratch_horse"
    ON ( "scratch_listing"."horse_id" = "scratch_horse"."id" )
WHERE "scratch_listing"."active" = True
This does a LEFT OUTER JOIN, which means that the right-hand side (Horse, in this instance) can contain NULLs; Django uses an outer join here because the horse foreign key is nullable. This would perform very poorly if you had a lot of listings without a horse, because it would bring back every single active listing, whether or not a horse was associated with it. You could fix that with .filter(active=True, horse__isnull=False), though.
Note that I've used select_related, which joins the tables so that you can access listing.horse without incurring another query.
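A minimal sqlite3 sketch of the difference, again with hypothetical tables: the LEFT OUTER JOIN keeps a listing whose horse_id is NULL and fills the horse columns with NULL (None in Python), while adding horse_id IS NOT NULL, roughly what horse__isnull=False contributes, drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listing (id INTEGER PRIMARY KEY, active INTEGER, horse_id INTEGER);
    INSERT INTO horse (id, name) VALUES (1, 'Shadowfax');
    -- Listing 2 is active but has no horse.
    INSERT INTO listing (id, active, horse_id) VALUES (1, 1, 1), (2, 1, NULL);
""")

base_sql = """
    SELECT listing.id, horse.name
    FROM listing
    LEFT OUTER JOIN horse ON listing.horse_id = horse.id
    WHERE listing.active = 1 {extra}
    ORDER BY listing.id
"""

# The horseless listing comes back with a NULL horse name.
with_orphans = conn.execute(base_sql.format(extra="")).fetchall()
# Filtering on the foreign key column removes it.
without_orphans = conn.execute(
    base_sql.format(extra="AND listing.horse_id IS NOT NULL")
).fetchall()
print(with_orphans)
print(without_orphans)
```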
Now I should probably ask why all your fields are nullable. That's usually a terrible design choice, especially for ForeignKeys. Will you ever have a listing that's not associated with a horse? If not, get rid of the null. Will you ever have a horse that won't have a name? If not, get rid of the null.
So the answer is: do what seems natural most of the time. If you know a particular table is going to be large, then inspect the query plan (EXPLAIN), look into adding or using indexes on your filter and join conditions, or try querying from the other side of the relation.
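For example, SQLite's version of EXPLAIN is EXPLAIN QUERY PLAN. This sketch (hypothetical listing table and index name) shows the plan for a lookup on the join column before and after adding an index. The exact wording of the plan text varies between SQLite versions, so only the index name is checked. Django already indexes ForeignKey columns by default (db_index=True); for other filter columns you can add db_index=True or an entry in Meta.indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, active INTEGER, horse_id INTEGER)")
conn.executemany(
    "INSERT INTO listing (active, horse_id) VALUES (?, ?)",
    [(i % 2, i % 100) for i in range(1000)],
)

query = "SELECT id FROM listing WHERE horse_id = ?"


def plan(conn, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


plan_before = plan(conn, query, (42,))  # typically a full SCAN of listing
conn.execute("CREATE INDEX listing_horse_id_idx ON listing (horse_id)")
plan_after = plan(conn, query, (42,))  # typically a SEARCH using the new index
print(plan_before)
print(plan_after)
```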

Related

A better way of representing two extremely similar, yet different objects in Django?

So I'm trying to find a good way of modelling both "houses" and "house groups".
Houses and house groups are extremely similar in that they both carry a description and have related pricing information.
However, "bookings" can only be assigned to Houses and not to HouseGroups.
At the moment, my model looks like this:
class Houselike(models.Model):
    max_guests = models.IntegerField()
    name = models.CharField(max_length=20)
    description = models.TextField(blank=True)
class House(Houselike):
    pass
class HouseGroup(Houselike):
    houses = models.ManyToManyField(House)
Semantically, this is actually very close to what I want. However, in the database, this leads to two tables that each have only a single field, "houselike_ptr_id", referring to the "Houselike" base object.
Checking whether a Houselike object is a House or a HouseGroup thus involves looking in two different tables.
A more efficient alternative would be:
class Houselike(models.Model):
    max_guests = models.IntegerField()
    name = models.CharField(max_length=20)
    description = models.TextField(blank=True)
    is_group = models.BooleanField()
    # There is no separate House class any more, so the relation points back at Houselike.
    houses = models.ManyToManyField('self', symmetrical=False, blank=True)
This results in only one extra field in the "houselike" table, and the other table, containing the related houses, is only hit if we actually look them up. This is the best solution from a storage point of view, IMHO.
However, this isn't quite as good from a semantic point of view: Houses and Housegroups are similar, but different objects.
Also, this allows for things like house groups containing other house groups, or non-groups containing houses, all of which I have to check manually.
I also really like being able to explicitly work with House and HouseGroup objects. Representing them both with the same class just feels wrong.
Is there a better way to do this?
EDIT:
I forgot to mention that pricing information (as well as other entities) can be associated with either a House or a Housegroup, and is implemented (roughly) as follows:
class PricePeriod(models.Model):
    house = models.ForeignKey(Houselike, on_delete=models.CASCADE)
    arrival_date = models.DateField()
    # Date of last departure date
    departure_date = models.DateField()
    price = models.DecimalField(max_digits=10, decimal_places=2)
This is why I don't simply make Houselike an abstract model: these other objects are related to it.
Turns out, this is something called "single table inheritance", which is perfect in my case.
And, this being the Internet, there's an app for that: https://github.com/craigds/django-typed-models
from django.db import models
from typedmodels.models import TypedModel
# Create your models here.
class Houselike(TypedModel):
    max_guests = models.IntegerField()
    name = models.CharField(max_length=20)
    description = models.TextField(blank=True)
class House(Houselike):
    pass
class HouseGroup(Houselike):
    houses = models.ManyToManyField(House)
This resulted in pretty much exactly what I was asking: a single table in the database, and an explicit, semantically-correct model in Python/Django.
Now I just need to fix my awful naming...
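To make the idea concrete, here is a minimal sketch of single table inheritance without Django: one hypothetical houselike table with a type discriminator column, and a loader that turns each row into the matching Python class. This only illustrates the general technique; it is not how django-typed-models is implemented internally.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Houselike:
    id: int
    name: str
    max_guests: int


@dataclass
class House(Houselike):
    pass


@dataclass
class HouseGroup(Houselike):
    pass


# Discriminator value stored in the "type" column -> Python class.
TYPE_MAP = {"house": House, "housegroup": HouseGroup}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houselike (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL,          -- the single-table-inheritance discriminator
        name TEXT NOT NULL,
        max_guests INTEGER NOT NULL
    );
    INSERT INTO houselike (id, type, name, max_guests) VALUES
        (1, 'house', 'Sea View', 4),
        (2, 'house', 'Hill Top', 6),
        (3, 'housegroup', 'Coast Collection', 10);
""")


def load_all(conn):
    """Read every row from the one table and build the matching subclass."""
    rows = conn.execute("SELECT id, type, name, max_guests FROM houselike ORDER BY id")
    return [TYPE_MAP[type_](id_, name, guests) for id_, type_, name, guests in rows]


objects = load_all(conn)
# Telling houses and groups apart needs no extra table lookup.
groups = [obj for obj in objects if isinstance(obj, HouseGroup)]
print(objects)
```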

Many to Many Exclude on Multiple Objects

I have the following models:
class Deal(models.Model):
    date = models.DateTimeField(auto_now_add=True)
    retailer = models.ForeignKey(Retailer, related_name='deals', on_delete=models.CASCADE)
    description = models.CharField(max_length=255)
    # ...etc
class CustomerProfile(models.Model):
    saved_deals = models.ManyToManyField(Deal, related_name='saved_by_customers', null=True, blank=True)
    dismissed_deals = models.ManyToManyField(Deal, related_name='dismissed_by_customers', null=True, blank=True)
What I want to do is retrieve deals for a customer, but I don't want to include deals that they have dismissed.
I'm having trouble wrapping my head around the many-to-many relationship and am having no luck figuring out how to do this query. I'm assuming I should use exclude() on Deal.objects, but all the examples I see for exclude() exclude one item, not what amounts to multiple items.
When I naively tried just:
deals = Deal.objects.exclude(customer.saved_deals).all()
I get the error: "'ManyRelatedManager' object is not iterable"
If I say:
deals = Deal.objects.exclude(customer.saved_deals.all()).all()
I get "Too many values to unpack" (though I feel I should note there are only 5 deals and 2 customers in the database right now)
We (our client) presume that they will have thousands of customers and tens of thousands of deals in the future, so I'd like to stay as performance-oriented as I can. If this setup is incorrect, I'd love to know a better way.
Also, I am running Django 1.5, as this is deployed on App Engine (using Cloud SQL).
Where am I going wrong?
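I suggest using the related manager to get the IDs of the deals to exclude (values_list('id', flat=True) yields plain IDs rather than tuples, and when it is passed to __in, Django turns it into a subquery). Since the goal is to hide dismissed deals, use customer.dismissed_deals here (your snippet used saved_deals).
This avoids having to exclude on a field in a joined table.
deals = Deal.objects.exclude(id__in=customer.dismissed_deals.values_list('id', flat=True))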
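In SQL terms, the exclude(id__in=...) pattern boils down to a NOT IN subquery against the many-to-many join table. A minimal sqlite3 sketch with hypothetical table names (the join table is shaped like the one Django creates for a ManyToManyField, but the name here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deal (id INTEGER PRIMARY KEY, description TEXT);
    -- Join table of the kind Django creates for a ManyToManyField.
    CREATE TABLE customerprofile_dismissed_deals (
        id INTEGER PRIMARY KEY,
        customerprofile_id INTEGER NOT NULL,
        deal_id INTEGER NOT NULL REFERENCES deal (id)
    );
    INSERT INTO deal (id, description)
        VALUES (1, 'Half-price pizza'), (2, 'Free coffee'), (3, '2-for-1 tickets');
    -- Customer 7 dismissed deal 2; customer 8 dismissed deal 3.
    INSERT INTO customerprofile_dismissed_deals (customerprofile_id, deal_id)
        VALUES (7, 2), (8, 3);
""")


def deals_not_dismissed_by(conn, customer_id):
    """Roughly the SQL shape of exclude(id__in=<dismissed deal ids subquery>)."""
    return conn.execute(
        """
        SELECT id, description FROM deal
        WHERE id NOT IN (
            SELECT deal_id FROM customerprofile_dismissed_deals
            WHERE customerprofile_id = ?
        )
        ORDER BY id
        """,
        (customer_id,),
    ).fetchall()


visible_to_7 = deals_not_dismissed_by(conn, 7)
print(visible_to_7)
```

Because everything runs as one statement, the dismissed IDs never have to travel to Python and back, which matters once there are tens of thousands of deals.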
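You'd want to change this:
deals = Deal.objects.exclude(customer.saved_deals).all()
To something like this, which filters on the reverse side of the relation using its related_name:
deals = Deal.objects.exclude(dismissed_by_customers=customer)
Basically, customer.saved_deals is a many-to-many related manager, not a filter condition, so you can't pass it directly to exclude(); exclude() needs keyword lookups (or Q objects).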
Saved deals and dismissed deals are two fields describing almost the same thing. (Note also that null=True has no effect on a ManyToManyField, since the relation is stored in a separate join table rather than in a column.) It's worth considering removing dismissed_deals altogether and using a single saved flag set to True or False.
Another thing to think about is moving saved_deals out of the CustomerProfile class and into the Deal class: saved deals are about deals, so they arguably belong in the Deal class.
class Deal(models.Model):
    saved = models.BooleanField()
    # ...
A real deal would be made by one customer/buyer rather than several, while a real customer can have millions of deals, so relating each deal to its customer with a foreign key would be a good approach.
class Deal(models.Model):
    saved = models.BooleanField()
    customer = models.ForeignKey(CustomerProfile, on_delete=models.CASCADE)
    # ...
As for "What I want to do is retrieve deals for a customer, but I don't want to include deals that they have dismissed":
deals_for_customer = Deal.objects.filter(customer__name="John")
The double underscore between customer and name (customer__name) lets you filter across the relation: customer is the foreign key to the CustomerProfile model, and name is a field on that model (assuming the CustomerProfile class has a name attribute).
deals_saved = deals_for_customer.filter(saved=True)
That's it. I hope I could help. Let me know if not.

Address DB schema - more ForeignKeys or just plain text?

I have a dilemma designing a database for my application. Basically, I want to store US addresses. I'm using Django, but it's more of a database design question.
Say, I have models for State, City & ZipCode:
from django.db import models
from django.utils.translation import gettext_lazy as _
class State(models.Model):
    short_name = models.CharField(_('state short name'), max_length=2, primary_key=True)
    name = models.CharField(_('state full name'), max_length=50)
class City(models.Model):
    name = models.CharField(_('city name'), max_length=100)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
class ZipCode(models.Model):
    code = models.CharField(_('zip code'), max_length=6)
    city = models.ForeignKey(City, on_delete=models.CASCADE)
Then, I want to store a single Address. Here is my dilemma: should I use foreign keys (or just a single one), or store the whole address as CharFields? That is, should I use the 1st, 2nd or 3rd version of the Address model?
1st version:
class Address(models.Model):
    street = models.CharField(_('street address'), max_length=300)
    city = models.ForeignKey(City, on_delete=models.CASCADE)
    zip_code = models.ForeignKey(ZipCode, on_delete=models.CASCADE)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
    counter = models.IntegerField()
2nd version:
class Address(models.Model):
    street = models.CharField(_('street address'), max_length=300)
    city = models.CharField(_('city'), max_length=300)
    zip_code = models.CharField(_('zip code'), max_length=6)
    state = models.CharField(_('state'), max_length=50)
    counter = models.IntegerField()
3rd version:
class Address(models.Model):
    street = models.CharField(_('street address'), max_length=300)
    zip_code = models.ForeignKey(ZipCode, on_delete=models.CASCADE)
    counter = models.IntegerField()
My specific use case is that every user search will either generate a new Address (if one doesn't exist) with counter = 0 or update an existing Address (say, increment the counter field; this is just an example). Assume 1 search per second with ~30% redundant searches.
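Whichever schema you pick, the "create or increment" step is best done atomically in the database rather than as a read-modify-write in Python, or two simultaneous searches can lose an update. Here is a minimal sqlite3 sketch using an UPSERT (INSERT ... ON CONFLICT, SQLite 3.24 or later) on a hypothetical address table with a unique constraint; in Django you would more typically combine get_or_create() with an F('counter') + 1 update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        zip_code TEXT NOT NULL,
        counter INTEGER NOT NULL DEFAULT 0,
        UNIQUE (street, zip_code)   -- one row per distinct address
    )
""")


def record_search(conn, street, zip_code):
    """Insert the address with counter 0, or bump the counter if it already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO address (street, zip_code, counter) VALUES (?, ?, 0)
            ON CONFLICT (street, zip_code) DO UPDATE SET counter = counter + 1
            """,
            (street, zip_code),
        )


searches = [("1 Main St", "60601"), ("1 Main St", "60601"), ("5 Oak Ave", "82001")]
for street, zip_code in searches:
    record_search(conn, street, zip_code)

counts = conn.execute("SELECT street, counter FROM address ORDER BY street").fetchall()
print(counts)
```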
My notes of different versions:
1st:
overhead with creating new record (worst case: need to create new City & Zip; States will be already populated)
more connected data (not sure if that's a pro/con?)
2nd:
fast creation of new Address record
less "connected" data (not sure if that's a pro/con?)
3rd:
Zip_Code is already assigned to a City, which is already assigned to a State, no need to copy this data
I'm just not sure which schema is better and why. For now I've been using "plain" data, that is, no foreign keys on the Address, just CharFields, and it works OK. But my site is growing and I want to have a solid foundation. Also, I'm really curious how to approach such a problem.
Thank you for taking the time to read this.
Thinking about it conceptually, does this hold true?
A state has one or more cities.
A city has one or more zip codes.
A zip code has one or more street addresses.
There's a fairly clear hierarchy here. If you reflect it in the database, then you'd have the following:
Address holding a foreign key to ZipCode.
ZipCode holding a foreign key to City.
City holding a foreign key to State.
So your design for State, City, and ZipCode looks right; you should complete it by choosing Option 3.
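A minimal sqlite3 sketch of Option 3 (table and column names are hypothetical, loosely following Django's defaults): the address row stores only the street and a zip-code key, and the city and state are recovered through the chain of joins, so each of them is stored exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (short_name TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       state_id TEXT NOT NULL REFERENCES state (short_name));
    CREATE TABLE zipcode (id INTEGER PRIMARY KEY, code TEXT NOT NULL,
                          city_id INTEGER NOT NULL REFERENCES city (id));
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL,
                          zip_code_id INTEGER NOT NULL REFERENCES zipcode (id),
                          counter INTEGER NOT NULL DEFAULT 0);

    INSERT INTO state VALUES ('IL', 'Illinois');
    INSERT INTO city VALUES (1, 'Chicago', 'IL');
    INSERT INTO zipcode VALUES (1, '60601', 1);
    INSERT INTO address (street, zip_code_id) VALUES ('1 Main St', 1);
""")

# Address -> ZipCode -> City -> State, each fact held in one place.
full_address = conn.execute("""
    SELECT address.street, city.name, state.short_name, zipcode.code
    FROM address
    JOIN zipcode ON address.zip_code_id = zipcode.id
    JOIN city ON zipcode.city_id = city.id
    JOIN state ON city.state_id = state.short_name
""").fetchone()
print(full_address)
```

Correcting a misspelled state name is then a single UPDATE on the state table. In Django, the same chain can be fetched in one query with select_related('zip_code__city__state').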
Here are some benefits to this design:
You'll avoid update anomalies. You won't ever get into a situation where an Address holds/is related to a Zip Code from California while also holding/being related to the state of Wyoming.
You won't be storing the string "Illinois" over and over again. Aside from saving space, if you realise three years down the line that you accidentally typed "Ilinois", you won't need to run a huge update script on the Address table of your live database to correct the problem.
If a state border changed and a city which used to be a part of Arizona became part of New Mexico (OK, this is unlikely, but bear with me for the sake of sticking with your example!), you'd only have to update the foreign key on a single record in the City table.
If there's ever a different need for this same data (Reporting? Business intelligence/analytics? A new website feature?), having a solid structure like this with each data item held in only one place and without spurious foreign keys will make it clear which data to use, will help avoid the need for time consuming and potentially problematic data cleansing, and will reduce development time. Duplicated and inconsistent data in source systems takes up a huge amount of my time as a business intelligence/data warehousing developer.
You have the right idea in looking ahead and thinking about whether your current database design can stand up to your website's growth. The sooner you resolve issues like this, the easier they'll be to change and the less disruption you're likely to suffer.
If you're currently working with something more like Option 2, then I'm guessing you might well have used a similar pattern elsewhere in your database. If this is the case, and you'd like to avoid the issues I've mentioned above (and others), then it's really worth doing some reading or training on database design, and specifically how to carry out normalization.

What's the best way to ensure balanced transactions in a double-entry accounting app?

What's the best way to ensure that transactions are always balanced in double-entry accounting?
I'm creating a double-entry accounting app in Django. I have these models:
class Account(models.Model):
    TYPE_CHOICES = (
        ('asset', 'Asset'),
        ('liability', 'Liability'),
        ('equity', 'Equity'),
        ('revenue', 'Revenue'),
        ('expense', 'Expense'),
    )
    num = models.IntegerField()
    type = models.CharField(max_length=20, choices=TYPE_CHOICES, blank=False)
    description = models.CharField(max_length=1000)
class Transaction(models.Model):
    date = models.DateField()
    description = models.CharField(max_length=1000)
    notes = models.CharField(max_length=1000, blank=True)
class Entry(models.Model):
    TYPE_CHOICES = (
        ('debit', 'Debit'),
        ('credit', 'Credit'),
    )
    transaction = models.ForeignKey(Transaction, related_name='entries', on_delete=models.CASCADE)
    type = models.CharField(max_length=10, choices=TYPE_CHOICES, blank=False)
    account = models.ForeignKey(Account, related_name='entries', on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=11, decimal_places=2)
I'd like to enforce balanced transactions at the model level, but there don't seem to be hooks in the right place. For example, Transaction.clean won't work, because the transaction gets saved first and the entries are added afterwards, since each Entry points to its Transaction via a ForeignKey.
I'd like balance checking to work within admin also. Currently, I use an EntryInlineFormSet with a clean method that checks balance in admin but this doesn't help when adding transactions from a script. I'm open to changing my models to make this easier.
(Hi Ryan! -- Steve Traugott)
It's been a while since you posted this, so I'm sure you're way past this puzzle. For others and posterity, I have to say yes, you need to be able to split transactions, and no, you don't want to take the naive approach and assume that transaction legs will always be in pairs, because they won't. You need to be able to do N-way splits, where N is any integer greater than 1. Ryan has the right structure here.
What Ryan calls Entry I usually call Leg, as in transaction leg, and I'm usually working with bare Python on top of some SQL database. I haven't used Django yet, but I'd be surprised (shocked) if Django doesn't support something like the following. Rather than using the native database row ID as the transaction ID, I usually generate a unique transaction ID from some other source and store it in both the Transaction and the Leg objects. I then do a final check to ensure debits and credits balance, and commit the Transaction and its Legs to the database in one SQL transaction.
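A minimal sketch of that approach in bare Python with sqlite3, using hypothetical table names: the transaction ID is generated up front (here with uuid4), debits and credits are compared as Decimal values, and only then are the transaction and all of its legs written inside one database transaction, so an unbalanced transaction is never stored. In Django, the natural wrapper for the final step would be django.db.transaction.atomic().

```python
import sqlite3
import uuid
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (id TEXT PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE leg (
        id INTEGER PRIMARY KEY,
        txn_id TEXT NOT NULL REFERENCES txn (id),
        account INTEGER NOT NULL,
        type TEXT NOT NULL CHECK (type IN ('debit', 'credit')),
        amount TEXT NOT NULL  -- stored as text so Decimal values stay exact
    );
""")


class UnbalancedTransaction(ValueError):
    pass


def post_transaction(conn, description, legs):
    """legs: list of (account, 'debit' | 'credit', Decimal amount); N-way splits allowed."""
    if len(legs) < 2:
        raise UnbalancedTransaction("a transaction needs at least two legs")
    debits = sum((amount for _, kind, amount in legs if kind == "debit"), Decimal("0"))
    credits = sum((amount for _, kind, amount in legs if kind == "credit"), Decimal("0"))
    if debits != credits:
        raise UnbalancedTransaction(f"debits {debits} != credits {credits}")

    txn_id = str(uuid.uuid4())  # ID generated before anything touches the database
    with conn:  # one database transaction: both tables are written, or neither is
        conn.execute("INSERT INTO txn (id, description) VALUES (?, ?)", (txn_id, description))
        conn.executemany(
            "INSERT INTO leg (txn_id, account, type, amount) VALUES (?, ?, ?, ?)",
            [(txn_id, account, kind, str(amount)) for account, kind, amount in legs],
        )
    return txn_id


# A three-way split: one debit balanced by two credits.
post_transaction(conn, "Office supplies", [
    (5000, "debit", Decimal("150.00")),
    (1000, "credit", Decimal("100.00")),
    (2000, "credit", Decimal("50.00")),
])

try:
    post_transaction(conn, "Typo", [(5000, "debit", Decimal("10.00")),
                                    (1000, "credit", Decimal("1.00"))])
except UnbalancedTransaction as exc:
    print("rejected:", exc)

txn_count = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
leg_count = conn.execute("SELECT COUNT(*) FROM leg").fetchone()[0]
print(txn_count, leg_count)
```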
Ryan, is that more or less what you wound up doing?
This may sound terribly naive, but why not just record each transaction in a single record containing "to account" and "from account" foreign keys that link to an accounts table, instead of creating two records for each transaction? From my point of view, the essence of "double-entry" is that transactions always move money from one account to another. There is no advantage to using two records to store such transactions, and there are many disadvantages.

How can I get a list of objects from a PostgreSQL view to display?

This is the model for the view:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)
    class Meta:
        db_table = u'qry_desc_char'
This is the SQL I use to create the view:
CREATE VIEW qry_desc_char as
SELECT
    tbl_desc.iid_id,
    tbl_desc.cid_id,
    tbl_desc.cs,
    tbl_char.cid,
    tbl_char.charname
FROM tbl_desc, tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
I don't know if I need a function in models or views or both. I want to get a list of objects from that view to display them. This might be easy, but I'm new to Django and Python, so I'm having some problems.
Django 1.1 brought in a new feature that you might find useful. You should be able to do something like:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)
    class Meta:
        db_table = u'qry_desc_char'
        managed = False
The documentation for the managed Meta class option is here. A relevant quote:
If False, no database table creation or deletion operations will be performed for this model. This is useful if the model represents an existing table or a database view that has been created by some other means. This is the only difference when managed is False. All other aspects of model handling are exactly the same as normal.
Once that is done, you should be able to use your model normally. To get a list of objects you'd do something like:
qry_desc_char_list = QryDescChar.objects.all()
To actually get the list into your template, you might want to look at generic views, specifically the object_list view (from Django 1.3 onward, the class-based ListView replaces it).
If your RDBMS lets you create writable views, and the view you create has exactly the structure of the table Django would create, I guess it should work directly.
(This is an old question, but is an area that still trips people up and is still highly relevant to anyone using Django with a pre-existing, normalized schema.)
In your SELECT statement you will need to add a numeric "id" column, because Django expects one as the implicit primary key, even on an unmanaged model, unless you mark another field with primary_key=True. You can use the row_number() window function to generate it if there isn't a guaranteed unique integer value on the row somewhere (and with views this is often the case).
In this case I'm using an ORDER BY clause with the window function, but you can use anything that's valid, and while you're at it you may as well use a clause that's useful to you in some way. Just make sure you do not try to use Django ORM dot references to relations, because they look for the "id" column by default, and yours are fake: row_number() values are computed at query time and can change whenever the underlying data changes.
Additionally I would consider renaming my output columns to something more meaningful if you're going to use it within an object. With those changes in place the query would look more like (of course, substitute your own terms for the "AS" clauses):
CREATE VIEW qry_desc_char as
SELECT
    row_number() OVER (ORDER BY tbl_char.cid) AS id,
    tbl_desc.iid_id AS iid_id,
    tbl_desc.cid_id AS cid_id,
    tbl_desc.cs AS a_better_name,
    tbl_char.cid AS something_descriptive,
    tbl_char.charname AS name
FROM tbl_desc, tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
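To try the idea outside Django, here is a minimal sqlite3 sketch (window functions need SQLite 3.25 or later; the data is made up) that creates tables shaped like tbl_desc and tbl_char, a view with a row_number() id column, and reads it back. One small assumption on my part: tbl_desc.iid_id is added to the ORDER BY as a tiebreaker so that rows sharing a cid are always numbered in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_char (cid INTEGER PRIMARY KEY, charname TEXT);
    CREATE TABLE tbl_desc (iid_id INTEGER, cid_id INTEGER, cs TEXT);
    INSERT INTO tbl_char VALUES (10, 'colour'), (20, 'height');
    INSERT INTO tbl_desc VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 20, 'c');

    CREATE VIEW qry_desc_char AS
    SELECT
        row_number() OVER (ORDER BY tbl_char.cid, tbl_desc.iid_id) AS id,
        tbl_desc.iid_id AS iid_id,
        tbl_desc.cid_id AS cid_id,
        tbl_desc.cs AS a_better_name,
        tbl_char.cid AS something_descriptive,
        tbl_char.charname AS name
    FROM tbl_desc, tbl_char
    WHERE tbl_desc.cid_id = tbl_char.cid;
""")

# The generated id is what an unmanaged Django model would treat as its primary key.
rows = conn.execute("SELECT id, iid_id, name FROM qry_desc_char ORDER BY id").fetchall()
print(rows)
```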
Once that is done, in Django your model could look like this:
class QryDescChar(models.Model):
    iid_id = models.ForeignKey('WhateverIidIs', related_name='+',
                               db_column='iid_id', on_delete=models.DO_NOTHING)
    cid_id = models.ForeignKey('WhateverCidIs', related_name='+',
                               db_column='cid_id', on_delete=models.DO_NOTHING)
    a_better_name = models.CharField(max_length=10)
    something_descriptive = models.IntegerField()
    name = models.CharField(max_length=50)
    class Meta:
        managed = False
        db_table = 'qry_desc_char'
You don't need the "_id" part at the end of the id column names, because you can give the field a more descriptive name on the Django model and declare the real column name with the "db_column" argument, as I did above (although here I only used it to stop Django from adding another "_id" to the end of cid_id and iid_id, which added zero semantic value to your code). Also, note the "on_delete" argument. Django emulates cascading deletes itself rather than leaving them to the database, and on an interesting data model you don't want this; when it comes to views, you'll just get an error and an aborted transaction. Prior to Django 1.5 you had to patch Django to make DO_NOTHING actually mean "do nothing"; otherwise it would still (needlessly) query and collect all related objects before going through its delete cycle, and the query would fail, halting the entire operation.
Incidentally, I wrote an in-depth explanation of how to do this just the other day.
You are trying to fetch records from a view. This is not correct, as a view does not map to a model; a table maps to a model.
You should use the Django ORM to fetch QryDescChar objects. Please note that the Django ORM will fetch them directly from the table. You can consult the Django docs for the extra() and select_related() methods, which will allow you to fetch related data (data you want to get from the other table) in different ways.