Relation (x > 1)-to-many - django

It's probably more a generic database question than a Django one, but here goes.
I very often have what could be considered 2-to-many relations.
For example, I have a class Match in my project, which is an encounter between two teams.
At first, I was using many-to-many:
class Match(Model):
    teams = ManyToManyField('Team', related_name='matches')  # Always two teams
It ended up being very inefficient for match-related pages, especially in Django 1.3, because there is no equivalent to select_related for many-to-many relations. And it's a bit painful having to iterate when you know there are exactly two elements.
Then I switched to this model:
class Match(Model):
    teams = ManyToManyField('Team', related_name='matches')  # Always two teams
    # related_name='+' avoids a reverse-accessor clash between the two
    # ForeignKeys; the reverse relation is already covered by 'matches'
    team1 = ForeignKey('Team', related_name='+')
    team2 = ForeignKey('Team', related_name='+')
When I want to display match-related pages, I can use select_related and display the two teams very efficiently.
And when I'm on the team page, I can follow the "matches" relation to get all the matches as before.
But I find that using three fields to handle one relation is pretty horrible.
Am I doing it right? What would you recommend?

I see a few possible solutions here.
1.
If you don't anticipate that a match will ever be played between any number of teams other than two, it may be best to simply drop the ManyToMany field from your second design:
class Match(Model):
    # distinct related_names avoid a reverse-accessor clash between the two keys
    team1 = ForeignKey('Team', related_name='matches_as_team1')
    team2 = ForeignKey('Team', related_name='matches_as_team2')
It removes redundancy from the database and makes modifying a Match easier (you don't have to worry about keeping the teams field consistent with the corresponding team1 and team2 fields). It also makes it impossible to create a Match with any number of teams other than two, which helps database consistency.
On the other hand, you lose some flexibility, and it's harder to write queries because there are two fields to check instead of one.
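For example (a sketch assuming the two-ForeignKey model above; the matches_for helper is just an illustrative name), "all matches of a team" now has to check both columns with Q objects:

from django.db.models import Q

def matches_for(team):
    # A team may appear on either side of the match, so both
    # foreign keys have to be checked.
    return (Match.objects
                 .filter(Q(team1=team) | Q(team2=team))
                 .select_related('team1', 'team2'))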
2.
You may want to use a cache. In fact, those two extra fields are nothing other than a cache; moving them into a dedicated cache store keeps cached values out of your main database, which leads to a cleaner database design.
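A rough sketch of that idea using Django's cache framework (the cache key and timeout are illustrative, not part of the original suggestion):

from django.core.cache import cache

def get_match_teams(match):
    key = 'match_teams_%d' % match.pk
    teams = cache.get(key)
    if teams is None:
        teams = list(match.teams.all())   # one query on a cache miss
        cache.set(key, teams, 60 * 60)    # keep for an hour
    return teams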
3.
Write your own ManyToMany fetcher. It's not very hard to do, but it is rather ugly. You have to select the teams you're interested in (here via the auto-created join table, so each row knows which match it belongs to) and then attach them to the Match objects, e.g.:
matches = Match.objects.all()  # or whatever Match queryset you already have

# The auto-created join table knows which match each team belongs to
through = Match.teams.through
links = through.objects.filter(match__in=matches).select_related('team')

matches_map = {}
for link in links:
    matches_map.setdefault(link.match_id, []).append(link.team)

for m in matches:
    m.fetched_teams = matches_map.get(m.pk, [])
3.5.
Newer versions of Django (1.4+) have a prefetch_related() method, which is meant to handle exactly this kind of ManyToMany query, but I suppose you can't wait for that.
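Once you are on Django 1.4 or later, that would look roughly like this (two queries in total, however many matches are fetched):

matches = Match.objects.prefetch_related('teams')
for match in matches:
    # assumes exactly two teams per match, as in the model above
    team1, team2 = match.teams.all()   # already fetched, no extra query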

django restframework: how to efficiently search on many to many related fields?

Given these models:
from django.db import models

class B(models.Model):
    my_field = models.TextField()

class A(models.Model):
    b = models.ManyToManyField(B)
I have 50K+ rows in A. When searching for elements, I want to do full-text searches on my_field by traversing the many-to-many field b (i.e. b__my_field).
This works fine when the number of many-to-many elements B per A object is less than ~3. However, if I have more than that, performance drops dramatically.
I'm wondering if I could do some sort of prefetch-related search? Is something like Haystack my only option?
When you loop through a queryset and touch related objects on each item, Django can end up making a database request for every step of the loop. See this, for example, on ORM pitfalls. A thing you should learn when using the Django ORM is to use commands that avoid database round trips as much as possible. One way to do that is with the values() function. Ideally, you should also fetch only what you need.
Try this:
l = list(A.objects.values_list('b__my_field', flat=True))
This guarantees only one database query and returns a plain list that you can loop through at Python speed. It should be much faster.
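If you actually need to filter the A objects by the text (rather than just list the values), it may also be worth letting the database do the search in a single query; something along these lines, using icontains as a stand-in for real full-text search:

results = (A.objects
            .filter(b__my_field__icontains="search term")
            .distinct())   # the join can produce duplicate A rows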

Django, multi-table inheritance is that bad?

This isn't really specific to django.
One can model
Place (with location, name, and other common attributes)
- Restaurant (menu..)
- ConcertHall (hall size..)
either:
- in two separate tables, each holding all the fields it needs (in the Django world, this is called abstract inheritance), or
- in three tables, where one holds the common fields and the other two hold only their own unique fields (multi-table inheritance in Django).
The authors of the book Two Scoops of Django 1.8 strongly advise against using multi-table inheritance.
Say you want to query places based on their location and paginate the results (it doesn't have to be location; it can be any other common attribute we want to filter on).
I can see how I can achieve this using multi-table inheritance:
SELECT place.id FROM place
LEFT OUTER JOIN restaurant ON (restaurant.id = place.id)
LEFT OUTER JOIN concerthall ON (concerthall.id = place.id)
WHERE ... ORDER BY distance
Is it feasible to do it with abstract inheritance?
According to Django documentation: Model inheritance:
The only decision you have to make is whether you want the parent models to be models in their own right (with their own database tables), or if the parents are just holders of common information that will only be visible through the child models.
I think both possibilities are just tools, and equally good ones; which is appropriate depends entirely on your use case. There are certainly specific things to consider for each approach, and conceptually multi-table inheritance can sometimes be harder to grasp, but beyond that this topic becomes largely a matter of opinion.
If you need a single queryset for both models, then it is logical to consider multi-table inheritance rather than abstract models, because otherwise you would need to combine two querysets into one, most probably by using lists as this relevant answer suggests, and you would definitely lose ORM functionality in the process.
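A rough sketch of what that combining looks like with abstract inheritance (model and field names are assumed from the example above):

from itertools import chain

# Two independent tables, so two queries; the combined result is a plain
# list, not a queryset you can filter further
restaurants = Restaurant.objects.filter(location=x)
concert_halls = ConcertHall.objects.filter(location=x)
places = sorted(chain(restaurants, concert_halls), key=lambda p: p.name)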
It depends on your use cases, but Django has a good ORM for normalized table structures.
Keeping the base fields in one model and the specifics in another is the better approach from a database-normalization standpoint, because otherwise you end up querying several unrelated tables for the same kind of data, which is not a desirable situation. Django's relations and reverse relations give you what you need at this point.
An example based on yours, assuming you are using multi-table inheritance:
Place.objects.filter(location=x)
Place.objects.filter(Q(concerthall__hallsize__gt=y) | Q(restaurant__menu=z), location=x)  # Q from django.db.models
Place.objects.filter(location=x, concerthall__id__isnull=True)
The first will return all Restaurants and Concert Halls in location x.
The second will return all Places that are either Concert Halls with a hall size greater than y or Restaurants with menu z.
The last one is a super magic query that returns all places in location x which are not Concert Halls. That is useful when you have many models inheriting from Place; you can use <model_name>__id for including or excluding tables according to your needs.
You can build great JOINs spanning many tables and still stick to database normalization rules while doing so. You keep your related data in one place and avoid possible data-integrity problems.
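For reference, a minimal sketch of the multi-table setup those queries assume (hallsize and menu come from the example; the rest is illustrative):

from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=100)
    location = models.CharField(max_length=100)

class Restaurant(Place):        # multi-table inheritance: implicit OneToOne to Place
    menu = models.TextField()

class ConcertHall(Place):
    hallsize = models.IntegerField()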

Filter on a list of tags

I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel
You shouldn't use an m2m relationship with AppEngine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and the programmer is supposed to denormalize the data structure. This is a deliberate design decision: while your database will contain redundant data, your read queries will be much simpler (no need to combine data from three tables), which in turn makes the design of the DB server much simpler as well (this is done for the sake of optimization and scaling, of course).
In your case you should probably get rid of the Tag and SongTag models and just store the tag in the Song model as a string. I am of course assuming that the Tag model only contains an id and a name; if Tag in fact contains more data, you should still keep the Tag model, and the Song model should then contain both tag_id and tag_name. The idea, as explained above, is to introduce redundancy for the sake of simpler queries.
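Following that advice, a denormalized Song model might look roughly like this (field names are illustrative):

from django.db import models

class Song(models.Model):
    title = models.CharField(max_length=200)
    # tag data duplicated onto the song, so reads need no join
    tag_id = models.IntegerField(null=True)
    tag_name = models.CharField(max_length=100, blank=True)

# filtering by tag is then a single, join-free query
songs = Song.objects.filter(tag_name__in=["cool", "great"])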
Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in = taglist)
You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.

What are the advantages of using ForeignKey in Django?

This is an extremely naive question. As you can tell, it comes from someone who doesn't know much about either databases or Django.
What are the advantages of using ForeignKeys in Django?
Maybe an example will help me understand better. I have tables like this already:
City:
    id = IntegerField()        # e.g. 15
    name = CharField()         # e.g. 'Rome'

Country:
    name = CharField()         # e.g. 'Italy'
    capital = IntegerField()   # e.g. 15
Should I bother changing capital to ForeignKey(City), and if so, why? Do certain things become quicker, more convenient, or otherwise better?
thanks!
Foreign keys are constraints on your data model that allow you to ensure consistency. Basically, in your example, if you didn't have capital as a ForeignKey to a City, it could potentially contain an id of a City that didn't exist! When you use ForeignKey instead, it places a constraint on the database so that you cannot remove things that are currently referenced by other things. So if you tried to delete the City named "Rome" before deleting the Country named "Italy", which has that city as its capital, it wouldn't allow you to.
Using ForeignKey instead would make sure you never had to worry about whether or not the "thing" on the other end of the relationship was still there or not.
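Concretely, the ForeignKey version of your example might look like this (a sketch; the field options are illustrative):

from django.db import models

class City(models.Model):
    name = models.CharField(max_length=100)          # e.g. 'Rome'

class Country(models.Model):
    name = models.CharField(max_length=100)          # e.g. 'Italy'
    # PROTECT refuses to delete a City that is still some country's capital
    capital = models.ForeignKey(City, on_delete=models.PROTECT)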
Using ForeignKeys enables Django to "know" about the relations, so you can follow them without caring about how they are stored, e.g. you can do the following:
country = Country.objects.get(pk=1)
print country.capital.name
italy = Country.objects.get(capital__name="Rome")
Also, to keep constraints intact, Django will follow the relations when deleting objects that are referenced by a ForeignKey. Django also keeps track of the reverse relationships (without needing to define them explicitly), so you can do something like countries = rome.country_set.all() (which doesn't make much sense in this example, since a OneToOneField would be more appropriate here).
Referential integrity. OTOH it is quite possible to have a database that neither knows nor cares about FKs - in fact I work with a legacy db like this at work.

is distinct an expensive query in django?

I have three models: Product, Category and Place.
Product has ManyToMany relation with Category and Place.
I need to get a list of categories with at least one product matching a specific place.
For example I might need to get all the categories that has at least one product from Boston.
I have 100 categories, 500 places and 100,000 products.
In SQLite with 10K products the query takes about a second.
In production I'll use postgresql.
I'm using:
categories = Category.objects.distinct().filter(product__place__name="Boston")
Is this query going to be expensive?
Is there a better way to do this?
This is the result of connection.queries:
{'time': '0.929', 'sql': u'SELECT DISTINCT "catalog_category"."id", "catalog_category"."name" FROM "catalog_category" INNER JOIN "catalog_product_categories" ON ("catalog_category"."id" = "catalog_product_categories"."category_id") INNER JOIN "catalog_product" ON ("catalog_product_categories"."product_id" = "catalog_product"."id") INNER JOIN "catalog_product_places" ON ("catalog_product"."id" = "catalog_product_places"."product_id") INNER JOIN "catalog_place" ON ("catalog_product_places"."place_id" = "catalog_place"."id") WHERE "catalog_place"."name" = \'Boston\' ORDER BY "catalog_category"."name" ASC'}
Thanks
This is not just a Django issue; DISTINCT is slow on most SQL implementations because it's a relatively hard operation. Here is a good discussion of why it's slow in Postgres specifically.
One way to handle this would be to use Django's excellent caching mechanism on this query, assuming that the results don't change often and minor staleness isn't a problem. Another approach would be to keep a separate list of just the distinct categories, perhaps in another table.
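A rough sketch of the caching approach (the cache key and timeout are made up):

from django.core.cache import cache

def categories_for_place(place_name):
    key = 'categories_for_%s' % place_name
    categories = cache.get(key)
    if categories is None:
        categories = list(
            Category.objects.filter(product__place__name=place_name).distinct()
        )
        cache.set(key, categories, 60 * 15)   # refresh every 15 minutes
    return categories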
Although Chase is right that DISTINCT is generally a slow operation, in this case it is also completely pointless. As you can see from the generated SQL, the DISTINCT is being done on the combination of ID and name - which will never be duplicated anyway. So there is no need for the distinct() call in this query.
Generally, Django does not return duplicate results from a simple filter. The main time when distinct() is useful is when you are accessing a related queryset via a ManyToMany or ForeignKey relationship, where multiple items might be related to the same instance, and distinct will remove the duplicates.