Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
from django.db import models

class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')

class Keyword(models.Model):
    name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)
    return books
This works for the most part. However, whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), the same book appears more than once in the resulting QuerySet.
I know I can add distinct() to return only unique books, but I'm wondering why I'm getting duplicates to begin with, and I really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?

The two models in your example are represented by three tables: book, keyword, and the book_keyword relation table that manages the M2M field.
When you use keywords__name in a filter() call, Django uses SQL JOINs to combine all three tables. This lets you filter objects in the first table by values from another table.
For a single filter such as keywords__name__icontains='fiction', the SQL will look roughly like this:
SELECT `book`.`id`,
`book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE '%fiction%')
After the JOIN, your data looks like this:
| book.id | book.name | book_keyword.book_id | book_keyword.keyword_id | keyword.id | keyword.name    |
|---------|-----------|----------------------|-------------------------|------------|-----------------|
| 1       | Book 1    | 1                    | 1                       | 1          | Science-fiction |
| 1       | Book 1    | 1                    | 2                       | 2          | Fiction         |
| 2       | Book 2    | 2                    | 2                       | 2          | Fiction         |
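When the data is loaded from the database into Python, you only receive the columns from the book table. As you can see, Book 1 appears twice.
This is simply how many-to-many relations and JOINs work: the query returns one row per matching (book, keyword) pair, not one row per book.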
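The same effect can be reproduced outside Django with Python's built-in sqlite3 module. This is a minimal sketch that assumes simplified table names (book, keyword, book_keyword) rather than the app-prefixed names Django actually generates. It runs the JOIN shown above with and without DISTINCT, which is what QuerySet.distinct() adds to the query.

```python
import sqlite3

# In-memory database mirroring the three tables from the answer
# (simplified names; Django would prefix them with the app label).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'Science-fiction'), (2, 'Fiction');
    -- Book 1 has both keywords, Book 2 only 'Fiction'
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")

JOIN_SQL = """
    SELECT {distinct} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# One output row per matching (book, keyword) pair, so Book 1 appears twice.
rows = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
# SELECT DISTINCT collapses the identical rows.
unique_rows = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()

print(rows)         # [(1, 'Book 1'), (1, 'Book 1'), (2, 'Book 2')]
print(unique_rows)  # [(1, 'Book 1'), (2, 'Book 2')]
```

SQLite's LIKE is case-insensitive for ASCII letters, so '%fiction%' matches both 'Science-fiction' and 'Fiction' here, much like icontains.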

Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
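In your case, because keywords is a multi-valued relation, each .filter() call in your chain gets its own JOIN, and each condition is matched against any keyword linked to the book rather than only the keywords selected by the previous filter(). The rows produced by those JOINs multiply, which is where the duplicates come from.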
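To see why the chained calls multiply rows, here is a sqlite3 sketch of roughly the query shape that filtering on 'science' and then on 'fiction' produces. Django's real SQL uses its own table aliases, so treat the aliases here as assumptions. Each condition gets its own pair of JOINs, so a book matching two keywords under each condition comes back 2 × 2 = 4 times. The EXISTS version is shown only as a plain-SQL comparison that returns each book once; in Django the straightforward fix remains the distinct() call mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES
        (1, 'science-fiction'), (2, 'fiction'), (3, 'history'),
        (4, 'science'), (5, 'astronomy');
    -- Book 1: science-fiction, fiction, science; Book 2: science-fiction only
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (1, 4), (2, 1);
""")

# One pair of JOINs per chained filter (aliases bk1/k1 and bk2/k2).
CHAINED_JOINS = """
    SELECT book.id FROM book
    JOIN book_keyword bk1 ON bk1.book_id = book.id
    JOIN keyword k1 ON k1.id = bk1.keyword_id
    JOIN book_keyword bk2 ON bk2.book_id = book.id
    JOIN keyword k2 ON k2.id = bk2.keyword_id
    WHERE k1.name LIKE '%science%' AND k2.name LIKE '%fiction%'
    ORDER BY book.id
"""

# The same two conditions as EXISTS subqueries: no row multiplication.
EXISTS_VERSION = """
    SELECT book.id FROM book
    WHERE EXISTS (SELECT 1 FROM book_keyword bk JOIN keyword k ON k.id = bk.keyword_id
                  WHERE bk.book_id = book.id AND k.name LIKE '%science%')
      AND EXISTS (SELECT 1 FROM book_keyword bk JOIN keyword k ON k.id = bk.keyword_id
                  WHERE bk.book_id = book.id AND k.name LIKE '%fiction%')
    ORDER BY book.id
"""

chained = [row[0] for row in conn.execute(CHAINED_JOINS)]
exists = [row[0] for row in conn.execute(EXISTS_VERSION)]
print(chained)  # [1, 1, 1, 1, 2]
print(exists)   # [1, 2]
```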

Related

Efficient way of joining two query sets without foreign key

I know Django doesn't allow joining without a foreign key relation, and I can't specify a foreign key because there are entries in one table that are not in the other (populated using PySpark). I need an efficient way to query the following:
Let's say I have the following tables:
Company | Product | Total # Users | Total # Unique Users
and
Company | Product | # Licenses | # Estimated Users
I would like to join such that I can display a table like this on the frontend
Company View
Product|Total # Users|Total # Unique Users|#Licenses|# Estimated Users|
P1 | Num | Num | Num | Num |
P2 | Num | Num | Num | Num |
Currently I loop through each product and perform a query to populate a dictionary of lists, which is way too slow and inefficient.
I don't quite see why you can't use a foreign key in this situation, but if you can implement your query as a SQL statement, I would look at Q objects. See "Complex Lookups with Q Objects" in the documentation:
https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
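Separately from the Q objects suggestion, the per-product query loop described in the question can be avoided by fetching each table once (for example with one .values() call per model) and merging the rows in Python on the shared (company, product) key. This sketch uses plain lists of dicts in place of querysets; the field names and sample values are hypothetical. Products present in only one table still get a row, with None for the missing columns.

```python
from collections import defaultdict

# Hypothetical rows, shaped like what one .values(...) call per table could return.
usage_rows = [
    {"company": "C1", "product": "P1", "total_users": 10, "unique_users": 7},
    {"company": "C1", "product": "P2", "total_users": 4, "unique_users": 4},
]
license_rows = [
    {"company": "C1", "product": "P1", "licenses": 12, "estimated_users": 9},
    {"company": "C1", "product": "P3", "licenses": 5, "estimated_users": 5},
]

def merge_on_company_product(left, right):
    """Outer-merge two row lists on (company, product) in a single pass each."""
    merged = defaultdict(dict)
    for row in list(left) + list(right):
        key = (row["company"], row["product"])
        merged[key].update(row)  # columns from both tables end up in one dict
    return dict(merged)

table = merge_on_company_product(usage_rows, license_rows)
for (company, product), row in sorted(table.items()):
    print(product, row.get("total_users"), row.get("unique_users"),
          row.get("licenses"), row.get("estimated_users"))
# P1 10 7 12 9
# P2 4 4 None None
# P3 None None 5 5
```

This issues two queries in total regardless of how many products there are, instead of one query per product.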

Django: stop foreign key column on ManyToMany table from auto-ordering

I have a ManyToMany relationship between a Group model and a Source model:
class Group(models.Model):
    source = models.ManyToManyField('Source', null=True)

class Source(models.Model):
    content = models.CharField(max_length=8)
This creates an intermediate table with the columns: id (PK), group_id (FK) and source_id (FK).
Source could look like this:
+----+----------+
| id | content |
+----+----------+
| 1 | A |
| 2 | B |
| 3 | C |
+----+----------+
Each group can have different source members in different orders. For example, group 1 could have sources with 'content' C, A and B, with keys 3, 1 and 2 respectively, in that specific order.
Group 2 could have sources with 'content' B, C and A, with keys 2, 3 and 1 respectively, also in that specific order.
The table should look like this:
+----+----------+---------------+
| id | group_id | source_id |
+----+----------+---------------+
| 1 | 1 | 3 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 3 |
| 6 | 2 | 1 |
+----+----------+---------------+
The trouble is that when I associate these sources in the order I want, in a for loop:
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
    group = models.Group()
    group.save()
    for letter in seq:
        source = models.Source.objects.get(content=letter)
        source.group_set.add(group)
they end up in the table re-ordered sequentially, which is definitely not what I want, because in this case the order of the Sources is essential:
+----+----------+---------------+
| id | group_id | source_id |
+----+----------+---------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 2 | 3 |
+----+----------+---------------+
How can I avoid this column re-ordering in Django?
It's important to understand that in SQL there isn't an inherent ordering to the table; the way the information is stored is opaque to you. Rather, the results of each query are ordered according to some specification that you provide at query time.
It sounds like you want the primary key of the M2M table to do double-duty as the field that defines the ordering. In most use cases that is a bad idea. What if you decide later to switch the order of A and B in group 1? What if you need to insert a new Source in between them? You can't do it, because primary keys are not that flexible.
The usual way to do this is to provide a specific column just for ordering. Unlike the primary key field, you can change this at will, allowing you to adjust the order, insert new items, etc. In Django you would do this by explicitly declaring the M2M table (using the through argument of ManyToManyField) and adding an ordering column to it. Something like:
from django.db import models

class Group(models.Model):
    source = models.ManyToManyField('Source', through='GroupSource')

class Source(models.Model):
    content = models.CharField(max_length=8)

class GroupSource(models.Model):
    # Also look into using unique_together for this model
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    source = models.ForeignKey(Source, on_delete=models.CASCADE)
    position = models.IntegerField()
And your code would change to:
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
    group = models.Group()
    group.save()
    for position, letter in enumerate(seq):
        source = models.Source.objects.get(content=letter)
        models.GroupSource.objects.create(group=group, source=source, position=position)
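The point that order comes from the query rather than from how rows are stored can be sketched with sqlite3. The hypothetical group_source table below mirrors GroupSource with its position column (plus a UNIQUE constraint on the two foreign keys), and ORDER BY position recovers each group's sequence. In Django terms, something like GroupSource.objects.filter(group=group).order_by('position') expresses the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE group_source (
        id INTEGER PRIMARY KEY,
        group_id INTEGER,
        source_id INTEGER REFERENCES source(id),
        position INTEGER,
        UNIQUE (group_id, source_id)
    );
    INSERT INTO source VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

sequences = [["C", "A", "B"], ["B", "C", "A"]]
for group_id, seq in enumerate(sequences, start=1):
    for position, letter in enumerate(seq):
        (source_id,) = conn.execute(
            "SELECT id FROM source WHERE content = ?", (letter,)).fetchone()
        conn.execute(
            "INSERT INTO group_source (group_id, source_id, position) VALUES (?, ?, ?)",
            (group_id, source_id, position))

def ordered_contents(group_id):
    # The order comes from ORDER BY position, not from the physical row order.
    return [row[0] for row in conn.execute(
        """SELECT source.content FROM group_source
           JOIN source ON source.id = group_source.source_id
           WHERE group_source.group_id = ?
           ORDER BY group_source.position""", (group_id,))]

print(ordered_contents(1))  # ['C', 'A', 'B']
print(ordered_contents(2))  # ['B', 'C', 'A']
```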
Thanks for taking the time and effort. I probably would have gone down the route of adding another field to represent the ordering. But if you can safely get the same thing for free, why bother? These were individual inserts whose order of insertion is important. What puzzled me most was the result of some tests I have just concluded.
Using sql-connector on a test DB with the same schema relationships between the tables, I managed to keep the foreign keys in the order I inserted them. There, the keys in the intermediary table that references both ManyToMany partners are not reorganised from lowest to highest. However, the exact same code still reordered them on the problematic database, so it was not a Django issue as such.
The only real difference between the working and non-working tables was the UNIQUE attribute on the columns pointing to the ManyToMany partners, i.e. the foreign keys to Group and Source. After removing it, the problem went away.
However, to be honest, I am not sure why, or why Django put those UNIQUE attributes there in the first place. I'm also not sure whether removing them will badly affect the application going forward.

Need Modeling Help For An Ordering Form

I'd like to create a Django project for my company's purchasing department. This would be my first project in Django, so sorry if this comes off as rudimentary. The workflow would look something like this:
user registers for an account > signs in > can create, edit, view, or delete a purchase order.
I'm getting tripped up on the modeling. Presumably I can create and authenticate users using django.contrib.auth. Also, since this is mainly a form saving/printing application I would use a ModelForm to generate my forms based on my models since the users will be making changes to the form data that will need to be saved. A simplified version of the purchase order form in question looks something like this:
| Vendor | Date | Lead Time | Arrival Date | Buyer_Name |
+--------+-------+-----------+--------------+------------+
| FooBar |1-1-12 | 30 | 2-1-12 | Mr. Bar |
+--------+-------+-----------+--------------+------------+
+--------+-------+-----------+--------------+------------+
| SKU | Description | Quantity | Price | Dimensions |
+--------+-------------+----------+-------+--------------+
|12345 | Soft Bar | 38 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12346 | Hard Bar | 12 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12347 | Medium Bar | 17 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
As you can see, the main purchase order form has a header that identifies the Vendor being ordered from, the current date, lead time, arrival date, and the name of the buyer who is filling out the form. Under that is a line-by-line order detail for three different SKUs. Ideally, each PurchaseOrder should be able to have many SKUs added to it.
What is the best way to model something like this? Do I create a User, PurchaseOrder, and SKU model? Then add a FK to the SKU Model that points to the PurchaseOrder Model's PK or is there some other, more correct, way to do something like this? Thanks in advance for any help.
[Edit]
Django had what I was looking for all along. Since this is essentially a nested form, I could make use of Formsets.
Here are two helpful links to get started:
https://docs.djangoproject.com/en/1.4/topics/forms/formsets/
https://docs.djangoproject.com/en/1.4/topics/forms/modelforms/#model-formsets
Use Django's built-in User model (you can look at the source to see its definition, but it is similar to the code below for these other models). Other than that, I would suggest a model for every object you mentioned.
Don't add a FK to the SKU model, since a SKU can exist without being in a purchase order (if I understand the problem correctly).
models.py
from django.contrib.auth.models import User
from django.db import models

class Vendor(models.Model):
    name = models.CharField(max_length=200)
    # other fields

class SKU(models.Model):
    description = models.CharField(max_length=200)
    # other fields

class PurchaseOrder(models.Model):
    purchaser = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    skus = models.ManyToManyField(SKU)  # this is the magic that allows 1 purchase order to be filled with several SKUs
    # other fields

Ordering entries via comment count with django

I need to get entries from the database along with their comment counts. Can I do it with Django's comment framework? I am also using a voting application, which does not use GenericForeignKeys; I get entries with scores like this:
from django.db import models
from django.db.models import Sum

class EntryManager(models.Manager):
    def get_queryset(self):
        return super(EntryManager, self).get_queryset().annotate(
            score=Sum("linkvote__value"))
But I get stuck when generic foreign keys are involved. Do you have any ideas about that?
To explain further, I need to fetch entries like this:
id | body | vote_score | comment_score |
1 | foo | 13 | 4 |
2 | bar | 4 | 1 |
After that, I can order them by comment_score. :)
Thanks for all replies.
Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
return super(EntryManager, self).get_queryset().annotate(
    vote_score=Sum("linkvote__value")).extra(select={
        'comment_score': """SELECT COUNT(*) FROM comments_comment
            WHERE comments_comment.object_pk = yourapp_entry.id
            AND comments_comment.content_type_id = %s"""
    }, select_params=(entry_type,))
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.
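The correlated subquery inside extra() can be tried on its own with sqlite3. In this sketch the table names mirror the answer (comments_comment, yourapp_entry), while yourapp_linkvote and the content type id 7 are hypothetical. For brevity the vote score is also computed with a subquery here, whereas annotate(Sum(...)) in the answer uses a JOIN with GROUP BY. The sample data reproduces the table from the question, and the rows are ordered by comment_score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourapp_entry (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE yourapp_linkvote (id INTEGER PRIMARY KEY, entry_id INTEGER, value INTEGER);
    CREATE TABLE comments_comment (
        id INTEGER PRIMARY KEY,
        object_pk TEXT,          -- generic relation: target pk stored as text
        content_type_id INTEGER  -- which model the comment points at
    );
    INSERT INTO yourapp_entry VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO yourapp_linkvote (entry_id, value) VALUES (1, 10), (1, 3), (2, 4);
    INSERT INTO comments_comment (object_pk, content_type_id) VALUES
        ('1', 7), ('1', 7), ('1', 7), ('1', 7), ('2', 7),
        ('1', 8);  -- same pk but another content type: not counted
""")
ENTRY_TYPE = 7  # hypothetical content type id for Entry

rows = conn.execute("""
    SELECT e.id, e.body,
           (SELECT SUM(value) FROM yourapp_linkvote v
             WHERE v.entry_id = e.id) AS vote_score,
           (SELECT COUNT(*) FROM comments_comment c
             WHERE c.object_pk = CAST(e.id AS TEXT)
               AND c.content_type_id = ?) AS comment_score
    FROM yourapp_entry e
    ORDER BY comment_score DESC
""", (ENTRY_TYPE,)).fetchall()
print(rows)  # [(1, 'foo', 13, 4), (2, 'bar', 4, 1)]
```

Filtering on content_type_id matters because object_pk alone is not unique across models: the comment with content type 8 points at some other model's row with pk 1.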

Django QuerySet result with null fields in results

I have to get a QuerySet with many-to-many relations that has the same number of results as if I executed the query in the database, but I can't work out how to do this. I don't care whether I get the results as QuerySet items or as values items, but I do care about getting the same number of results.
Imagine the following scenario:
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=100)

class Car(models.Model):
    name = models.CharField(max_length=100)

class House(models.Model):
    people = models.ManyToManyField(Person)
    cars = models.ManyToManyField(Car)
house_1 = House.objects.create()
house_2 = House.objects.create()
john = Person.objects.create(name='John')
mary = Person.objects.create(name='Mary')
house_1.people.add(john)
house_1.people.add(mary)
mike = Person.objects.create(name='Mike')
ferrari = Car.objects.create(name='Ferrari')
house_2.people.add(mike)
house_2.cars.add(ferrari)
'''
Expected search result, regardless of the result format (model instances or values):
------------------------------------
| House ID | Car | Person |
| 1 | | John |
| 1 | | Mary |
| 2 | Ferrari | Mike |
------------------------------------
'''
How can I get a list of values, with all 3 results, spanning multiple tables, as here?
I need this so that I can create a report which can potentially contain null fields, so the duplicated results must be listed.
Thanks!
Try to write an SQL query that does that. You can't, because it's the wrong query for that data structure. Imagine that two cars are assigned to house 1. Should the result be 1-[car-1]-John, 1-[car-2]-Mary or 1-[car-2]-John, 1-[car-1]-Mary?
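The ambiguity can be made concrete with sqlite3. This sketch assumes simplified table names (house_people, house_cars) for the two M2M tables. For the sample data, a pair of LEFT JOINs happens to produce the three expected rows. Once house 1 gets two cars (the hypothetical 'car-1' and 'car-2'), the same query pairs every person with every car and returns four rows for that house, because nothing in the schema says which person goes with which car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE house (id INTEGER PRIMARY KEY);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE house_people (house_id INTEGER, person_id INTEGER);
    CREATE TABLE house_cars (house_id INTEGER, car_id INTEGER);
    INSERT INTO house VALUES (1), (2);
    INSERT INTO person VALUES (1, 'John'), (2, 'Mary'), (3, 'Mike');
    INSERT INTO car VALUES (1, 'Ferrari'), (2, 'car-1'), (3, 'car-2');
    INSERT INTO house_people VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO house_cars VALUES (2, 1);
""")

# LEFT JOINs keep houses with no cars (car.name is NULL -> None).
QUERY = """
    SELECT house.id, car.name, person.name
    FROM house
    LEFT JOIN house_cars hc ON hc.house_id = house.id
    LEFT JOIN car ON car.id = hc.car_id
    LEFT JOIN house_people hp ON hp.house_id = house.id
    LEFT JOIN person ON person.id = hp.person_id
    ORDER BY house.id, car.name, person.name
"""

before = conn.execute(QUERY).fetchall()
print(before)  # [(1, None, 'John'), (1, None, 'Mary'), (2, 'Ferrari', 'Mike')]

# Give house 1 two cars: the result becomes people x cars for that house.
conn.execute("INSERT INTO house_cars VALUES (1, 2), (1, 3)")
after = conn.execute(QUERY).fetchall()
print(after)
```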