Django: stop foreign key column on ManyToMany table from auto-ordering - django

I have a ManyToMany relationship between a Group model and a Source model:
class Group(models.Model):
    source = models.ManyToManyField('Source', null=True)
class Source(models.Model):
    content = models.CharField(max_length=8)
This creates an intermediate table with the columns id (PK), group_id (FK) and source_id (FK).
The Source table could look like this:
+----+---------+
| id | content |
+----+---------+
|  1 | A       |
|  2 | B       |
|  3 | C       |
+----+---------+
Each group can have different Source members in a different order. For example, group 1 could have sources with content C, A and B (keys 3, 1 and 2 respectively), in that specific order.
Group 2 could have sources with content B, C and A (keys 2, 3 and 1 respectively), also in that specific order.
The intermediate table should then look like this:
+----+----------+-----------+
| id | group_id | source_id |
+----+----------+-----------+
|  1 |        1 |         3 |
|  2 |        1 |         1 |
|  3 |        1 |         2 |
|  4 |        2 |         2 |
|  5 |        2 |         3 |
|  6 |        2 |         1 |
+----+----------+-----------+
The trouble starts when I associate these sources in the order I want in a for loop:
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
    group = models.Group()
    group.save()
    for letter in seq:
        source = models.Source.objects.get(content=letter)
        source.group_set.add(group)
The rows end up in the table re-ordered sequentially by source_id, which is definitely not what I want, since in this case the order of the Sources is essential:
+----+----------+-----------+
| id | group_id | source_id |
+----+----------+-----------+
|  1 |        1 |         1 |
|  2 |        1 |         2 |
|  3 |        1 |         3 |
|  4 |        2 |         1 |
|  5 |        2 |         2 |
|  6 |        2 |         3 |
+----+----------+-----------+
How can I avoid this column re-ordering in Django?

It's important to understand that in SQL there isn't an inherent ordering to the table; the way the information is stored is opaque to you. Rather, the results of each query are ordered according to some specification that you provide at query time.
It sounds like you want the primary key of the M2M table to do double-duty as the field that defines the ordering. In most use cases that is a bad idea. What if you decide later to switch the order of A and B in group 1? What if you need to insert a new Source in between them? You can't do it, because primary keys are not that flexible.
The usual way to do this is to provide a specific column just for ordering. Unlike the primary key field, you can change this at will, allowing you to adjust the order, insert new items, etc. In Django you would do this by explicitly declaring the M2M table (using the through argument of ManyToManyField) and adding an ordering column to it. Something like:
class Group(models.Model):
    source = models.ManyToManyField('Source', through='GroupSource')
class Source(models.Model):
    content = models.CharField(max_length=8)
class GroupSource(models.Model):
    # Also look into using unique_together for this model
    # on_delete is required from Django 2.0 onwards
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    source = models.ForeignKey(Source, on_delete=models.CASCADE)
    position = models.IntegerField()
And your code would change to:
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
    group = models.Group()
    group.save()
    # enumerate() supplies the position of each letter within its sequence
    for position, letter in enumerate(seq):
        source = models.Source.objects.get(content=letter)
        models.GroupSource.objects.create(group=group, source=source, position=position)
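To see the principle in isolation, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the Django ORM. The table names, column names and the ordered_contents helper are assumptions made up for illustration; they only mirror the through model above. The point is that the order of the results comes from ORDER BY position, not from insertion order or from the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE group_source (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        source_id INTEGER NOT NULL REFERENCES source(id),
        position INTEGER NOT NULL,
        UNIQUE (group_id, source_id)
    );
    INSERT INTO source (id, content) VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

sequences = [["C", "A", "B"], ["B", "C", "A"]]
for group_id, seq in enumerate(sequences, start=1):
    for position, letter in enumerate(seq):
        (source_id,) = conn.execute(
            "SELECT id FROM source WHERE content = ?", (letter,)
        ).fetchone()
        conn.execute(
            "INSERT INTO group_source (group_id, source_id, position) "
            "VALUES (?, ?, ?)",
            (group_id, source_id, position),
        )

def ordered_contents(group_id):
    """Return the contents of one group, sorted by the explicit position column."""
    rows = conn.execute(
        """
        SELECT s.content
        FROM group_source AS gs
        JOIN source AS s ON s.id = gs.source_id
        WHERE gs.group_id = ?
        ORDER BY gs.position
        """,
        (group_id,),
    )
    return [content for (content,) in rows]

print(ordered_contents(1))
print(ordered_contents(2))
```

With the through model above, the Django counterpart of the SELECT would presumably be a query on the through model itself, such as GroupSource.objects.filter(group=group).order_by('position').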

Thanks for taking the time and effort, and I probably would have gone down the route of doing much the same by adding another field to represent the ordering. But if you can safely get the same thing for free, why bother? These were individual inserts whose order of insertion is important. What puzzled me most later was some tests I have just concluded.
I managed to get the foreign keys still ordered the way I put them in by using sql-connector on a test db with the same schema relationships between the tables. There the keys in the intermediary table holding keys to each of the ManyToMany partners do not re-organise from lowest to highest. However, the exact same code unfortunately still did on the problematic database. Hence it was not a Django thing as such.
The only real difference between the working and non-working tables was the UNIQUE attributes on the columns pointing to the ManyToMany partners, i.e. the foreign keys to Group and Source. After removing them, the problem went away.
However, to be honest, I am not sure why. Or why Django put those UNIQUE attributes there in the first place. Not sure either whether removing them will badly affect the application going forward.

Related

Howto do a LEFT JOIN in Django

Hello there (or as we say Moin Moin)!
I am new to django development (version 2.0) and do not understand how to do a LEFT JOIN in django-syntax.
For example I have the following models:
class Units(models.Model):
    UnitID = models.AutoField(primary_key=True)
    Description = models.CharField(max_length=30)
class MappingOperatorUnits(models.Model):
    OperatorID = models.ForeignKey(User, on_delete=models.CASCADE)
    UnitID = models.ForeignKey('Units', on_delete=models.CASCADE)
class Participants(models.Model):
    UnitID = models.ForeignKey('Units', on_delete=models.CASCADE)
    LessonID = models.ForeignKey('Lessons', on_delete=models.CASCADE)
    OperatorID = models.ForeignKey(User, on_delete=models.PROTECT)
    NumberParticipants = models.SmallIntegerField()
Now I am trying to do a query like
SELECT *
FROM MappingOperatorUnits
LEFT JOIN Units
    ON MappingOperatorUnits.UnitID = Units.UnitID
LEFT JOIN Participants
    ON MappingOperatorUnits.UnitID = Participants.UnitID
and what I want to be the result is something like this:
+-------------------+-----------------------+---------------------------------+
| Units.Description | Participants.LessonID | Participants.NumberParticipants |
+-------------------+-----------------------+---------------------------------+
| TeamA | 1 | 0 |
+-------------------+-----------------------+---------------------------------+
| TeamB | 1 | 3 |
+-------------------+-----------------------+---------------------------------+
| TeamC | NULL | NULL |
+-------------------+-----------------------+---------------------------------+
| TeamA | 2 | 2 |
+-------------------+-----------------------+---------------------------------+
| TeamB | 2 | 5 |
+-------------------+-----------------------+---------------------------------+
| TeamC | 2 | 1 |
+-------------------+-----------------------+---------------------------------+
I tried a lot of things in manage.py's shell but didn't find a solution. Can anybody help me with this? Thank you!
You can use .values(…) to load values of a referenced model:
MappingOperatorUnits.objects.values(
    'UnitID__Description',
    'UnitID__participants__LessonID',
    'UnitID__participants__NumberParticipants'
)
This produces a query that looks like:
SELECT units.Description,
       participants.LessonID_id,
       participants.NumberParticipants
FROM mappingoperatorunits
INNER JOIN units ON mappingoperatorunits.UnitID_id = units.UnitID
LEFT OUTER JOIN participants ON units.UnitID = participants.UnitID_id
The INNER JOIN is an optimization that can be done here, since it is a non-NULLable ForeignKey, and thus we know there is always a related Units record.
Note: normally a Django model is given a singular name, so Unit instead of Units.
Note: Normally one does not add a suffix _id to a ForeignKey field, since Django
will automatically add a "twin" field with an _id suffix. Therefore it should
be unit, instead of UnitID.
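As a rough illustration of what the LEFT OUTER JOIN part of that query does, here is a small sketch using Python's built-in sqlite3 module rather than the Django ORM. The snake_case table and column names and the sample rows are assumptions chosen for the example, not the names Django would generate. A unit without any participants still appears in the result, with NULL (None in Python) in the participant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units (unit_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY,
        unit_id INTEGER REFERENCES units(unit_id),
        lesson_id INTEGER,
        number_participants INTEGER
    );
    INSERT INTO units VALUES (1, 'TeamA'), (2, 'TeamB'), (3, 'TeamC');
    -- TeamC deliberately has no participants row
    INSERT INTO participants (unit_id, lesson_id, number_participants)
    VALUES (1, 1, 0), (2, 1, 3);
""")

rows = conn.execute("""
    SELECT u.description, p.lesson_id, p.number_participants
    FROM units AS u
    LEFT OUTER JOIN participants AS p ON p.unit_id = u.unit_id
    ORDER BY u.description
""").fetchall()

for row in rows:
    print(row)
```

An INNER JOIN in the same place would drop the TeamC row entirely, which is why the ORM only uses it where the foreign key cannot be NULL.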

update field in model using the same field with the function update() django

I need to update a field on many records, and I have to do it with the .update() method that comes with the Django ORM. The new value should be a string concatenated with the current value of the same field.
I have tried using annotate with F expressions and Value, but it didn't work, because it seems an annotation cannot reuse the name of an existing field.
This is what I tried to do:
Model.objects.all().annotate(image=Concat(Value("Path/"), F("image")))
I have the following table:
+----+--------+
| id | image  |
+----+--------+
|  1 | image1 |
|  2 | image2 |
|  3 | image3 |
+----+--------+
When updating, suppose I want to concatenate the string "Path/" with the image field. The result should look like this:
+----+-------------+
| id | image       |
+----+-------------+
|  1 | Path/image1 |
|  2 | Path/image2 |
|  3 | Path/image3 |
+----+-------------+
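You need to look into Django's database functions such as Concat:
from django.db.models import F, Value
from django.db.models.functions import Concat
Model.objects.update(image=Concat(Value('Path/'), F('image')))
The following may also work on some database backends, but + is passed to the database as the arithmetic + operator, which many backends do not treat as string concatenation, so Concat is the more portable choice:
Model.objects.update(image=Value('Path/') + F('image'))
F gives a reference to the current value of the field in the database.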
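For intuition about what happens at the database level, here is a small sketch with Python's built-in sqlite3 module instead of the ORM. The table name model and the sample rows are taken from the question; everything else is an assumption for illustration. It runs one UPDATE whose right-hand side reads the old value of each row, and it also shows why the + operator is risky for strings: on SQLite it is numeric addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, image TEXT)")
conn.executemany(
    "INSERT INTO model (id, image) VALUES (?, ?)",
    [(1, "image1"), (2, "image2"), (3, "image3")],
)

# A single UPDATE; "image" on the right-hand side is the row's old value,
# and || is SQLite's string concatenation operator.
conn.execute("UPDATE model SET image = 'Path/' || image")
images = [image for (image,) in conn.execute("SELECT image FROM model ORDER BY id")]
print(images)

# On SQLite, + coerces both strings to numbers instead of concatenating them.
(plus_result,) = conn.execute("SELECT 'Path/' + 'image1'").fetchone()
print(plus_result)
```

Other backends behave differently with + on strings, so the second part only demonstrates SQLite's behaviour.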

Efficient way of joining two query sets without foreign key

I know django doesn't allow joining without a foreign key relation and I can't specify a foreign key because there are entries in one table that are not in the other (populated using pyspark). I need an efficient way to query the following:
Let's say I have the following tables:
Company | Product | Total # Users | Total # Unique Users
and
Company | Product | # Licenses | # Estimated Users
I would like to join such that I can display a table like this on the frontend
Company View
Product|Total # Users|Total # Unique Users|#Licenses|# Estimated Users|
P1 | Num | Num | Num | Num |
P2 | Num | Num | Num | Num |
Currently I loop through each product and perform a query to populate a dictionary of lists, which is far too slow and inefficient.
I'm not quite getting why you can't do a Foreign key in this situation, but if you can implement your query in a sql statement I would look at Q objects. See "Complex Lookups with Q Objects" in the documentation.
https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
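A different approach from the one suggested above, sketched here only as one possibility, is to fetch each table once and join the rows in Python on the (company, product) pair. The field names, the sample numbers and the join_by_company_product helper below are all made up for illustration; in a Django project the two input lists would presumably come from two .values() queries, so the database is hit twice in total instead of once per product.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the output of two .values() queries.
usage_rows = [
    {"company": "Acme", "product": "P1", "total_users": 120, "unique_users": 80},
    {"company": "Acme", "product": "P2", "total_users": 40, "unique_users": 35},
]
license_rows = [
    {"company": "Acme", "product": "P1", "licenses": 100, "estimated_users": 90},
    {"company": "Acme", "product": "P3", "licenses": 10, "estimated_users": 8},
]

def join_by_company_product(usage, licenses):
    """Merge both row sets on (company, product), keeping unmatched rows too."""
    merged = defaultdict(dict)
    for row in usage:
        merged[(row["company"], row["product"])].update(row)
    for row in licenses:
        merged[(row["company"], row["product"])].update(row)
    return dict(merged)

table = join_by_company_product(usage_rows, license_rows)
for key, row in sorted(table.items()):
    print(key, row)
```

Rows that exist in only one table simply lack the other table's keys, which matches the situation where entries in one table have no counterpart in the other.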

Query excluding duplicates in Django

I'm using distinct() QuerySet to get some data in Django.
My initial query was Point.objects.order_by('chron', 'pubdate').
The field chron in some cases is a duplicate so I changed the query
to Point.objects.order_by('chron', 'pubdate').distinct('chron') in order to exclude duplicates.
Now the problem is that all empty fields are considered duplicates.
To be accurate, the chron field contain integers (which behave similar to ids), in some cases it can be a duplicate, in some cases it can be NULL.
| chron |
|-------|
| 1 | I want this
| 2 | I want this
| 3 | I want this
| 3 |
| NULL |
| 4 | I want this
| NULL |
I want to exclude all the chron duplicates but not if they are duplicate of NULL.
Thank you.
Use two separate queries:
.exclude(chron__isnull=True) followed by .distinct("chron") for the unique non-NULL chron values.
.filter(chron__isnull=True) for only the rows where chron is NULL.
Although this seems pretty inefficient, I believe (I will happily be corrected) that even a sensible vanilla SQL statement (e.g. the one below) would require multiple table scans to combine a result set of NULLs and unique values.
SELECT *
FROM (
    SELECT chron
    FROM Point
    WHERE chron IS NOT NULL -- .exclude(chron__isnull=True)
    GROUP BY chron -- .distinct()
    UNION ALL
    SELECT chron
    FROM Point
    WHERE chron IS NULL -- .filter(chron__isnull=True)
) AS combined
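The statement above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the point table and its rows are modelled on the chron column shown in the question, and SQLite stands in for whatever database is actually used. The result keeps one row per non-NULL chron value and every NULL row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (id INTEGER PRIMARY KEY, chron INTEGER)")
conn.executemany(
    "INSERT INTO point (chron) VALUES (?)",
    [(1,), (2,), (3,), (3,), (None,), (4,), (None,)],
)

rows = conn.execute("""
    SELECT chron FROM (
        SELECT chron FROM point WHERE chron IS NOT NULL GROUP BY chron
        UNION ALL
        SELECT chron FROM point WHERE chron IS NULL
    ) AS combined
""").fetchall()
chrons = [chron for (chron,) in rows]

# The outer query has no ORDER BY, so sort before relying on the order.
non_null = sorted(c for c in chrons if c is not None)
null_count = chrons.count(None)
print(non_null, null_count)
```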

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')
class Keyword(models.Model):
    name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    # Each filter() call narrows the queryset by one part of the slug
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)
    return books
This works for the most part, however whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), then I get the same book appear more than once in the resulting QuerySet.
I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?
The two models in your example are represented by three tables: book, keyword, and a book_keyword relation table that manages the M2M field.
When you use keywords__name in a filter() call, Django uses SQL JOINs to combine all three tables. This allows you to filter objects in the first table by values from another table.
The SQL will look like this:
SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE '%fiction%')
After the JOIN, your data looks like this:
| Book table          | Relation table                     | Keyword table                |
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
| 1       | Book 1    | 1                | 2               | 2          | Fiction         |
| 2       | Book 2    | 2                | 2               | 2          | Fiction         |
When the data is then loaded from the database into Python, you only receive the columns from the book table. As you can see, Book 1 appears twice there.
This is how many-to-many relations and JOINs work.
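The effect can be reproduced outside Django with Python's built-in sqlite3 module. The sketch below assumes the three tables and the sample rows from the explanation above (lower-cased keyword names are an arbitrary choice). The same JOIN is run once as is and once with DISTINCT, which is what adding .distinct() to the queryset amounts to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'science-fiction'), (2, 'fiction');
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")

# {select} is filled with either nothing or the DISTINCT keyword.
JOIN_SQL = """
    SELECT {select} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# Book 1 matches through two keywords, so the plain JOIN returns it twice.
with_duplicates = conn.execute(JOIN_SQL.format(select="")).fetchall()
deduplicated = conn.execute(JOIN_SQL.format(select="DISTINCT")).fetchall()

print(with_duplicates)
print(deduplicated)
```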
Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.
In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.