django query with filtered annotations from related table - django

Take Books and Authors models as an example, where each book has one or more authors. Books have a cover_type and authors have a country as origin.
How can I list all the books with a hard cover, together with their authors only if they're from France?
Books.objects.filter(cover_type='hard', authors__origin='france')
This query doesn't retrieve books with a hard cover but no French author.
I want all the books with a hard cover; this is predicate #1.
If their authors are from France, I want them annotated; otherwise the authors field may be empty or 'None'.
e.g.:
`
Bookname, covertype, origin
The Trial, hardcover, none
Madam Bovary, hardcover, France
`
I tried many options (annotate, Q, Value, Subquery, When, Case, Exists) but couldn't come up with a solution.
With SQL this is so easy:
select * from books b left join authors a on a.bookref=b.id and a.origin='france' where b.covertype='hard'
(My models are not really books and authors; I picked them because they're the example models in the Django docs. My real models are building and buildingtype, where I want building.id=454523 together with its buildingtype if that buildingtype is active. The buildingtype might be null for the building, or there might be only 1 active one and zero or more passive ones.)
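To make the difference between the two query shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and sample rows are assumptions made up for illustration; only the column names come from the SQL above. It contrasts a condition placed in the ON clause of the left join with the same condition placed in WHERE, which is roughly what the filter() call produces.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, covertype TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, bookref INTEGER, origin TEXT);
    INSERT INTO books VALUES (1, 'The Trial', 'hard'),
                             (2, 'Madam Bovary', 'hard'),
                             (3, 'Some Paperback', 'soft');
    INSERT INTO authors VALUES (1, 1, 'czechia'), (2, 2, 'france'), (3, 3, 'france');
""")

# Condition inside ON: hard-cover books without a French author are kept,
# and their origin column comes back as NULL (None in Python).
left_join_rows = conn.execute("""
    SELECT b.name, b.covertype, a.origin
    FROM books b
    LEFT JOIN authors a ON a.bookref = b.id AND a.origin = 'france'
    WHERE b.covertype = 'hard'
    ORDER BY b.id
""").fetchall()

# Condition inside WHERE: rows with a NULL origin are filtered out again,
# so books without a French author disappear.
where_rows = conn.execute("""
    SELECT b.name, b.covertype, a.origin
    FROM books b
    LEFT JOIN authors a ON a.bookref = b.id
    WHERE b.covertype = 'hard' AND a.origin = 'france'
    ORDER BY b.id
""").fetchall()

print(left_join_rows)
print(where_rows)
```

The first result keeps 'The Trial' with an empty origin, which is the output the question asks for; the second one drops it.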

You should put the Book id in the Author table. Then your query will look like this: Author.objects.filter(origin="france", book__cover_type="hard")

I think I solved it with Subquery, OuterRef, Exists, Case, When, CharField... too many imports for such a simple SQL query.
`
from django.db.models import Case, CharField, Exists, OuterRef, Subquery, Value, When
author = Authors.objects.filter(bookref=OuterRef('id'), origin='France').values('origin')[:1]
books = Books.objects.filter(cover_type='hard').annotate(author=Case(When(Exists(author), then=Subquery(author)), default=Value('none'), output_field=CharField())).distinct().values('name','cover_type','author')
`

Related

How can I model this data in DynamoDB for a Library App

I have two entities, Books and Authors with a strict one-to-many relationship (many-to-many relationship not required for my use case)
The access patterns I want to satisfy are:
Get Author Info by Author Name
Get Book Info By just ISBN
Get all Books records by an Author using Author Name.
Do I need any GSI given the constraint that I can make only a single request to DB when adding a Book or an Author, and fulfill above three access patterns also with a single request?
If my Author Entity uses this key schema:
Partition Key: AUTHOR#XYZ
Sort Key: AUTHOR#XYZ
and for Book Entity I use
Partition Key: BOOK#123
Sort Key: BOOK#123
I can get author info by name and book info by ISBN easily. How do I get the 3rd access pattern, entire book data by author name?
Two approaches I thought of:
Have a third entity in the table with PK AUTHOR#XYZ, SK BOOK#123, and use BEGINS_WITH(SK, 'BOOK') but in this approach, when adding a book to DB, I will have to write two items, PK BOOK#, SK BOOK# for getting book by just ISBN and PK AUTHOR#, SK BOOK# for getting all books by author, and the book info will be duplicated in both items.
Add an attribute GSIAuthorName to Book entity when adding a book, and create a GSI with PK GSIAuthorName (AUTHOR#XYZ) and SK being PK of Book entity (BOOK#123). But in this the issue is, in projections I will have to select ALL, since I want all book info attributes by author name, and need to fetch in single query to the GSI, so entire Book Entity will be duplicated in this GSI.
Is there an easier way to model this data?
Since you're trying to have two different access patterns for a single entity that require a different partition key value, there is basically only the two options you have identified correctly.
Your design seems to only work for books that have a single author. In the real world that's not sufficient. There are plenty of books with multiple authors such as "The Dictator's handbook" by Bruce Bueno de Mesquita and Alastair Smith - your data model might want to account for that. Author <-> Book isn't One-to-Many, it's Many-to-Many.
I'd go for something like this which uses a Global Secondary Index. It's very close to your second suggestion.
| PK | SK | GSI1PK | GSI1SK | type | attributes |
|---|---|---|---|---|---|
| AUTHOR#ALASTAIR SMITH | AUTHOR#ALASTAIR SMITH | | | author | name, birthdate, ... |
| AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | | | author | name, birthdate, ... |
| BOOK#978-1610391849 | AUTHOR#ALASTAIR SMITH | AUTHOR#ALASTAIR SMITH | BOOK#978-1610391849 | book | title, publisher, author,... |
| BOOK#978-1610391849 | AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | BOOK#978-1610391849 | book | title, publisher, author,... |
Does this introduce data duplication? - Yes
Does this introduce complexity on writes? - Yes
Does it work in the real world? - Yes
The model I've chosen allows you to fulfill the requirements:
1) Get Author Info by Author Name: GetItem on the primary index with PK=AUTHOR#... and SK=AUTHOR#...
2) Get Book Info by just ISBN: Query on the primary index with PK=BOOK#... and limit 1
3) Get all books for an Author: Query on GSI1 with PK=AUTHOR#...
When you write a book, you need to add a book record for each author and potentially the author entries as well. For updates to a book's info (which should be very rare), you first run the query as in 2) without the limit and then update each item that comes back.
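The three access patterns can be sketched without a real DynamoDB table by keeping the items in a plain Python list. This is only an illustration of the key layout: the helper names (author_item, book_items, get_item, query) are hypothetical and merely imitate GetItem and Query, they are not part of any AWS SDK.

```python
from typing import Dict, List, Optional

Item = Dict[str, str]

def author_item(name: str) -> Item:
    # Author items use the same value for PK and SK.
    key = f"AUTHOR#{name.upper()}"
    return {"PK": key, "SK": key, "type": "author", "name": name}

def book_items(isbn: str, title: str, authors: List[str]) -> List[Item]:
    # One item per author, so the book shows up in each author's GSI1 partition.
    return [
        {
            "PK": f"BOOK#{isbn}",
            "SK": f"AUTHOR#{author.upper()}",
            "GSI1PK": f"AUTHOR#{author.upper()}",
            "GSI1SK": f"BOOK#{isbn}",
            "type": "book",
            "title": title,
        }
        for author in authors
    ]

def get_item(table: List[Item], pk: str, sk: str) -> Optional[Item]:
    # Imitates GetItem: exact match on partition key and sort key.
    return next((i for i in table if i["PK"] == pk and i["SK"] == sk), None)

def query(table: List[Item], pk: str, index: str = "", limit: Optional[int] = None) -> List[Item]:
    # Imitates Query: index="" uses PK/SK, index="GSI1" uses GSI1PK/GSI1SK.
    pk_attr, sk_attr = f"{index}PK", f"{index}SK"
    hits = sorted((i for i in table if i.get(pk_attr) == pk), key=lambda i: i[sk_attr])
    return hits[:limit]

authors = ["Alastair Smith", "Bruce Bueno de Mesquita"]
table = [author_item(a) for a in authors]
table += book_items("978-1610391849", "The Dictator's Handbook", authors)

author = get_item(table, "AUTHOR#ALASTAIR SMITH", "AUTHOR#ALASTAIR SMITH")  # pattern 1
book = query(table, "BOOK#978-1610391849", limit=1)[0]                      # pattern 2
books_by_author = query(table, "AUTHOR#ALASTAIR SMITH", index="GSI1")       # pattern 3
```

Running the query for pattern 2 without the limit returns both book items, which is the set an update would have to touch.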
Update
To address the requests for clarification in the comments:
If you require a strict One-to-Many relationship, I'd pick the second approach
Frequent writes are typically not a problem in your one-to-many case as long as you don't exceed the write throughput of a single partition, which is unlikely given the data. I don't see why you'd need frequent writes though.
The extra complexity is typically only a one-time penalty when you create your data access layer. The code for update_book_by_isbn will have to include the steps I outlined above and the create_book might store multiple records.

Django annotate Avg of foreign model

Two models, article and review, relationship is one to many (one article has many reviews). Some articles don't have any review.
I want to order articles by review ratings, therefore I use the annotate with AVG:
ArticleQueryset.annotate(rating=models.Avg('reviews__rating')).order_by('-rating')
The issue is that for articles without reviews the rating value is empty (None), and somehow those come before the maximum rating. As a result, the first results don't have any rating, and only then do the highest-rated articles show up.
Use nulls_last=True on the descending expression passed to order_by():
ArticleQueryset.annotate(
    rating=models.Avg('reviews__rating')
).order_by(models.F('rating').desc(nulls_last=True))
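The ordering this asks the database for can be illustrated in plain Python. The sketch below is not Django code; the article dictionaries are made-up sample data, and the sort key simply puts missing ratings after all real ones while sorting the rest in descending order.

```python
# Made-up sample data: two articles have no reviews, so their average is None.
articles = [
    {"title": "A", "rating": 4.5},
    {"title": "B", "rating": None},
    {"title": "C", "rating": 2.0},
    {"title": "D", "rating": None},
]

def rating_desc_nulls_last(article):
    rating = article["rating"]
    # First element: False for real ratings, True for None, so None sorts last.
    # Second element: negated rating gives descending order among real ratings.
    return (rating is None, -rating if rating is not None else 0.0)

ordered = [a["title"] for a in sorted(articles, key=rating_desc_nulls_last)]
print(ordered)  # ['A', 'C', 'B', 'D']
```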

One to Many relationship real life example

I am trying to design the schema, and I am confused about whether I should use a one-to-many or a many-to-one relationship.
My use case is somewhat like customers ordering the food.
There are 2 customers and 5 food items
Customers: [John, Alice]
Food: [Rice, Noodle, Chicken, Beacon, Ice-cream]
Use case: one customer can order many items, but once a customer has ordered an item, it cannot be ordered by anyone else.
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Beacon, Ice-cream
**This is valid, both customers ordered unique food.**
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Beacon, Ice-cream, Chicken
**This is invalid, because Chicken is being ordered twice. John Already ordered chicken so Alice can not order it.**
Note: I am trying to do this in MongoDB documents and to establish the relationship using Django models.
One way to handle this would be to create a junction table CustomerFood which looks something like this:
CREATE TABLE CustomerFood (
    Customer varchar(255) NOT NULL,
    Food varchar(255) NOT NULL,
    PRIMARY KEY (Customer, Food)
);
The above table definition alone would only ensure that each customer can be related to each food at most once. To enforce the additional restriction that a given food can be associated with only one customer, we can add a unique constraint on the Food column:
ALTER TABLE CustomerFood ADD CONSTRAINT food_unique UNIQUE (Food);
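The effect of the unique constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under two assumptions: SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so the constraint is declared directly in CREATE TABLE, and place_order is a hypothetical helper that treats one customer's order as a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CustomerFood (
        Customer TEXT NOT NULL,
        Food TEXT NOT NULL,
        PRIMARY KEY (Customer, Food),
        CONSTRAINT food_unique UNIQUE (Food)
    )
""")

def place_order(customer, foods):
    """Insert the whole order, or nothing if any food is already taken."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO CustomerFood (Customer, Food) VALUES (?, ?)",
                [(customer, food) for food in foods],
            )
        return True
    except sqlite3.IntegrityError:
        return False

john_ok = place_order("John", ["Rice", "Noodle", "Chicken"])
alice_ok = place_order("Alice", ["Beacon", "Ice-cream", "Chicken"])  # Chicken is taken
row_count = conn.execute("SELECT COUNT(*) FROM CustomerFood").fetchone()[0]
```

Here John's order is accepted and Alice's is rejected as a whole, so only John's three rows remain in the table.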
Using Django:
You could use a many-to-many relationship in Django (less code, but a bit more complex to understand) or take the "table in the middle" approach (more manual, and it needs more model code). See the Django many-to-many documentation.
Secondly, you should use validators to enforce your rule that a dish can only be ordered by one person and then sells out. This is programming logic rather than schema design, and it can be part of a validator. See the Django validators documentation.

How do I do a Django query on a Model, where I order by field A, but filter distinct on field B?

Suppose I have a Book table, where each book has an author field and a publishing date field.
I would like to get the latest book by each author. I'm using PostgreSQL as the backend.
The obvious (and wrong) solution would be:
Book.objects.order_by("author", "-published_on").distinct("author").all()
The problem is that while the result contains only one book from each author, then there is no guarantee that it is the latest book. This might be because I'm using random UUIDs as PKs. I can't change that. That's a requirement.
The next obvious (and wrong) solution would be:
Book.objects.order_by("author", "-published_on").distinct("author", "published_on").all()
Here the ordering of the books is correct, but we get multiple books from the same author.
I have also tried flipping around the arguments:
Book.objects.order_by("-published_on", "author").distinct("published_on", "author").all()
Here the ordering of the books is correct, but we get multiple books from the same author.
How do I do a Django ORM query, where I get the latest book from each author?
EDIT: Here's a query I'm actually running on our live DB, before translating it into the book-style example:
from db.models import User, EventVisibility
user = User.objects.get(username="7g8jltdzbz46ak7nhuz8tzfuu7y9mdym7tiy7klfxjnn")
evs = EventVisibility.objects.filter(user=user).order_by("room", "-created_on").distinct("room")[:20]
for ev in evs:
    print(f"book_id={ev.room.room_id}, published_on={ev.created_on}")
And these are the results:
book_id=2mcnhajfwf5jsgyzpqix36ytbjfucn9u6derkyurlfff, published_on=2020-05-16 00:54:05.083477+00:00
book_id=4rp9ffxqr5marnphbtlahqtwnkzozupyb8ht532ffxl6, published_on=2020-05-12 20:29:31.286095+00:00
book_id=5dqygkksrzq6ay49xxcspagma5cbz8p59sjcavf6pepm, published_on=2020-05-08 09:28:53.508563+00:00
book_id=9mz85qcxreaczcnenebcywqqm3scehjhpwlkso7g4jbd, published_on=2020-05-04 10:52:06.396995+00:00
book_id=9sgiiasbvbtat4iahx7bd7ammzwatgfipe8wmzl9snz5, published_on=2020-05-15 09:00:52.602512+00:00
book_id=b8uvcxuhgjhmvkjjnwkcr5zzj7hrushz2e9mpzkosg8k, published_on=2020-05-08 09:36:47.148885+00:00
book_id=bxif8aal2v4fb3p8wsdvdard5p65ygw8j92tnleqqza4, published_on=2020-04-19 02:43:23.819854+00:00
book_id=cgoad7xuwjhxz6hcxctbl5arnnsrjt5osuwmzunmppra, published_on=2020-05-08 09:36:06.944614+00:00
book_id=cztb84akqqde6fvpj2nneqezvmor5gdjh3hpcjnxcz2x, published_on=2020-05-15 10:06:53.054862+00:00
book_id=czxizxptbvxz7jybkxevk2mkmaxykhgakfluud7ffa2b, published_on=2020-05-17 14:54:43.245325+00:00
book_id=dgtze2ri5snrr7nmurvdechydxjd2ph3dd8rugibn2me, published_on=2020-05-05 19:16:45.254928+00:00
book_id=dp9wu8qmdw6prsvx2zwvrnw5akcxv6llcwa2skeadcpx, published_on=2020-04-27 10:58:32.555542+00:00
book_id=duelfazwfiek8jhr4ew7wa9vrzzuyhznzxcrpybmbuww, published_on=2020-05-15 10:06:45.001961+00:00
book_id=dwhqxqfyolggdf5wwwm3su3yq6ffsh5kwwjxj7wtkdbj, published_on=2020-05-15 05:53:01.153492+00:00
book_id=edakxxhqv7w99lukxr23dfugcarddpwj5ea8wx7r5bmd, published_on=2020-04-27 19:49:29.673872+00:00
book_id=evz9biehu88eds7hgcutw6jfktt4fkjznfgozxsu8jtk, published_on=2020-04-20 21:13:01.693752+00:00
book_id=fqnxa3j4vbbaw7fc5hgrumabtfh2phmd3hg7cgm5ayfa, published_on=2020-05-15 10:04:22.322094+00:00
book_id=gkxahh8y7eqtqzxsnjtdpnghxnipi8vx3qugjcrs6t3m, published_on=2020-04-17 02:14:31.219950+00:00
book_id=hdgoxpnmqde8siwdbgfwwtodqk4hzhefyz8pw3esdmem, published_on=2020-05-17 14:46:49.437289+00:00
book_id=jrg6uae5kyvfvjgjhmwvzf45lbtqmgspawbuqzfewnhc, published_on=2020-05-05 09:11:59.334099+00:00
This is the queryset.query:
SELECT DISTINCT ON ("db_eventvisibility"."room_id") "db_eventvisibility"."id", "db_eventvisibility"."event_id", "db_eventvisibility"."user_id", "db_eventvisibility"."room_id", "db_eventvisibility"."unit_id", "db_eventvisibility"."case_id", "db_eventvisibility"."team_id", "db_eventvisibility"."created_on" FROM "db_eventvisibility" WHERE "db_eventvisibility"."user_id" = 7g8jltdzbz46ak7nhuz8tzfuu7y9mdym7tiy7klfxjnn ORDER BY "db_eventvisibility"."room_id" ASC, "db_eventvisibility"."created_on" DESC LIMIT 20
The problem is that while the result contains only one book from each author, then there is no guarantee that it is the latest book. This might be because I'm using random UUIDs as PKs. I can't change that. That's a requirement.
To the best of my knowledge, the result is correct in the sense that per Room you do indeed get the latest EventVisibility, but that is likely not what you want. If you want to sort the Rooms by their latest EventVisibility, you can do that with:
from django.db.models import Max
Room.objects.filter(
    eventvisibility__user=user
).order_by(
    Max('eventvisibility__created_on').desc()
)
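What DISTINCT ON does with that ordering can be sketched in plain Python: sort by the grouping column, then by date descending, and keep the first row of each group. The book records below are made-up sample data, and the sketch only models the semantics; it does not talk to PostgreSQL.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Made-up sample rows (assumption for illustration).
books = [
    {"author": "B", "title": "b1", "published_on": date(2020, 5, 1)},
    {"author": "A", "title": "a1", "published_on": date(2019, 1, 1)},
    {"author": "A", "title": "a2", "published_on": date(2020, 3, 1)},
    {"author": "B", "title": "b2", "published_on": date(2018, 7, 1)},
]

# Equivalent of ORDER BY author ASC, published_on DESC.
ordered = sorted(books, key=itemgetter("published_on"), reverse=True)
ordered.sort(key=itemgetter("author"))  # stable sort keeps the date order per author

# Equivalent of DISTINCT ON (author): keep the first row of each author group.
latest = [next(group) for _, group in groupby(ordered, key=itemgetter("author"))]
latest_titles = [b["title"] for b in latest]
print(latest_titles)  # ['a2', 'b1']
```

The result is ordered by author, not by date, which matches the observation that the output above looks unsorted by published_on even though each row is the latest one in its group.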

Django Queryset filter related field isnull lookup last object, Book (Loan - Return)

I'll show an example of a model, and discuss the issue here. Because it's hard to describe with a question.
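# models.py
from django.db import models

class Book(models.Model):
    title = models.CharField(max_length=100)

class Loan(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)

class Return(models.Model):
    loan = models.ForeignKey(Loan, on_delete=models.CASCADE)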
A Book is available === the Book has never been loaned, or its last Loan has a Return.
# Available:
Book.objects.filter(
    Q(loan__isnull=True) |  # Book has never been borrowed
    Q(loan__return__isnull=False)  # Book has been borrowed but returned
).distinct()
The above filter is partially correct.
The only problem happens when the Book has been returned and re-loaned again.
After the book has been loaned again, it should not be available, but the above queryset still returns it as available, because loan__return__isnull=False still holds for the earlier loan of that particular book.
I couldn't figure out any better approach in such query. How could we make such simple query to work?
Possible Solution
The solution that I could think of is very ugly. It involves multiple separate queries. Roughly, the steps are as follows:
1. Query the last Loan, grouped by Book. (Book has been borrowed)
2. Filter the Loans whose Return is null. (Book is not yet returned)
3. Query the Books matching the Loans that satisfy condition 2.
4. Query the Books whose Loan is null. (Book has never been borrowed)
Note: 3 & 4 are combined into one query.
Book.objects.filter(
    Q(loan__isnull=True)  # Book has never been borrowed
    | (Q(loan__return__isnull=True) & Q(loan__isnull=False))  # Book has been borrowed but not returned
    | Q(loan__return__isnull=False)  # Book has been borrowed but returned
).distinct()
Different Approach to this Problem.
It solves my problem, but I could not prove its correctness.
In this approach, I use the Count of both Loan and Return.
If Count(Loan) == Count(Return), then the Book is available;
otherwise the Book is not available.
Sample Code
# Available Books
Book.objects.annotate(
    loan_count=Count('loan'),
    return_count=Count('loan__return')
).filter(Q(loan_count=F('return_count')))
# Unavailable Books
Book.objects.annotate(
    loan_count=Count('loan'),
    return_count=Count('loan__return')
).exclude(Q(loan_count=F('return_count')))
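As a sanity check on the counting idea, the two definitions of "available" can be compared in plain Python. The sketch assumes that each Loan has at most one Return and that a book is never loaned again before the previous loan was returned; under those assumptions, equal counts and "the last loan has a return" describe the same books. The loan histories are made-up sample data, written as one boolean per loan in chronological order (True means returned).

```python
# One list per book: a chronological record of loans, True if that loan was returned.
loan_histories = {
    "never borrowed": [],
    "borrowed and returned": [True],
    "currently on loan": [False],
    "returned, then re-loaned": [True, False],
    "re-loaned and returned again": [True, True],
}

def available_by_last_loan(history):
    # Definition from the question: no loans at all, or the last loan was returned.
    return not history or history[-1]

def available_by_counts(history):
    # Counting approach: number of loans equals number of returns.
    return len(history) == sum(history)

comparison = {
    title: (available_by_last_loan(h), available_by_counts(h))
    for title, h in loan_histories.items()
}
for title, (by_last, by_count) in comparison.items():
    print(f"{title}: last-loan={by_last}, counts={by_count}")
```

If the assumptions do not hold, for example when a loan can have several Return rows, the counts can drift apart and the comparison no longer applies.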