Django REST framework Group by fields and add extra contents

I have a ticket-booking model:
class Movie(models.Model):
    name = models.CharField(max_length=254, unique=True)
class Show(models.Model):
    day = models.ForeignKey(Day)
    time = models.TimeField(choices=CHOICE_TIME)
    movie = models.ForeignKey(Movie)
class MovieTicket(models.Model):
    show = models.ForeignKey(Show)
    user = models.ForeignKey(User)
    booked_at = models.DateTimeField(default=timezone.now)
I would like to filter MovieTicket by its user field, group the results by show, and order the groups by the most recent booking time, then respond with JSON data using Django REST framework like this:
[
    {
        "show": 4,
        "movie": "Lion king",
        "time": "07:00 pm",
        "day": "23 Apr 2017",
        "total_tickets": 2
    },
    {
        "show": 7,
        "movie": "Gone girl",
        "time": "02:30 pm",
        "day": "23 Apr 2017",
        "total_tickets": 1
    }
]
I tried this way:
>>> MovieTicket.objects.filter(user=23).order_by('-booked_at').values('show').annotate(total_tickets=Count('show'))
<QuerySet [{'total_tickets': 1, 'show': 4}, {'total_tickets': 1, 'show': 4}, {'total_tickets': 1, 'show': 7}]>
But it is not grouping by show. Also, how can I add other related fields (i.e., show__movie__name, show__day__date, show__time)?

I will explain this more generally, using the graph of the database models. The approach applies to any "GROUP BY" query that also reports extra related content.
          +-------------------------+
          | MovieTicket (booked_at) |
          +-----+--------------+----+
                |              |
      +---------+--------+  +--+---+
      |   Show (time)    |  | User |
      ++----------------++  +------+
       |                |
+------+-------+  +-----+------+
| Movie (name) |  | Day (date) |
+--------------+  +------------+
The question is: how to summarize MovieTicket (the topmost object), grouped by Show (one related object) and filtered by User (another related object), while reporting details from deeper related objects (Movie and Day) and sorting the results by a field aggregated from the topmost model within each group (the booking time of the most recent MovieTicket in the group)?
The answer, explained in more general steps:
Start with the topmost model:
(MovieTicket.objects ...)
Apply filters:
.filter(user=user)
Group by the pk of the nearest related models, at least those that the filter does not make constant. Here that is only Show, because User is already fixed to a single user by the filter:
.values('show_id')
Even if all the other fields (show__movie__name, show__day__date, show__time) were unique together, it is better for the database query optimizer to group by show_id, because all those fields depend on show_id and cannot change the number of groups.
Annotate the necessary aggregate functions:
.annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
Add the required dependent fields:
.values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
Sort as necessary:
.order_by('-last_booking') (descending, from the latest to the oldest)
It is very important not to output or sort by any field of the topmost model without wrapping it in an aggregate function. Min and Max are good for sampling a value from a group. Every field that is not wrapped in an aggregate is added to the GROUP BY list, and that breaks the intended groups. For example, tickets to the same show may be booked gradually for friends, but they should be counted together and reported by the latest booking.
Put it together:
from django.db.models import Count, Max
qs = (MovieTicket.objects
      .filter(user=user)
      .values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
      .annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
      .order_by('-last_booking')
      )
The queryset can easily be converted to JSON as zaphod100.10 demonstrated in his answer, or directly, for people not interested in Django REST framework, this way:
from collections import OrderedDict
import json
print(json.dumps([
    OrderedDict([
        ('show', x['show_id']),
        ('movie', x['show__movie__name']),
        ('time', x['show__time']),  # add time formatting
        ('day', x['show__day__date']),  # add date formatting
        ('total_tickets', x['total_tickets']),
        # field 'last_booking' is unused
    ]) for x in qs
], default=str))  # default=str stringifies date/time values until they are formatted
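The "add time formatting" and "add date formatting" comments can be filled in with strftime. The following is a minimal sketch, assuming the output format shown in the question ("07:00 pm", "23 Apr 2017") and the default C/English locale; the helper names format_show_time and format_show_day and the sample row are hypothetical.

```python
import json
from datetime import date, time


def format_show_time(value: time) -> str:
    """Render a time on a 12-hour clock with a lowercase am/pm marker."""
    return value.strftime("%I:%M %p").lower()


def format_show_day(value: date) -> str:
    """Render a date as day, abbreviated month name, year."""
    return value.strftime("%d %b %Y")


# A sample row shaped like one element of the values() queryset above.
row = {
    "show_id": 4,
    "show__movie__name": "Lion king",
    "show__time": time(19, 0),
    "show__day__date": date(2017, 4, 23),
    "total_tickets": 2,
}

payload = {
    "show": row["show_id"],
    "movie": row["show__movie__name"],
    "time": format_show_time(row["show__time"]),
    "day": format_show_day(row["show__day__date"]),
    "total_tickets": row["total_tickets"],
}
print(json.dumps(payload))
```

Both %p and %b depend on the process locale, so a deployment with a non-English locale would need its own mapping.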
Verify the query:
>>> print(str(qs.query))
SELECT app_movieticket.show_id, app_movie.name, app_day.date, app_show.time,
COUNT(app_movieticket.show_id) AS total_tickets,
MAX(app_movieticket.booked_at) AS last_booking
FROM app_movieticket
INNER JOIN app_show ON (app_movieticket.show_id = app_show.id)
INNER JOIN app_movie ON (app_show.movie_id = app_movie.id)
INNER JOIN app_day ON (app_show.day_id = app_day.id)
WHERE app_movieticket.user_id = 23
GROUP BY app_movieticket.show_id, app_movie.name, app_day.date, app_show.time
ORDER BY last_booking DESC
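The effect of the GROUP BY list can be tried without Django. The following sketch uses an in-memory SQLite database with a simplified, hypothetical ticket table (only show_id, user_id and booked_at) and made-up rows. It compares grouping by show_id alone with a query whose ordering field booked_at leaks into the GROUP BY list, which is what happened in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (
        id INTEGER PRIMARY KEY,
        show_id INTEGER,
        user_id INTEGER,
        booked_at TEXT
    );
    INSERT INTO ticket (show_id, user_id, booked_at) VALUES
        (4, 23, '2017-04-20 10:00:00'),
        (7, 23, '2017-04-21 09:00:00'),
        (4, 23, '2017-04-22 18:30:00'),
        (4, 99, '2017-04-22 19:00:00');
""")

# Intended grouping: one row per show, sorted by the aggregated booking time.
grouped = conn.execute("""
    SELECT show_id, COUNT(show_id) AS total_tickets, MAX(booked_at) AS last_booking
    FROM ticket
    WHERE user_id = ?
    GROUP BY show_id
    ORDER BY last_booking DESC
""", (23,)).fetchall()

# Broken grouping: the raw booked_at column is part of GROUP BY,
# so every ticket ends up in a group of its own.
broken = conn.execute("""
    SELECT show_id, COUNT(show_id) AS total_tickets
    FROM ticket
    WHERE user_id = ?
    GROUP BY show_id, booked_at
    ORDER BY booked_at DESC
""", (23,)).fetchall()

print(grouped)
print(broken)
```

The second result has one row per ticket with a count of 1, the same shape as the output reported in the question.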
Notes:
The graph of models is similar to a ManyToMany relationship, but MovieTickets are individual objects and probably hold seat numbers.
It would be easy to get a similar report for more users with one query: the field 'user_id' and the user name would be added to "values(...)".
The related model Day is not intuitive, but it clearly has a field date and hopefully also some non-trivial fields, maybe important for scheduling shows with respect to events like cinema holidays. It would be useful to make the field 'date' the primary key of the Day model and so spare a relationship lookup in many queries like this.
(All important parts of this answer can be found in the two oldest answers, by Todor and zaphod100.10. Unfortunately those answers have not been combined, and nobody except me has up-voted them, even though the question has many up-votes.)

I would like to filter MovieTicket with its user field and group them according to its show field, and order them by the recent booked time.
This queryset will give you exactly what you want:
tickets = (MovieTicket.objects
           .filter(user=request.user)
           .values('show')
           .annotate(last_booking=Max('booked_at'))
           .order_by('-last_booking')
           )
And respond back with json data using Django rest framework like this:
[
    {
        "show": 4,
        "movie": "Lion king",
        "time": "07:00 pm",
        "day": "23 Apr 2017",
        "total_tickets": 2
    },
    {
        "show": 7,
        "movie": "Gone girl",
        "time": "02:30 pm",
        "day": "23 Apr 2017",
        "total_tickets": 1
    }
]
Well, this JSON data is not the same as the query you described. You can add total_tickets by extending the annotation, and add show__movie__name to the .values clause: this changes the grouping to show + movie name, but since a show has only one movie name it won't matter.
However, you cannot add show__day__date and show__time, because one show has multiple date-times, so which one would you want from a group? You could, for example, fetch the maximum day and time, but this does not guarantee that a show actually takes place at that day and time, because these are separate fields that are not related to each other. So the final attempt may look like:
tickets = (MovieTicket.objects
           .filter(user=request.user)
           .values('show', 'show__movie__name')
           .annotate(
               last_booking=Max('booked_at'),
               total_tickets=Count('pk'),
               last_day=Max('show__day'),
               last_time=Max('show__time'),
           )
           .order_by('-last_booking')
           )

You have to group by show and then count the total number of movie tickets.
MovieTicket.objects.filter(user=23).values('show').annotate(total_tickets=Count('show')).values('show', 'total_tickets', 'show__movie__name', 'show__time', 'show__day__date')
Use this serializer class for the above queryset. It will give the required JSON output.
class MySerializer(serializers.Serializer):
    show = serializers.IntegerField()
    movie = serializers.CharField(source='show__movie__name')
    time = serializers.TimeField(source='show__time')
    day = serializers.DateField(source='show__day__date')
    total_tickets = serializers.IntegerField()
It is not possible to order_by booked_at, since that information gets lost when we group by show. If we order by booked_at, the grouping happens on unique combinations of booked_at and show id, which is why the ticket count came out as 1. Without order_by you will get the correct count.
EDIT:
Use this query:
queryset = (MovieTicket.objects.filter(user=23)
            .order_by('booked_at').values('show')
            .annotate(total_tickets=Count('show'))
            .values('show', 'total_tickets', 'show__movie__name',
                    'show__time', 'show__day__date'))
You cannot annotate on an annotated field, so you will have to find the total ticket count in Python. To calculate total_tickets for unique show ids:
tickets = {}
for obj in queryset:
    if obj['show'] not in tickets:
        tickets[obj['show']] = obj
    else:
        tickets[obj['show']]['total_tickets'] += obj['total_tickets']
The final list of objects you need is tickets.values().
The same serializer above can be used with these objects.

You can try this (the default reverse lookup name for MovieTicket is movieticket):
Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('movieticket__user')).values('movie__name', 'time', 'day', 'total_tickets').distinct()
OR
Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('id')).values('movie__name', 'time', 'day', 'total_tickets').distinct()

Related

filtering DateField on the basis of substring in django

So I am using a DateField to display a date. When passed to a template through the context, it renders in the format Nov. 4, 2018.
There are multiple entries with such dates in the database, and I want to filter on the string of the date as actually shown, i.e. when I type Nov 4, nov 4 or NOV 4 in my search input field, it should show the matching results. More like a substring match.
The only problem is that I do not know how to convert the my_model.date field into Nov. 4, 2018.
str(my_model.date) returns 2018-11-04, and I do not want to parse this with a month_number-to-month_name map.
I think either of these two solutions should work:
1) Django filters that allow me to do so.
2) Converting my_model.date into the string Nov. 4, 2018.
Help please, I have been stuck on this forever now.

Because you specifically mention rendering the date in a template, I'm not sure if the search operation you're referring to is a front-end, user-facing thing or a backend database query.
If you want to convert my_model.date to a prettier string before sending it to the template for display, you can process it in your view with strftime, which gives you the control that you're missing with the str wrapper: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior While there are template tags that can do this for you as well, doing it in your view is faster and better.
If this is a DB query, then remember that you can filter date objects by their attributes. For example, to get all instances created on November 11th, 2018:
MyModel.objects.filter(
    date__day=11,
    date__month=11,
    date__year=2018
)
Note the double underscores.
Responding to your comment, now that I better understand your goal:
Date fields do not store their more "verbose" date information in a queryable spot. If you want to be able to query for things like "Nov" instead of 11, then you'll need to add another attribute (or attributes) to your model to propagate that computed data into queryable containers.
If it were me, I would do this within my_model:
...
rawDate = models.DateField(...)  # your current date field
formatted_month = models.CharField(...)
formatted_day = models.IntegerField(...)
formatted_year = models.IntegerField(...)
...
def save(self, *args, **kwargs):
    # Derive the queryable parts from the date before every save.
    self.formatted_month = self.rawDate.strftime('%b')
    self.formatted_day = self.rawDate.day
    self.formatted_year = self.rawDate.year
    super().save(*args, **kwargs)
Now you can perform your NOV/nov/Nov lookup like so:
MyModel.objects.filter(
    formatted_month__iexact='Nov'
)
This still requires you to split the month and day in your search term before hitting the database. If you wanted to squash these down a bit, you could instead store all of the formatted date info in a single field:
formatted_date = models.CharField(...)
...
def save(self, *args, **kwargs):
    # Use .day rather than '%d', which zero-pads ("Nov 04 2018") and would not match "NOV 4".
    d = self.rawDate
    self.formatted_date = '%s %d %d' % (d.strftime('%b'), d.day, d.year)
    super().save(*args, **kwargs)
Then if your query looks like "NOV 4", you could do:
MyModel.objects.filter(formatted_date__icontains='NOV 4')
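To see what a case-insensitive substring match against such a stored string would accept, here is a small plain-Python sketch. The helper names searchable_date and matches are hypothetical, and the in operator on lowercased strings only imitates what an icontains lookup does in the database.

```python
from datetime import date


def searchable_date(value: date) -> str:
    """Build a 'Nov 4 2018' style string; .day avoids the zero padding of %d."""
    return f"{value.strftime('%b')} {value.day} {value.year}"


def matches(query: str, value: date) -> bool:
    """Case-insensitive substring match, imitating an icontains lookup."""
    return query.strip().lower() in searchable_date(value).lower()


sample = date(2018, 11, 4)
print(searchable_date(sample))
print(matches("NOV 4", sample), matches("nov 4", sample))

# With '%d' the day is zero-padded, so the same search term would not match.
padded = sample.strftime("%b %d %Y")
print("nov 4" in padded.lower())

# Caveat: a pure substring match also accepts longer days, e.g. "Nov 1" vs Nov 14.
print(matches("Nov 1", date(2018, 11, 14)))
```

The last line shows a limitation of substring search: "Nov 1" is also contained in "Nov 14 2018", so a stricter match would need to split the term into month and day.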

Elegant way of fetching multiple objects in custom order

What's an elegant way for fetching multiple objects in some custom order from a DB in django?
For example, suppose you have a few products, each with its name, and you want to fetch three of them to display in a row on your website page, in some fixed custom order. Suppose the names of the products which you want to display are, in order: ["Milk", "Chocolate", "Juice"]
One could do
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
products = [
    unordered_products.filter(name="Milk")[0],
    unordered_products.filter(name="Chocolate")[0],
    unordered_products.filter(name="Juice")[0],
]
And the post-fetch ordering part could be improved to use a name-indexed dictionary instead:
ordered_product_names = ["Milk", "Chocolate", "Juice"]
products_by_name = dict((x.name, x) for x in unordered_products)
products = [products_by_name[name] for name in ordered_product_names]
But is there a more elegant way? e.g., convey the desired order to the DB layer somehow, or return the products grouped by their name (aggregation seems to be similar to what I want, but I want the actual objects, not statistics about them).

You can order your products in a custom order with a single ORM query (executing only one SQL query):
from django.db.models import Case, IntegerField, Value, When
ordered_products = Product.objects.filter(
    name__in=['Milk', 'Chocolate', 'Juice']
).annotate(
    order=Case(
        When(name='Milk', then=Value(0)),
        When(name='Chocolate', then=Value(1)),
        When(name='Juice', then=Value(2)),
        output_field=IntegerField(),
    )
).order_by('order')
Update
Note
Speaking of an "elegant way" (and best practice), I think the extra method (proposed by @Satendra) should definitely be avoided.
The official Django documentation says this about extra:
Warning
You should be very careful whenever you use extra(). Every time you use it, you should escape any parameters that the user can control by using params in order to protect against SQL injection attacks. Please read more about SQL injection protection.
Optimized version
If you want to handle more items with only one query, you can change my first query and use the flexibility of the Django ORM, as suggested by @Shubhanshu in his answer:
products = ['Milk', 'Chocolate', 'Juice']
ordered_products = Product.objects.filter(
    name__in=products
).order_by(Case(
    *[When(name=n, then=i) for i, n in enumerate(products)],
    output_field=IntegerField(),
))
The output of this command will be similar to this:
<QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>]>
And the SQL generated by the ORM will be like this:
SELECT "id", "name"
FROM "products"
WHERE "name" IN ('Milk', 'Chocolate', 'Juice')
ORDER BY CASE
    WHEN "name" = 'Milk' THEN 0
    WHEN "name" = 'Chocolate' THEN 1
    WHEN "name" = 'Juice' THEN 2
    ELSE NULL
END ASC
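The ORDER BY CASE technique can be checked outside Django as well. This is a minimal sketch against an in-memory SQLite database; the products table layout and its rows are assumptions made up for the demonstration. The names are passed as bound parameters rather than formatted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Juice",), ("Bread",), ("Milk",), ("Chocolate",)],
)

wanted = ["Milk", "Chocolate", "Juice"]

# One placeholder per name for IN (...), one WHEN/THEN pair per name for the CASE.
in_marks = ", ".join("?" for _ in wanted)
case_sql = " ".join("WHEN name = ? THEN ?" for _ in wanted)
sql = (
    f"SELECT name FROM products WHERE name IN ({in_marks}) "
    f"ORDER BY CASE {case_sql} END"
)

# Parameters follow the order of the placeholders: IN list first, then CASE pairs.
params = list(wanted)
for position, name in enumerate(wanted):
    params += [name, position]

ordered = [row[0] for row in conn.execute(sql, params)]
print(ordered)
```

Only the placeholder layout is built with string formatting; every value supplied by the caller travels through the parameter list.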

When there is no relation between the objects that you are fetching and you still wish to fetch (or arrange) them in a certain (custom) order, you may try doing this:
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
product_order = ["Milk", "Chocolate", "Juice"]
preserved = Case(*[When(name=name, then=pos) for pos, name in enumerate(product_order)])
ordered_products = unordered_products.order_by(preserved)
Hope it helps!

Try this in the model's Meta class:
class Meta:
    ordering = ('name', 'related__name', )
This orders your records by the specified fields, so chocolate, chocolate blue, chocolate white, juice green, juice XXX, milk, milky, milk YYYY should keep that order when you fetch them.

Creating a QuerySet from a list while preserving order
This means the order of the output QuerySet will be the same as the order of the list used to filter it.
The solution is more or less the same as @PaoloMelchiorre's answer.
But if there are more items, let's say 1000 products in product_names, then you don't have to worry about adding more conditions to Case; you can use the extra method of QuerySet:
product_names = ["Milk", "Chocolate", "Juice", ...]
# Placeholders keep the names out of the SQL string (avoids injection and quoting bugs).
clauses = ' '.join('WHEN name=%s THEN %s' for _ in product_names)
ordering = 'CASE %s END' % clauses
select_params = [p for i, name in enumerate(product_names) for p in (name, i)]
queryset = Product.objects.filter(name__in=product_names).extra(
    select={'ordering': ordering}, select_params=select_params,
    order_by=('ordering',))
# Output: <QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>,...]>

Django ORM: django aggregate over filtered reverse relation

The question is loosely related to "Django ORM: filter primary model based on chronological fields from related model", and further limits the resulting queryset.
The models
Assuming we have the following models:
class Patient(models.Model):
    name = models.CharField()
    # other fields following
class MedicalFile(models.Model):
    patient = models.ForeignKey(Patient, related_name='files')
    issuing_date = models.DateField()
    expiring_date = models.DateField()
    diagnostic = models.CharField()
The query
I need to select all the files which are valid at a specified date, most likely in the past. The problem is that for every patient there will be a small overlapping period in which the patient has 2 valid files. If we're querying for a date within that small timeframe, I need to select only the most recent file.
More to the point: consider patient John Doe. He will have a string of "uninterrupted" files starting in 2012, like this:
+---+------------+-------------+
|ID |issuing_date|expiring_date|
+---+------------+-------------+
|1 |2012-03-06 |2013-03-06 |
+---+------------+-------------+
|2 |2013-03-04 |2014-03-04 |
+---+------------+-------------+
|3 |2014-03-04 |2015-03-04 |
+---+------------+-------------+
As one can easily observe, the validity of these files overlaps by a couple of days. For instance, on 2013-03-05 files 1 and 2 are both valid, but we're considering only file 2 (as the most recent one). I'm guessing that the use case isn't special: this is the case of managing subscriptions, where in order to have a continuous subscription you renew it early.
Now, in my application I need to query historical data, e.g. give me all the files which were valid on 2013-03-05, considering only the "most recent" ones. I was able to solve this by using RawSQL, but I would like to have a solution without raw SQL. In the previous question, we were able to filter the "latest" file by aggregation over the reverse relation, something like:
qs = MedicalFile.objects.annotate(latest_file_date=Max('patient__files__issuing_date'))
qs = qs.filter(issuing_date=F('latest_file_date')).select_related('patient')
The problem is that we need to limit the range over which latest_file_date is computed, by filtering against 2013-03-05. But aggregate functions don't run over filtered querysets ...
The "poor" solution
I'm currently doing this via a raw SQL annotation (substitute "<app>" with your concrete application label):
import datetime
from django.db.models import F
from django.db.models.expressions import RawSQL
reference_date = datetime.date(year=2013, month=3, day=5)
annotation_latest_issuing_date = {
    'latest_issuing_date': RawSQL('SELECT max(file.issuing_date) '
                                  'FROM <app>_medicalfile file '
                                  'WHERE file.patient_id = <app>_medicalfile.patient_id '
                                  '  AND file.issuing_date <= %s', (reference_date, ))
}
qs = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
qs = qs.annotate(**annotation_latest_issuing_date).filter(issuing_date=F('latest_issuing_date'))
Written as such, the queryset returns the correct number of records.
Question: how can this be achieved without RawSQL and (already implied) with the same level of performance?

You can use id__in and provide your nested filtered queryset (like all files that are valid at the given date).
qs = (MedicalFile.objects
      .filter(id__in=MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date))
      .order_by('patient__pk', '-issuing_date')
      .distinct('patient__pk'))  # field_name parameter only supported by Postgres
The order_by groups the files by patient, with the latest issuing date first. distinct then retrieves that first file for each patient. However, general care is required when combining order_by and distinct: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Edit: Removed single patient dependence from first filter and changed latest to combination of order_by and distinct
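The intended result of order_by followed by distinct on the patient can be illustrated in plain Python. This is a sketch on in-memory dictionaries, not the ORM: the rows for John Doe come from the table in the question, while the second patient and the function name latest_valid_files are made up for the example.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

FILES = [
    {"id": 1, "patient": "John Doe", "issuing_date": date(2012, 3, 6), "expiring_date": date(2013, 3, 6)},
    {"id": 2, "patient": "John Doe", "issuing_date": date(2013, 3, 4), "expiring_date": date(2014, 3, 4)},
    {"id": 3, "patient": "John Doe", "issuing_date": date(2014, 3, 4), "expiring_date": date(2015, 3, 4)},
    {"id": 4, "patient": "Jane Roe", "issuing_date": date(2012, 9, 1), "expiring_date": date(2013, 9, 1)},
]


def latest_valid_files(files, reference_date):
    # Same condition as the filter: issuing_date <= reference_date < expiring_date.
    valid = [f for f in files
             if f["issuing_date"] <= reference_date < f["expiring_date"]]
    # Equivalent of order_by('patient__pk', '-issuing_date'): sort by date
    # descending first, then rely on the stable sort to group by patient.
    ordered = sorted(valid, key=itemgetter("issuing_date"), reverse=True)
    ordered = sorted(ordered, key=itemgetter("patient"))
    # Equivalent of distinct('patient__pk'): keep the first row of each patient.
    return [next(rows) for _, rows in groupby(ordered, key=itemgetter("patient"))]


result_ids = [f["id"] for f in latest_valid_files(FILES, date(2013, 3, 5))]
print(result_ids)
```

On 2013-03-05 both file 1 and file 2 are valid for John Doe, and only file 2 survives, which is the behaviour the question asks for.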

Consider p to be a Patient instance.
I think you can do something like:
p.files.filter(issuing_date__lt='some_date', expiring_date__gt='some_date')
See https://docs.djangoproject.com/en/1.9/topics/db/queries/#backwards-related-objects
Or maybe with the Q magic query object...

django paginate with non-model object

I'm working on a side project using Python and Django. It's a website that tracks the price of some products from some website, then shows the full price history of each product.
So, I have this class in Django:
class Product(models.Model):
    price = models.FloatField()
    date = models.DateTimeField(auto_now=True)
    name = models.CharField()
Then, in my views.py, I want to display the products in a table, like so:
+----------+--------+--------+--------+--------+....
| Name | Date 1 | Date 2 | Date 3 |... |....
+----------+--------+--------+--------+--------+....
| Product1 | 100.0 | 120.0 | 70.0 | ... |....
+----------+--------+--------+--------+--------+....
...
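I'm using the following class for rendering:
class ProductView(object):
    def __init__(self):
        self.name = ""
        # Per-instance dict; a class-level {} would be shared by all instances.
        self.price_history = {}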
So that in my template, I can easily convert each product_view object into one table row. I'm also passing through context a sorted list of all available dates, for the purpose of constructing the head of the table, and getting the price of each product on that date.
Then I have logic in views that converts one or more products into this ProductView object. The logic looks something like this:
def conversion():
    result_dict = {}
    all_products = Product.objects.all()
    for product in all_products:
        if product.name in result_dict:
            result_dict[product.name].append(product)
        else:
            result_dict[product.name] = [product]
    # So result_dict will be like
    # {"Product1":[product, product], "Product2":[product],...}
    product_views = []
    for products in result_dict.values():
        # Logic that converts list of Product into ProductView, which is simple.
        pass  # conversion logic omitted in the original post
    # Then I'm returning the product_views, sorted based on the price on the
    # latest date, None if not available.
    return sorted(product_views,
                  key=lambda x: get_latest_price(latest_date, x),
                  reverse=True)
As per Daniel Roseman and zymud, adding get_latest_price:
def get_latest_price(date, product_view):
    if date in product_view.price_history:
        return product_view.price_history[date]
    else:
        return None
I omitted the logic that gets the latest date in conversion. I have a separate table that records each date on which I run my price-collecting script, which adds new data to the table. So getting the latest date essentially means getting the date in the OpenDate table with the highest ID.
So, the question is: when the number of products grows huge, how do I paginate that product_views list? E.g. if I want to see 10 products in my web application, how do I tell Django to get only those rows out of the DB?
I can't (or don't know how to) use django.core.paginator.Paginator, because to create the 10 rows I want, Django needs to select all rows related to those 10 product names. But to figure out which 10 names to select, it first needs to get all objects, then figure out which ones have the highest price on the latest date.
It seems to me the only solution would be to add something between Django and the DB, like a cache, to store the ProductView objects. But other than that, is there a way to directly paginate the product_views list?

I'm wondering if this makes sense:
The basic idea is, since I'll need to sort all product_views by the price on the "latest" date, I'll do that bit in the DB first, and only get the list of product names to make it "paginatable". Then I'll do a second DB query to get all the products that have those product names, and construct that many product_views. Does it make sense?
To make it a little clearer, here comes the code:
So instead of
# def conversion():
all_products = Product.objects.all()
I'm doing this:
# def conversion():
# This would get me the latest available date
latest_date = OpenDate.objects.order_by('-id')[:1]
top_ten_priced_product_names = (Product.objects
                                .filter(date__in=latest_date)
                                .order_by('-price')
                                .values_list('name', flat=True)[:10])
all_products_that_i_need = (Product.objects
                            .filter(name__in=top_ten_priced_product_names))
# then I can construct that list of product_views using
# all_products_that_i_need
Then for pages after the first, I can change that [:10] to, say, [10:20] or [20:30].
This makes the pagination code easier, and by pulling the appropriate code into a separate function, it's also possible to do Ajax and all that fancy stuff.
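The slice bounds for later pages can be computed from a page number instead of being edited by hand. The following is a small sketch on a plain list; the helper name page_slice and the sample product names are hypothetical. The returned slice object works with any sequence that supports slice indexing.

```python
def page_slice(page_number: int, page_size: int = 10) -> slice:
    """Return the slice covering a 1-based page of page_size items."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * page_size
    return slice(start, start + page_size)


names = [f"Product{i}" for i in range(1, 26)]  # 25 sample names

first_page = names[page_slice(1)]
second_page = names[page_slice(2)]
third_page = names[page_slice(3)]   # only 5 items remain for the last page

print(first_page[0], first_page[-1])
print(second_page[0], second_page[-1])
print(third_page)

# A slice such as [10:10] has equal start and stop, so it is always empty.
print(names[10:10])
```

Python slices take a start and a stop index, not a start and a length, which is why the second page is [10:20] and not [10:10].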
But here comes a problem: this solution needs three DB calls for every single query. Right now I'm running everything on the same box, but I still want to reduce this overhead to two (one for OpenDate, the other for Product).
Is there a better solution that solves the pagination problem with only two DB calls?

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
    some fields
User_Activity:
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    activity = models.ForeignKey(Base_Activity)
    rating = models.IntegerField(default=0)  # Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
    up_votes = Count('user_activity__rating'=1),
).order_by(
    'up_votes'
)
How can I solve this?

You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
    '-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the items with the same average, the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
    '-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models. For example, if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database; I'm not sure the SUM expression there works in every DBMS. By the way, I had a perfect Django site to test on, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, Stack Overflow style. The site is open source, if you're interested.
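Since SUM over a boolean comparison may not be accepted by every DBMS, a CASE expression inside SUM is a more portable way to count only the upvotes. The following sketch runs against an in-memory SQLite database; the table names base_activity and user_activity, their columns and the sample rows are assumptions modelled on the question, not the real table names Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_activity (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_activity (
        id INTEGER PRIMARY KEY,
        activity_id INTEGER REFERENCES base_activity (id),
        rating INTEGER DEFAULT 0
    );
    INSERT INTO base_activity (id, title) VALUES
        (1, 'hiking'), (2, 'chess'), (3, 'karaoke'), (4, 'yoga');
    INSERT INTO user_activity (activity_id, rating) VALUES
        (1, 1), (1, -1), (2, 1), (2, 1), (2, 0), (3, -1);
""")

# LEFT JOIN keeps activities without any votes; the CASE counts only rating = 1.
rows = conn.execute("""
    SELECT a.title,
           SUM(CASE WHEN u.rating = 1 THEN 1 ELSE 0 END) AS up_votes
    FROM base_activity a
    LEFT JOIN user_activity u ON u.activity_id = a.id
    GROUP BY a.id, a.title
    ORDER BY up_votes DESC, a.title
""").fetchall()

print(rows)
```

Downvotes and neutral ratings contribute 0 to the sum, so an item with one upvote and one downvote still ranks above an item with no upvotes.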
