ProgrammingError when using order_by and distinct together in Django

I have a model like below
class ProductScore(models.Model):
    client = models.ForeignKey(User)
    created = models.DateTimeField(default=datetime.datetime.now)
    score = models.IntegerField()
    scale = models.ForeignKey(Product)
Right now I am using the query below to filter out duplicates from this model:
scores = ProductScore.objects.filter(client=request.user).distinct('scale')
This query returns unique results, but they are the old ones (their created date is very old). For example, if the ProductScore table has 10 duplicate records, 5 created yesterday and 5 created today, the query returns the 5 unique records created yesterday.
But I want the unique records that were created most recently (i.e., today), so I tried this:
scores = ProductScore.objects.filter(client=request.user).order_by('created').distinct('scale')
which does not work and throws a ProgrammingError exception:
*** ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("product_productscore"."scale...
^
So how can I get the most recently created unique records from the above table?

PostgreSQL is asking you to do this:
ProductScore.objects.filter(client=request.user).order_by('scale', '-created').distinct('scale')
The DISTINCT ON field (scale) has to come first in order_by. Ordering by -created after it gives you the most recent of each duplicate, though the overall query results will be ordered by the scale field.
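To see why the ordering matters, here is a minimal plain-Python sketch of what DISTINCT ON does: PostgreSQL keeps the first row of each group in ORDER BY sequence. The sample rows and the helper name latest_per_scale are made up for illustration; this emulates the SQL semantics and is not Django code.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (scale_id, created, score)
rows = [
    (1, date(2020, 1, 1), 10),
    (1, date(2020, 1, 2), 20),
    (2, date(2020, 1, 1), 30),
    (2, date(2020, 1, 2), 40),
]

def latest_per_scale(rows):
    # Emulates ORDER BY scale ASC, created DESC:
    # sort by created descending first, then stable-sort by scale.
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered.sort(key=itemgetter(0))
    # Emulates DISTINCT ON (scale): keep only the first row of each scale group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]

print(latest_per_scale(rows))
```

If the rows were ordered by created ascending instead, the same "first row per group" rule would pick the oldest record of each scale, which matches the behaviour described in the question.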

Related

How to get Cartesian product of two tables in Django Queryset?

Is there a way to do the equivalent of a full outer join in Django? (I think I've read that full outer joins are not supported.)
My scenario is that I have three tables:
Staff / WeekList / WeeksCompleted
The relevant fields I'm trying to work with are:
Staff table - Staff Number.
WeekList table - Week Start date.
WeeksCompleted table - Week Start date and Staff Number.
Basically, everyone should have an entry in the WeeksCompleted table (if they're still active, that is, but that's not pertinent for this question). The queryset I would like to produce is a list of Staff who have missing weeks in the WeeksCompleted table.
I can get the result I want using SQL queries but it involves a full outer join on the Staff and WeekList tables. I was wondering if anyone knows of a way to do this using the queryset functions?
The only other way I can think to do the equivalent of the full join is to create a list using a nested loop of Staff Numbers against each week, which might have a sizeable processing overhead?
EDIT: if it helps, here are the three simplified models.
models.py
class Staff(models.Model):
    staff_number = models.CharField(max_length=9, null=True)
class WeekList(models.Model):
    week_start = models.DateField(null=True)
class WeeksCompleted(models.Model):
    staff = models.ForeignKey(to='weekscompleted.Staff', null=True, on_delete=models.PROTECT)
    week_list = models.ForeignKey(to='weekscompleted.WeekList', null=True, on_delete=models.PROTECT)
EDIT 2: The join I think I need is:
SELECT staff_number, week_start
FROM Staff, WeekList
GROUP BY staff_number, week_start
This will give a list of the expected weeks completed for staff:
week_start staff_number
17/10/2020 12345
17/10/2020 54321
I can then compare this to the WeeksCompleted table:
week_start staff_number
17/10/2020 12345
to find which staff are missing for a week using this query (keep in mind that this is a query I produced in a database):
SELECT qryShouldBeCompleted.week_start, qryShouldBeCompleted.staff_number
FROM qryShouldBeCompleted
LEFT JOIN WeeksCompleted
    ON qryShouldBeCompleted.staff_number = WeeksCompleted.staff_number
    AND qryShouldBeCompleted.week_start = WeeksCompleted.week_start
WHERE WeeksCompleted.staff_number Is Null
This would then produce the result I need:
week_start staff_number
17/10/2020 54321
Edit 3:
I just found an article on FilteredRelation that gets me partway there:
Staff.objects.annotate(missing=FilteredRelation('weekscompleted', condition=Q(weekscompleted__week_start='some date'))).values('staff_number', 'missing__staff__staff_number', 'missing__week_start')
which gets me this:
{'staff_number': '54321', 'missing__staff__staff_number': None, 'missing__week_start': None}
The only thing with this is that it only appears to work for one week at a time - using __lte in the condition doesn't return any 'None' values so I'd have to loop through each week...
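The two-step SQL described above (a Cartesian product of staff and weeks, then a left join that keeps only the unmatched pairs) can be collapsed into one query. The sketch below runs it against an in-memory SQLite database so it is self-contained. The table and column names (staff, weeklist, weekscompleted, staff_id, week_list_id) are assumptions loosely modelled on the simplified models, not the names Django would generate for a real app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, staff_number TEXT);
    CREATE TABLE weeklist (id INTEGER PRIMARY KEY, week_start TEXT);
    CREATE TABLE weekscompleted (
        id INTEGER PRIMARY KEY,
        staff_id INTEGER REFERENCES staff(id),
        week_list_id INTEGER REFERENCES weeklist(id)
    );
    INSERT INTO staff VALUES (1, '12345'), (2, '54321');
    INSERT INTO weeklist VALUES (1, '2020-10-17');
    INSERT INTO weekscompleted VALUES (1, 1, 1);
""")

# CROSS JOIN builds every (staff, week) pair that should exist;
# the LEFT JOIN plus IS NULL keeps only pairs with no completed entry.
missing = conn.execute("""
    SELECT w.week_start, s.staff_number
    FROM staff AS s
    CROSS JOIN weeklist AS w
    LEFT JOIN weekscompleted AS c
        ON c.staff_id = s.id AND c.week_list_id = w.id
    WHERE c.id IS NULL
    ORDER BY w.week_start, s.staff_number
""").fetchall()

print(missing)
```

In a Django project, a query of this shape could be run through a raw cursor or Manager.raw(), with the real table names taken from each model's _meta.db_table.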

Django sum values of a column after 'group by' in another column

I found some solutions here and in the django documentation, but I could not manage to make one query work the way I wanted.
I have the following model:
class Inventory(models.Model):
blindid = models.CharField(max_length=20)
massug = models.IntegerField()
I want to count the rows for each blindid and then sum massug within each group.
My current Django ORM query:
samples = Inventory.objects.values('blindid', 'massug').annotate(aliquots=Count('blindid'), total=Sum('massug'))
It's not counting correctly (it shows only one), so it's not summing correctly either. It seems to pick up only the first result... I also tried Count('blindid', distinct=True) and Count('blindid', distinct=False).
This is the query result using samples.query. Django is grouping by the two columns...
SELECT "inventory"."blindid", "inventory"."massug", COUNT("inventory"."blindid") AS "aliquots", SUM("inventory"."massug") AS "total" FROM "inventory" GROUP BY "inventory"."blindid", "inventory"."massug"
This should be the raw sql
SELECT blindid,
Count(blindid) AS aliquots,
Sum(massug) AS total
FROM inventory
GROUP BY blindid
Try this (every field passed to values() before annotate() ends up in GROUP BY, so leave massug out):
samples = Inventory.objects.values('blindid').annotate(aliquots=Count('blindid'), total=Sum('massug'))
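The difference between the two GROUP BY clauses is easy to check outside Django. This sketch runs both SQL shapes against an in-memory SQLite table; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (blindid TEXT, massug INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 20), ("B", 5)],
)

# Shape produced by values('blindid', 'massug'): one group per distinct pair.
by_both = conn.execute("""
    SELECT blindid, massug, COUNT(blindid), SUM(massug)
    FROM inventory
    GROUP BY blindid, massug
    ORDER BY blindid, massug
""").fetchall()

# Shape produced by values('blindid'): one group per blindid.
by_blindid = conn.execute("""
    SELECT blindid, COUNT(blindid), SUM(massug)
    FROM inventory
    GROUP BY blindid
    ORDER BY blindid
""").fetchall()

print(by_both)
print(by_blindid)
```

Grouping by both columns splits blindid "A" into one group per distinct massug value, which is why the counts in the question looked too small.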

Django query aggregate upvotes in backward relation

I have two models:
class Base_Activity(models.Model):
    ...  # some fields
class User_Activity(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    activity = models.ForeignKey(Base_Activity)
    rating = models.IntegerField(default=0)  # Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
    up_votes = Count('user_activity__rating'=1),
).order_by(
    'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
    '-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among items with the same average, the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
    '-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)  # run the query string defined above
rows = cursor.fetchall()  # fetch the results from the cursor itself
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database; I'm not sure the SUM expression there works in every DBMS. By the way, I had a perfect Django site to test on, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, Stack Overflow style. The site is open-source, if you're interested.
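Since SUM over a boolean comparison is not portable, a CASE expression is a safer way to count only the upvotes. The sketch below shows that idea on an in-memory SQLite database. The table names base_activity and user_activity and the sample data are assumptions for illustration, not the tables Django would create for the models in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_activity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_activity (
        id INTEGER PRIMARY KEY,
        activity_id INTEGER REFERENCES base_activity(id),
        rating INTEGER DEFAULT 0
    );
    INSERT INTO base_activity VALUES (1, 'run'), (2, 'swim'), (3, 'read');
    INSERT INTO user_activity (activity_id, rating) VALUES
        (1, 1), (1, -1), (2, 1), (2, 1), (2, 0);
""")

# Count only rating = 1 rows per activity; the LEFT JOIN keeps activities
# that have no user activity at all, and they end up with 0 upvotes.
rows = conn.execute("""
    SELECT a.name,
           SUM(CASE WHEN u.rating = 1 THEN 1 ELSE 0 END) AS up_votes
    FROM base_activity AS a
    LEFT JOIN user_activity AS u ON u.activity_id = a.id
    GROUP BY a.id, a.name
    ORDER BY up_votes DESC, a.name
""").fetchall()

print(rows)
```

On Django 2.0 and later, the same conditional count can likely be written in the ORM without raw SQL, for example Count('user_activity', filter=Q(user_activity__rating=1)), since aggregates accept a filter argument from that version on.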

Grouping Django model entries by day using its datetime field

I'm working with an Article like model that has a DateTimeField(auto_now_add=True) to capture the publication date (pub_date). This looks something like the following:
class Article(models.Model):
text = models.TextField()
pub_date = models.DateTimeField(auto_now_add=True)
I want to do a query that counts how many article posts or entries have been added per day. In other words, I want to query the entries and group them by day (and eventually month, hour, second, etc.). This would look something like the following in the SQLite shell:
select pub_date, count(id) from "myapp_article"
where id = 1
group by strftime("%d", pub_date)
;
Which returns something like:
2012-03-07 18:08:57.456761|5
2012-03-08 18:08:57.456761|9
2012-03-09 18:08:57.456761|1
I can't seem to figure out how to get that result from a Django QuerySet. I am aware of how to get a similar result using itertools.groupby, but that isn't possible in this situation (explanation to follow).
The end result of this query will be used in a graph showing the number of posts per day. I'm attempting to use the Django Chartit package to achieve this goal. Chartit puts a constraint on the data source (DataPool). The source must be a Model, Manager, or QuerySet, so using itertools.groupby is not an option as far as I can tell.
So the question is... How do I group or aggregate the entries by day and end up with a QuerySet object?
Create an extra field that stores only the date (not the time) and annotate with Count:
Article.objects.extra({'published':"date(pub_date)"}).values('published').annotate(count=Count('id'))
Result will be:
published,count
2012-03-07,5
2012-03-08,9
2012-03-09,1
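The SQL that this extra() call leans on can be tried directly in SQLite. The sketch below groups on date(pub_date), so the time part no longer splits a day into several groups. The table name myapp_article comes from the question; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_article (id INTEGER PRIMARY KEY, text TEXT, pub_date TEXT)"
)
conn.executemany(
    "INSERT INTO myapp_article (text, pub_date) VALUES (?, ?)",
    [
        ("a", "2012-03-07 18:08:57.456761"),
        ("b", "2012-03-07 21:15:00.000000"),
        ("c", "2012-03-08 09:30:12.000000"),
    ],
)

# date() strips the time component, so rows from the same day share a group.
per_day = conn.execute("""
    SELECT date(pub_date) AS published, COUNT(id)
    FROM myapp_article
    GROUP BY date(pub_date)
    ORDER BY published
""").fetchall()

print(per_day)
```

On Django 1.10 and later, TruncDate from django.db.models.functions is an ORM alternative to extra() for the same grouping, and it also returns a QuerySet.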

What is the internal function in django to add new tables to a queryset in a sensible way?

In django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
    name = models.CharField(max_length=200)
    urlname = models.CharField(max_length=200)
    entrydate = models.DateTimeField(auto_now_add=True)
class Reading(models.Model):
    book = models.ForeignKey(Book, related_name='readings')
    user = models.ForeignKey(User)
    rating = models.IntegerField()
    entrydate = models.DateTimeField(auto_now_add=True)
readings=Reading.objects.filter(user=user).order_by('entrydate')
friendrating='(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))'%friend.id
readings=readings.extra(select={'friendrating':friendrating})
At the moment, readings won't work because the join to proj_book is not set up. However, if I add an order_by such as:
.order_by('entrydate','reading__entrydate')
django magically knows to add an inner join through the foreign key and I get what I want.
additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.
Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The database has no way to know what proj_book refers to, since that table is not included in the query (neither in FROM nor through a join).
You get the expected results when you add order_by because that order_by adds an inner join between proj_book and proj_reading.
As far as I understand, you will get the same result if you reference any other column of Book, not only through order_by.
Q1 = Reading.objects.filter(user=user).exclude(book__name='')  # exclude() on a Book field forces the JOIN
Q2 = "Select rating from proj_reading where user_id=%d" % user.id
Result = Q1.extra(select={"foo": Q2})
This way, at step Q1, you force Django to add a join on the Book table, which it does not do by default unless you reference a field of the Book table.
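The failure and the fix can be reproduced outside Django. This sketch uses an in-memory SQLite database with tables named as in the question and invented sample data: the correlated subquery fails while proj_book is missing from the outer query and works once the join is present. SQLite words the error differently from the MySQL message quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proj_book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE proj_reading (
        id INTEGER PRIMARY KEY, book_id INTEGER, user_id INTEGER, rating INTEGER
    );
    INSERT INTO proj_book VALUES (1, 'Dune');
    INSERT INTO proj_reading VALUES (1, 1, 1, 4), (2, 1, 2, 5);
""")

# Correlated subquery: it refers to proj_book.id from the outer query.
SUBQUERY = """(SELECT r.rating FROM proj_reading AS r
               WHERE r.user_id = 2 AND r.book_id = proj_book.id)"""

# Without proj_book in the outer FROM clause, the reference cannot be resolved.
try:
    conn.execute(
        f"SELECT {SUBQUERY} AS friendrating, id FROM proj_reading WHERE user_id = 1"
    )
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# With the join in place, proj_book.id is in scope and the subquery works.
rows = conn.execute(f"""
    SELECT {SUBQUERY} AS friendrating, proj_reading.id
    FROM proj_reading
    INNER JOIN proj_book ON proj_book.id = proj_reading.book_id
    WHERE proj_reading.user_id = 1
""").fetchall()

print(error)
print(rows)
```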
You mean:
class SomeModel(models.Model):
    id = models.IntegerField()
    ...
class SomeOtherModel(models.Model):
    otherfield = models.ForeignKey(SomeModel)
qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It won't work because you do not define the table join correctly.
myrating='(select rating from proj_reading inner join proj_book on (proj_book.id=proj_reading.book_id) where proj_reading.user_id=%d and rating in (1,2,3,4,5,6))'%user.id
This is pseudocode and it is not tested.
But I advise you to use Django filters instead of writing SQL queries.
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
Documentation for more details.