How to delete duplicate rows from a table using Django ORM?

I have a database table bad_reviews and a corresponding Django model BadReviews. I want to delete duplicate records based on the fields client_id, survey_id, text, rating, privacy_agreement. I've come up with this query which works:
SELECT br.*
FROM bad_reviews br
JOIN (
    SELECT client_id, survey_id, text, rating, privacy_agreement, COUNT(*)
    FROM bad_reviews
    GROUP BY client_id, survey_id, text, rating, privacy_agreement
    HAVING count(*) > 1
) dupes
    ON br.client_id = dupes.client_id
    AND br.survey_id = dupes.survey_id
    AND br.text = dupes.text
    AND br.rating = dupes.rating
    AND br.privacy_agreement = dupes.privacy_agreement
ORDER BY br.client_id, br.survey_id, br.text, br.rating, br.privacy_agreement, br.id
How can I rewrite it using the Django ORM?

I hope this will work. Filtering directly on Exists() requires Django 3.0 or later.
from django.db.models import Exists, OuterRef

# Correlated subquery: another row with the same values in all five fields.
# Comparing each field to a separate Subquery(...) breaks as soon as the grouped
# subquery returns more than one row, so EXISTS is used instead.
same_values = BadReviews.objects.filter(
    client_id=OuterRef('client_id'),
    survey_id=OuterRef('survey_id'),
    text=OuterRef('text'),
    rating=OuterRef('rating'),
    privacy_agreement=OuterRef('privacy_agreement'),
).exclude(pk=OuterRef('pk'))

# Every row that has at least one duplicate, ordered like the SQL query.
bad_reviews = BadReviews.objects.filter(Exists(same_values)).order_by(
    'client_id', 'survey_id', 'text', 'rating', 'privacy_agreement', 'id'
)
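
The query above only selects the duplicated rows. To actually delete them, one common approach is to keep the lowest id in each group of identical rows and delete the rest. The sketch below shows that idea at the SQL level with the standard-library sqlite3 module and an in-memory bad_reviews table, so it runs without a Django project. The sample rows and the decision to keep the lowest id are assumptions for illustration, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bad_reviews (
        id INTEGER PRIMARY KEY,
        client_id INTEGER,
        survey_id INTEGER,
        text TEXT,
        rating INTEGER,
        privacy_agreement INTEGER
    )
    """
)
# Hypothetical sample data: ids 1-3 are identical in all five fields,
# ids 4 and 5 each differ in at least one field.
rows = [
    (1, 10, 100, "slow delivery", 2, 1),
    (2, 10, 100, "slow delivery", 2, 1),
    (3, 10, 100, "slow delivery", 2, 1),
    (4, 10, 100, "rude support", 1, 1),
    (5, 11, 100, "slow delivery", 2, 1),
]
conn.executemany("INSERT INTO bad_reviews VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep the lowest id in every group of identical rows and delete the others.
conn.execute(
    """
    DELETE FROM bad_reviews
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM bad_reviews
        GROUP BY client_id, survey_id, text, rating, privacy_agreement
    )
    """
)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM bad_reviews ORDER BY id")]
print(remaining_ids)
```

In ORM terms the same idea would be to collect Min('id') per group with values(...).annotate(...) and pass those ids to exclude(id__in=...).delete(); that part is not exercised here. Note also that MySQL does not accept this DELETE as written, because the subquery reads the table being deleted from, so there the ids to keep would have to be fetched first.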

Related

How can I have statistics in django admin panel on User with date filter on related field?

Related model
class AbstractTask(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    issued_at = models.DateTimeField(auto_now_add=True)
Problem
I need to show some User statistics per day in the admin panel. Let's say I just need the number of issued tasks. I also need to be able to filter it by issue date (how many were issued yesterday, the day before yesterday, etc.).
How I am trying to do it
I use User proxy models to register ModelAdmin for different statistics pages.
I use slightly modified (changed date ranges) DateFieldListFilter on task__issued_at field:
list_filter = [
    ('task__issued_at', DateFieldListFilter),
    'username',
]
Filters on date field don't work
Filters don't work because they end up generating query similar to this:
queryset = (User.objects
    .annotate(
        # Different statistics.
        num_tasks=Count('task'),
    )
    .filter(
        # DateFieldListFilter.
        task__issued_at__gte='2020-01-01',
        task__issued_at__lt='2020-01-02',
    )
    .values('id', 'num_tasks')
)
SQL:
SELECT "auth_user"."id",
       COUNT("task"."id") AS "num_tasks"
FROM "auth_user"
LEFT OUTER JOIN "task" ON ("auth_user"."id" = "task"."user_id")
INNER JOIN "task" T3 ON ("auth_user"."id" = T3."user_id")
WHERE (T3."issued_at" >= 2020-01-01 00:00:00+03:00
       AND T3."issued_at" < 2020-01-02 00:00:00+03:00)
GROUP BY "auth_user"."id"
The problem is that the filter adds a second join on the "task" table when I need just one.
Forcing the first join to be an inner join by adding .filter(task__isnull=False) doesn't help. It just performs two identical inner joins.
The behavior is the same in Django 2 and 3.
Can it be done in Django?
Preferably as simply as possible: without raw SQL, without much magic, and while still using DateFieldListFilter.
But any solution would help.

The alternative QuerySet below computes the count with a single join by moving the date condition into the aggregate:
from django.db.models import Count, Q

queryset = (User.objects
    .annotate(
        # Different statistics.
        num_tasks=Count(
            'task',
            filter=Q(
                task__issued_at__gte='2020-01-01',
                task__issued_at__lt='2020-01-02',
            ),
        ),
    )
    .values('id', 'num_tasks')
)
SQL:
SELECT "auth_user"."id",
       COUNT("task"."id") FILTER (
           WHERE ("task"."issued_at" >= 2020-01-01 00:00:00+03:00
                  AND "task"."issued_at" < 2020-01-02 00:00:00+03:00)
       ) AS "num_tasks"
FROM "auth_user"
LEFT OUTER JOIN "task" ON ("auth_user"."id" = "task"."user_id")
GROUP BY "auth_user"."id"
I'm not sure how its performance compares with yours, though.
Anyway, to make it work with the DateFieldListFilter you just need to override the queryset method:
from django.contrib.admin import DateFieldListFilter
from django.db.models import Count, Q

class CustomDateFieldListFilter(DateFieldListFilter):
    def queryset(self, request, queryset):
        # Combine the date-range lookups selected in the filter into one Q object.
        q_objects = Q()
        for key, value in self.used_parameters.items():
            q_objects &= Q(**{key: value})
        # Apply the condition inside the aggregate instead of as a queryset filter.
        # No .values() here: the admin change list needs model instances.
        return queryset.annotate(num_tasks=Count('task', filter=q_objects))
and specify the new class:
list_filter = [
    ('task__issued_at', CustomDateFieldListFilter),
    ...
]
That's it.
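
To see why moving the date condition into the aggregate matters, the sketch below runs both SQL shapes from this thread against an in-memory SQLite database using the standard-library sqlite3 module. The table contents are made up for illustration, and COUNT(CASE WHEN ... END) is used as a portable stand-in for COUNT(...) FILTER (WHERE ...), which older SQLite builds do not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES auth_user(id),
        issued_at TEXT
    );
    -- Hypothetical data: user 1 has two tasks on 2020-01-01 and one later,
    -- user 2 has a single task outside the filtered day.
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO task VALUES
        (1, 1, '2020-01-01 09:00:00'),
        (2, 1, '2020-01-01 17:30:00'),
        (3, 1, '2020-01-05 12:00:00'),
        (4, 2, '2020-01-03 08:00:00');
    """
)
date_range = ("2020-01-01", "2020-01-02")

# Shape produced by annotate(Count('task')).filter(task__issued_at__...):
# the filter joins "task" a second time, so every task row is multiplied
# by the number of tasks in range.
two_joins = conn.execute(
    """
    SELECT u.id, COUNT(t.id)
    FROM auth_user u
    LEFT OUTER JOIN task t ON u.id = t.user_id
    INNER JOIN task t3 ON u.id = t3.user_id
    WHERE t3.issued_at >= ? AND t3.issued_at < ?
    GROUP BY u.id
    ORDER BY u.id
    """,
    date_range,
).fetchall()

# Shape produced by Count('task', filter=Q(...)): one join, and the
# condition only decides which rows the aggregate counts.
conditional = conn.execute(
    """
    SELECT u.id,
           COUNT(CASE WHEN t.issued_at >= ? AND t.issued_at < ? THEN t.id END)
    FROM auth_user u
    LEFT OUTER JOIN task t ON u.id = t.user_id
    GROUP BY u.id
    ORDER BY u.id
    """,
    date_range,
).fetchall()

print(two_joins)
print(conditional)
```

With this sample data the two-join query reports 6 for user 1, because each of the user's three tasks is paired with each of the two tasks in range, and it drops user 2 entirely. The conditional count reports 2 for user 1 and keeps user 2 with a count of 0.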

Sum of related objects across 2 FK to the same table, with conditions

I have two models:
class User(Model):
    ...

class Message(Model):
    sender = ForeignKey(User, CASCADE, 'sent_msgs')
    receiver = ForeignKey(User, CASCADE, 'rcvd_msgs')
    ignored = BooleanField()
I'm trying to annotate a queryset of Users with a sum total of their related messages, i.e. sum of both sent_msgs and rcvd_msgs. Additionally, any Message with ignored=True should be ignored.
I can do this with RawSQL fairly simply, using a subquery:
SELECT COUNT("messages_message"."id")
FROM "messages_message"
WHERE "messages_message"."ignored" = FALSE
  AND (
    "messages_message"."sender_id" = "users_user"."id"
    OR
    "messages_message"."receiver_id" = "users_user"."id"
  )
queryset = queryset.annotate(msgs_count=RawSQL(that_query_above))
Is there a way to do this without using RawSQL?

We can use a Subquery [Django-doc] here:
from django.db.models import Count, OuterRef, Subquery, Q

User.objects.annotate(
    msgs_count=Subquery(
        Message.objects.filter(
            Q(sender_id=OuterRef('pk')) | Q(receiver_id=OuterRef('pk')),
            ignored=False
        ).order_by().values('ignored').annotate(cn=Count('*')).values('cn')
    )
)
This then produces a query like:
SELECT auth_user.*,
    (
        SELECT COUNT(*) AS cn
        FROM message U0
        WHERE (U0.sender_id = auth_user.id OR U0.receiver_id = auth_user.id)
            AND U0.ignored = False
        GROUP BY U0.ignored
    ) AS msgs_count
FROM auth_user
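
One detail of this query shape is worth checking: because the subquery has a GROUP BY, a user with no matching messages gets no row back from it, so the annotation is NULL (None in Python) rather than 0. The sketch below runs the correlated subquery with and without the GROUP BY using the standard-library sqlite3 module. The table names follow the SQL above, and the sample messages are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        receiver_id INTEGER,
        ignored INTEGER
    );
    INSERT INTO auth_user VALUES (1), (2), (3);
    -- Hypothetical data: message 3 is ignored, message 4 is sent by
    -- user 1 to themselves and must be counted only once.
    INSERT INTO message VALUES
        (1, 1, 2, 0),
        (2, 2, 1, 0),
        (3, 1, 3, 1),
        (4, 1, 1, 0);
    """
)

QUERY = """
    SELECT u.id,
           (SELECT COUNT(*)
            FROM message m
            WHERE (m.sender_id = u.id OR m.receiver_id = u.id)
              AND m.ignored = 0
            {group_by}) AS msgs_count
    FROM auth_user u
    ORDER BY u.id
"""

# With GROUP BY, as in the SQL generated for the ORM query.
grouped = conn.execute(QUERY.format(group_by="GROUP BY m.ignored")).fetchall()
# Without GROUP BY, as in the hand-written RawSQL subquery.
ungrouped = conn.execute(QUERY.format(group_by="")).fetchall()

print(grouped)
print(ungrouped)
```

User 3 only appears in an ignored message, so the grouped form yields None for that user while the ungrouped form yields 0. If 0 is wanted from the ORM version, wrapping the Subquery in Coalesce from django.db.models.functions is one option; that wrapper is not exercised here.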

group by with all fields django

I want to run two queries in Django. I am using MySQL. Please help.
The first one:
SELECT * FROM `invitations`
WHERE post_id = 19215 GROUP BY (user_id)
This attempt does not group by user_id:
data = Invitations.objects.filter(post_id=19215).annotate(user_id=Count('user_id'))
Then I added values():
data = Invitations.objects.values('user_id').filter(post_id=19215).annotate(user_id=Count('user_id'))
This does not return all the fields, as SELECT * would. If I add more fields:
data = Invitations.objects.values('user_id', 'xyz').filter(post_id=19215).annotate(user_id=Count('user_id'))
it groups by both user_id and xyz.
Please suggest a solution.
The second one is:
SELECT *, GROUP_CONCAT(interview_mode) FROM invitations WHERE post_id = 19215 GROUP BY (user_id)

Run this:
# Triple quotes are needed because the SQL spans several lines.
query = """
    SELECT *, GROUP_CONCAT(interview_mode)
    FROM invitations WHERE post_id = 19215
    GROUP BY (user_id)
"""
data = Invitations.objects.raw(query)
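
Selecting * together with GROUP BY user_id only works when MySQL runs without the ONLY_FULL_GROUP_BY mode, so a more portable shape selects just the grouping column and the aggregate. The sketch below shows that shape with the standard-library sqlite3 module, whose GROUP_CONCAT behaves similarly for this purpose. The column set and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invitations (
        id INTEGER PRIMARY KEY,
        post_id INTEGER,
        user_id INTEGER,
        interview_mode TEXT
    );
    -- Hypothetical data: user 7 has two invitations for post 19215.
    INSERT INTO invitations VALUES
        (1, 19215, 7, 'phone'),
        (2, 19215, 7, 'video'),
        (3, 19215, 8, 'onsite'),
        (4, 500, 7, 'phone');
    """
)

rows = conn.execute(
    """
    SELECT user_id, GROUP_CONCAT(interview_mode)
    FROM invitations
    WHERE post_id = ?
    GROUP BY user_id
    ORDER BY user_id
    """,
    (19215,),
).fetchall()

# The order inside the concatenated string is not guaranteed, so sort it.
modes_by_user = {user_id: sorted(modes.split(",")) for user_id, modes in rows}
print(modes_by_user)
```

Passing the post id as a parameter instead of embedding it in the SQL string also carries over to raw(), which accepts a params argument.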

Django queryset: how to filter a RawSQL annotation based on a queryset field

Is it possible to do something like this in Django:
MyModel.objects.annotate(
    val=RawSQL(
        "SELECT COUNT(*) FROM another_model_table where some_field= %s",
        (a_field_from_MyModel)
    )
)
Thanks!

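You can do something like this, referring to the outer table's column by name inside the SQL:
from django.db.models.expressions import RawSQL

MyModel.objects.annotate(
    val=RawSQL(
        """SELECT COUNT(*) FROM another_model_table
           WHERE some_field = myapp_mymodel.some_field""",
        [],  # RawSQL requires a params argument, even when it is empty
    )
)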

Django filter by annotated field is too slow

I use DRF and have a Motocycle model with more than 2000 objects in the DB. Each Motocycle has one brand. I want to search by full_name:
queryset = Motocycle.objects.prefetch_related(
    "brand"
).annotate(
    full_name=Concat(
        'brand__title',
        Value(' - '),
        'title',
    )
)
I want to filter by full_name, but the query runs very slowly:
(1.156) SELECT "mp_api_motocycle"."id"...
Without filtering, with pagination:
(3.980) SELECT "mp_api_motocycle"."id"...
Is there any way to make this query faster?

Store full_name as a real column in the database and add an index to it.
Otherwise, the database does a full table scan, computing full_name for every row before it can filter on it.
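
As a rough illustration of that advice, the sketch below stores full_name in its own indexed column and asks SQLite, through the standard-library sqlite3 module, which plan it picks for an exact-match lookup. The table layout, index name, and sample rows are assumptions; in a Django project the column would be a model field that application code keeps in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE brand (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE motocycle (
        id INTEGER PRIMARY KEY,
        brand_id INTEGER REFERENCES brand(id),
        title TEXT,
        full_name TEXT  -- stored copy of "<brand title> - <title>"
    );
    CREATE INDEX idx_motocycle_full_name ON motocycle (full_name);
    INSERT INTO brand VALUES (1, 'Honda'), (2, 'Yamaha');
    INSERT INTO motocycle (id, brand_id, title) VALUES
        (1, 1, 'CB500'),
        (2, 2, 'MT-07');
    """
)

# Fill the stored column once from the related brand title.
conn.execute(
    """
    UPDATE motocycle
    SET full_name = (
        SELECT brand.title FROM brand WHERE brand.id = motocycle.brand_id
    ) || ' - ' || motocycle.title
    """
)

lookup = "SELECT id FROM motocycle WHERE full_name = ?"
matches = conn.execute(lookup, ("Honda - CB500",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row describes the chosen plan.
plan = conn.execute("EXPLAIN QUERY PLAN " + lookup, ("Honda - CB500",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(matches, plan_text)
```

This only shows the exact-match case. A substring search such as icontains generally cannot use a plain B-tree index, so whether the stored column helps depends on which lookup the search actually uses.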
