Django - Join on specific column

Django - Join on specific column - django

I have a model, let's call it Notification. A notification has a from_user, and a to_user among other fields which are not important right now.
class Notification(models.Model):
from_user = models.ForeignKey(
settings.AUTH_USER_MODEL,
on_delete=models.CASCADE,
related_name='from_user'
)
to_user = models.ForeignKey(
settings.AUTH_USER_MODEL,
on_delete=models.CASCADE,
related_name='to_user'
)
read = models.BooleanField(default=False)
...
What I want to achieve is, in theory, very simple, but I can not get it to work.
I would like to get the count of unread notifications for every user.
The following Django ORM expression:
User.objects.annotate(
notification_count=Count('notification', filter=(Q(notification__read=False)))
).values('notification_count')
Produces the following SQL:
SELECT COUNT("notification"."id") FILTER (WHERE "notification"."read" = False) AS "notification_count"
FROM "account"
LEFT OUTER JOIN "notification" ON ("account"."id" = "notification"."from_user_id")
GROUP BY "account"."id"
ORDER BY "account"."member_since" DESC
But this is not what I want. I want the LEFT_OUTER_JOIN to be done on the to_user column, not the from_user column.
How can I achieve this using the Django ORM?

I found out the solution after trying endless things. Here it is, for anyone who needs it:
User.objects.annotate(
notification_count=Count('to_user', filter=(Q(to_user__read=False)))
).values('notification_count')

Related

What is the best way to handle DJANGO migration data with over 500k records for MYSQL

A migration handles creating two new fields action_duplicate and status_duplicate
The second migration copies the data from the action and status fields, to the two newly created fields
def remove_foreign_keys_from_user_request(apps, schema_editor):
UserRequests = apps.get_model("users", "UserRequest")
for request_initiated in UserRequest.objects.all().select_related("action", "status"):
request_initiated.action_duplicate = request_initiated.action.name
request_initiated.status_duplicate = request_initiated.status.name
request_initiated.save()
The third migration is suppose to remove/delete the old fields action and status
The fourth migration should rename the new duplicate fields to the old deleted fields
The solution here is to remove the dependency on the status and action, to avoid unnecessary data base query, since the status especially will only be pending and completed
My question is for the second migration. The number of records are between 300k to 600k records so I need to know a more efficient way to do this so it doesn't take up all the memory available.
Note: The Database is MySQL.
A trimmed-down version of the UserRequest model
class UserRequest(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
reference = models.CharField(max_length=50, null=True, blank=True)
requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
action = models.ForeignKey(Action, on_delete=models.CASCADE)
action_duplicate = models.CharField(
max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
)
status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
status_duplicate = models.CharField(
max_length=50,
choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
default=PENDING,
)

You can work with a Subquery expression [Django-doc], and do the update in bulk:
def remove_foreign_keys_from_user_request(apps, schema_editor):
UserRequests = apps.get_model('users', 'UserRequests')
Action = apps.get_user('users', 'Action')
Status = apps.get_user('users', 'ProcessingStatus')
UserRequests.objects.update(
action_duplicate=Subquery(
Action.objects.filter(
pk=OuterRef('action_id')
).values('name')[:1]
),
status_duplicate=Subquery(
Status.objects.filter(
pk=OuterRef('status_id')
).values('name')[:1]
)
)
That being said, it looks that what you are doing is actually the opposite of database normalization [wiki]: usually if there is duplicated data, you make an extra model where you make one Action/Status per value, and thus prevent having the same value for action_duplicate/status_duplicate multiple times in the database: this will make the database larger, and harder to maintain.
Note: normally a Django model is given a singular name, so UserRequest instead of UserRequests.

Annotating values from filtered related objects -- Case, Subquery, or another method?

I have some models in Django:
# models.py, simplified here
class Category(models.Model):
"""The category an inventory item belongs to. Examples: car, truck, airplane"""
name = models.CharField(max_length=255)
class UserInterestCategory(models.Model):
"""
How interested is a user in a given category. `interest` can be set by any method, maybe a neural network or something like that
"""
user = models.ForeignKey(User, on_delete=models.CASCADE) # user is the stock Django user
category = models.ForeignKey(Category, on_delete=models.CASCADE)
interest = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])
class Item(models.Model):
"""This is a product that we have in stock, which we are trying to get a User to buy"""
model_number = models.CharField(max_length=40, default="New inventory item")
product_category = models.ForeignKey(Category, null=True, blank=True, on_delete=models.SET_NULL, verbose_name="Category")
I have a list view showing items, and I'm trying to sort by user_interest_category for the currently logged in user.
I have tried a couple different querysets and I'm not thrilled with them:
primary_queryset = Item.objects.all()
# this one works, and it's fast, but only finds items the users ALREADY has an interest in --
primary_queryset = primary_queryset.filter(product_category__userinterestcategory__user=self.request.user).annotate(
recommended = F('product_category__userinterestcategory__interest')
)
# this one works great but the baby jesus weeps at its slowness
# probably because we are iterating through every user, item, and userinterestcategory in the db
primary_queryset = primary_queryset.annotate(
recommended = Case(
When(product_category__userinterestcategory__user=self.request.user, then=F('product_category__userinterestcategory__interest')),
default=Value(0),
output_field=IntegerField(),
)
)
# this one works, but it's still a bit slow -- 2-3 seconds per query:
interest = Subquery(UserInterestCategory.objects.filter(category=OuterRef('product_category'), user=self.request.user).values('interest'))
primary_queryset = primary_queryset.annotate(interest)
The third method is workable, but it doesn't seem like the most efficient way to do things. Isn't there a better method than this?

Using ForeignKey to sort with order_by and distinct not working

I'm trying to sort model Game by each title and most recent update(post) without returning duplicates.
views.py
'recent_games': Game.objects.all().order_by('title', '-update__date_published').distinct('title')[:5],
The distinct method on the query works perfectly however the update__date_published doesn't seem to be working.
models.py
Model - Game
class Game(models.Model):
title = models.CharField(max_length=100)
slug = models.SlugField(unique=True)
description = models.TextField()
date_published = models.DateTimeField(default=timezone.now)
cover = models.ImageField(upload_to='game_covers')
cover_display = models.ImageField(default='default.png', upload_to='game_displays')
developer = models.CharField(max_length=100)
twitter = models.CharField(max_length=50, default='')
reddit = models.CharField(max_length=50, default='')
platform = models.ManyToManyField(Platform)
def __str__(self):
return self.title
Model - Update
class Update(models.Model):
author = models.ForeignKey(User, models.SET_NULL, blank=True, null=True,) # If user is deleted keep all updates by said user
article_title = models.CharField(max_length=100, help_text="Use format: Release Notes for MM/DD/YYYY")
content = models.TextField(help_text="Try to stick with a central theme for your game. Bullet points is the preferred method of posting updates.")
date_published = models.DateTimeField(db_index=True, default=timezone.now, help_text="Use date of update not current time")
game = models.ForeignKey(Game, on_delete=models.CASCADE)
article_image = models.ImageField(default='/media/default.png', upload_to='article_pics', help_text="")
platform = ChainedManyToManyField(
Platform,
horizontal=True,
chained_field="game",
chained_model_field="game",
help_text="You must select a game first to autopopulate this field. You can select multiple platforms using Ctrl & Select (PC) or ⌘ & Select (Mac).")

See this for distinct reference Examples (those after the first will only work on PostgreSQL)
See this one for Reverse Query - See this one for - update__date_published
Example -
Entry.objects.order_by('blog__name', 'mod_date').distinct('blog__name', 'mod_date')
Your Query-
Game.objects.order_by('title', '-update__date_published').distinct('title')[:5]

You said:
The -update__date_published does not seem to be working as the Games are only returning in alphabetical order.
The reason is that the first order_by field is title; the secondary order field -update__date_published would only kick in if you had several identical titles, which you don't because of distinct().
If you want the Game objects to be ordered by latest update rather their title, omitting title from the ordering seems the obvious solution until you get a ProgrammingError that DISTINCT ON field requires field at the start of the ORDER BY clause.
The real solution to sorting games by latest update is:
games = (Game.objects
.annotate(max_date=Max('update__date_published'))
.order_by('-update__date_published'))[:5]

The most probable misunderstanding here is the join in your orm query. They ussually lazy-loading, so the date_published field is not yet available, yet you are trying to sort against it. You need the select_related method to load the fk relation as a join.
'recent_games': Game.objects.select_related('update').all().order_by('title', '-update__date_published').distinct('title')[:5]

Implementing Django ForeignKey-like subqueries without ForeignKey

I have a non-ForeignKey ID from an external system that acts as my join key. I'd like to do Django ORM style queries using this ID.
My desired query is:
results = MyModel.objects.filter(level='M', children__name__contains='SOMETHING')
My model looks like this:
class MyModel(BaseModel):
LEVELS = (
('I', 'Instance'),
('M', 'Master'),
('J', 'Joined')
)
level = models.CharField(max_length=2, choices=LEVELS, default='I')
parent = models.ForeignKey('self', blank=True, null=True, related_name='children', on_delete=models.SET_NULL )
master_id = models.CharField(max_length=200)
name = models.CharField(max_length=300, blank=True, null=True)
This works fine with parent as a field, but parent is redundant with the master_id field: master_id indicates which children belong to which master node. I'd like to get rid of parent (primarily because the dataset is fairly large and setting the parent IDs when importing data takes a long time).
The SQL equivalent of what I'm looking for is:
SELECT
DISTINCT( s_m.master_id )
FROM
mytable s_m JOIN
mytable s_i ON
s_i.level = 'I' and s_m.level='M' AND s_i.master_id == s_m.master_id
WHERE
s_i.name like '%SOMETHING%';
I believe there's a way to use Manager or QuerySet to enable clean querying of children (in this case, the children's names) within the Django ORM framework, but I can't figure out how. Any pointers would be appreciated.

Can you try something like this?
table1.objects.filter(master_id__in=table2.objects.filter(level='I').values_list(master_id,flat=True),level='M',name__contains='SOMETHING').values_list(master_id).distinct()

Python/Django recursion issue with saved querysets

I have user profiles that are each assigned a manager. I thought using recursion would be a good way to query every employee at every level under a particular manager. The goal is, if the CEO were to sign in, he should be able to query everyone at the company - but If I sign on I can only see people in my immediate team and the people below them, etc. until you get to the low level employees.
However when I run the following:
def team_training_list(request):
# pulls all training documents from training document model
user = request.user
manager_direct_team = Profile.objects.filter(manager=user)
query = Profile.objects.filter(first_name='fake')
trickle_team = manager_loop(manager_direct_team, query)
# manager_trickle_team = manager_direct_team | trickle_team
print(trickle_team)
def manager_loop(list, query):
for member in list:
user_instance = User.objects.get(username=member)
has_team = Profile.objects.filter(manager=user_instance)
if has_team:
query = query | has_team
manager_loop(has_team, query)
else:
continue
return query
It only returns the last query that was run instead of the compiled queryset that I am trying to grow. I've tried placing 'return' before 'manager_loop(has_team, query) in order save the values but it also kills the loop at the first non-manager employee instead of continuing to the next employee.
I'm new to django so if there is an better way than recursion to pull the information that I need, I'd appreciate suggestions on that too.
EDIT:
As requested, here is the profile model.
class Profile(models.Model):
user = models.OneToOneField(User, on_delete=models.CASCADE)
first_name = models.CharField(max_length=30, blank=False)
last_name = models.CharField(max_length=30, blank=False)
email = models.EmailField( blank=True, help_text='Optional',)
receive_email_notifications = models.BooleanField(default=False)
mobile_number = models.CharField(
max_length=15,
blank=True,
help_text='Optional'
)
carrier_options = (
(None, ''),
('#txt.att.net', 'AT&T'),
('#messaging.sprintpcs.com', 'Sprint'),
('#tmomail.net', 'T-Mobile'),
('#vtext.com', 'Verizon'),
)
mobile_carrier = models.CharField(max_length=25, choices=carrier_options, blank=True,
help_text='Optional')
receive_sms_notifications = models.BooleanField(default=False)
job_title = models.ForeignKey(JobTitle, unique=False, null=True)
manager = models.ForeignKey(User, unique=False, blank=True, related_name='+', null=True)

Ok, so it's a hierarchical model.
The problem with your current approach is this line:
query = query | has_team
This reassigns the local name query to a new queryset, but does not reassign the name in the caller. (Well, that's what I think it's trying to do - I am a little rusty but I don't think you can just | together querysets like that.) You'd also need something like:
query = manager_loop(has_team, query)
to propagate the changes via the returned object.
That said, while Django doesn't have built-in support for recursive queries, there are some third party packages that do. Old answers eg (Django self-recursive foreignkey filter query for all childs and Creating efficient database queries for hierarchical models (django)) recommend django-mptt. Your tag mentions postgres, so this post might be relevant:
https://two-wrongs.com/fast-sql-for-inheritance-in-a-django-hierarchy
If you don't use a third-party approach, it should be possible to clean up the evolution of the queryset - cast it to a set and use update or something, since you're accumulating profiles. But the key error is not using the returned modified object.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django - Join on specific column - django

I found out the solution after trying endless things. Here it is, for anyone who needs it: User.objects.annotate( notification_count=Count('to_user', filter=(Q(to_user__read=False))) ).values('notification_count')

Related

What is the best way to handle DJANGO migration data with over 500k records for MYSQL

Annotating values from filtered related objects -- Case, Subquery, or another method?

Using ForeignKey to sort with order_by and distinct not working

Implementing Django ForeignKey-like subqueries without ForeignKey

Python/Django recursion issue with saved querysets

Categories

Resources