Django query based on through table - django

I have four models: Content, Filter, ContentFilter, and User.
A user can view contents.
A content item can be restricted with filters so that certain users can't see it.
Here are the models:
from django.contrib.auth.models import User  # assumed: the default auth User
from django.db import models
from django.utils.translation import gettext_lazy as _
class Content(models.Model):
    title = models.CharField(max_length=120)
    text = models.TextField()
    filters = models.ManyToManyField(to="Filter", verbose_name=_('filter'), blank=True, related_name="filtered_content", through='ContentFilter')
class Filter(models.Model):
    name = models.CharField(max_length=255, verbose_name=_('name'), unique=True)
    added_user = models.ManyToManyField(to=User, related_name="added_user", blank=True)
    ignored_user = models.ManyToManyField(to=User, related_name="ignored_user", blank=True)
    charge_status = models.BooleanField(blank=True, verbose_name=_('charge status'))
class ContentFilter(models.Model):
    content = models.ForeignKey(Content, on_delete=models.CASCADE)
    filter = models.ForeignKey(Filter, on_delete=models.CASCADE)
    manual_order = models.IntegerField(verbose_name=_('manual order'), default=0)
    access = models.BooleanField(_('has access'))
To illustrate: suppose five contents exist (1, 2, 3, 4, 5)
and two users exist (x and y).
A filter is created with x as an ignored user.
Contents 1, 2 and 3 are related to that filter.
So now x sees 4 and 5, while y sees 1, 2, 3, 4 and 5.
What I'm doing now is finding which filters are related to the requesting user,
then querying the through table (ContentFilter) to find which contents the user can't see, and excluding them from all contents (this avoids large joins).
filters = Filter.objects.filter(Q(added_user=user) | Q(ignored_user=user))
excluded_contents = list(ContentFilter.objects.filter(filter__in=filters).values_list('content_id', flat=True))
contents = Content.objects.exclude(id__in=excluded_contents)
Problem
I want filters to have an order, and to filter a queryset based on the top-most ContentFilter for each user.
For example, content 1 can be blocked for all users by one filter (filter x, whose ignored users include all the users),
and its ContentFilter row has a manual_order of 0.
Then a second filter lets all users who have a charge status of True see this content (filter y, whose added users include all the users and whose charge status is True),
and its ContentFilter row has a manual_order of 1.
I think I can do it with a for loop that checks every content and picks its top-most ContentFilter among the filters that include the user, but that is both time- and resource-consuming.
I would prefer not to use raw SQL, but I'm not sure whether there is a way to do this with the Django ORM.

I managed to solve this using Subquery.
First I build the queryset of filters that the user is part of:
filters = Filter.objects.filter(Q(added_user=user) | Q(ignored_user=user))
Then I use a subquery to annotate each content with an access value, taken from the matching ContentFilter with the highest manual_order (if any filter applies to it):
current_used_filters = ContentFilter.objects.filter(filter__in=filters, content=OuterRef('pk')).order_by('-manual_order')
blocked_content_list = Content.objects.annotate(
    access=Subquery(current_used_filters.values('access')[:1])
).filter(access=False).values_list('id', flat=True)
There is one catch:
if a content has none of the user's filters associated with it, the annotation is NULL, so filtering on access=True would leave that content out.
That is why I select the contents whose access value is False instead;
this means the content's highest-ordered filter blocks it for this specific user.
Now I have a list of content IDs that I can exclude from all contents.
So it would be:
contents = Content.objects.exclude(id__in=blocked_content_list)
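The precedence rule behind that subquery can be checked in plain Python before trusting it in the ORM. The following is only a sketch of the rule, not ORM code: `ThroughRow` and `blocked_content_ids` are hypothetical names standing in for ContentFilter rows and the annotated query, and the sample ids are made up.

```python
from typing import Iterable, NamedTuple, Set


class ThroughRow(NamedTuple):
    """Plain stand-in for one ContentFilter row (not a Django model)."""
    content_id: int
    filter_id: int
    manual_order: int
    access: bool


def blocked_content_ids(rows: Iterable[ThroughRow], user_filter_ids: Set[int]) -> Set[int]:
    """Return the content ids whose highest-ordered applicable row denies access."""
    top = {}  # content_id -> applicable row with the largest manual_order so far
    for row in rows:
        if row.filter_id not in user_filter_ids:
            continue  # this filter does not involve the user
        best = top.get(row.content_id)
        if best is None or row.manual_order > best.manual_order:
            top[row.content_id] = row
    return {content_id for content_id, row in top.items() if not row.access}


rows = [
    ThroughRow(content_id=1, filter_id=10, manual_order=0, access=False),
    ThroughRow(content_id=1, filter_id=11, manual_order=1, access=True),
    ThroughRow(content_id=2, filter_id=10, manual_order=0, access=False),
]
all_content_ids = {1, 2, 3}

blocked = blocked_content_ids(rows, user_filter_ids={10, 11})
visible = all_content_ids - blocked
print(sorted(visible))  # [1, 3]
```

Content 1 is blocked at order 0 but re-allowed at order 1, content 2 is only blocked, and content 3 has no applicable row, so it stays visible. That last case is why excluding the blocked ids is safer than keeping only rows annotated with access=True.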

Related

Django one-to-many relation: optimize code to reduce number of database queries executed

I have 2 models with a one-to-many relation on a MySQL DB:
class Domains(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)
class Kpis(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)
    domain_id = models.ForeignKey(Domains, on_delete=models.CASCADE, db_column='domain_id')
To fetch ALL the domains with all their Kpis objects, I use this code with a for loop:
final_list = []
domains_list = Domains.objects.all()
for domain in domains_list:
    # For each domain, get all related KPIs
    domain_kpis = domain.kpis_set.values()
    final_list.append({domain: domain_kpis})
The total number of queries I run is 1 + the total number of domains I have, which is quite a lot.
I'm looking for a way to optimize this, preferably so that it executes in a single database query. Is this possible?
You can use .prefetch_related(…) [Django-doc] for this:
final_list = []
domains_list = Domains.objects.prefetch_related('kpis_set')
for domain in domains_list:
    # For each domain, get all related KPIs (served from the prefetch cache)
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain: domain_kpis})
This will make two queries: one to fetch the domains, and a second that loads all the related Kpis into memory at once.
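To make the "two queries" claim concrete, here is a minimal sketch of the same pattern using only the standard-library sqlite3 module. It imitates what a prefetch does conceptually and is not Django's implementation; the table layout, sample rows and the `run` helper are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kpis (id INTEGER PRIMARY KEY, name TEXT, domain_id INTEGER);
    INSERT INTO domains VALUES (1, 'sales'), (2, 'support'), (3, 'empty');
    INSERT INTO kpis VALUES (1, 'revenue', 1), (2, 'churn', 1), (3, 'tickets', 2);
""")

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it (hypothetical helper)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Query 1: all domains.
domains = run("SELECT id, name FROM domains ORDER BY id")

# Query 2: every KPI that belongs to one of those domains, fetched in one go.
ids = [domain_id for domain_id, _ in domains]
placeholders = ", ".join("?" for _ in ids)
kpi_rows = run(
    f"SELECT domain_id, name FROM kpis WHERE domain_id IN ({placeholders}) ORDER BY id",
    ids,
)

# Match KPIs to their domain in memory instead of querying once per domain.
kpis_by_domain = defaultdict(list)
for domain_id, kpi_name in kpi_rows:
    kpis_by_domain[domain_id].append(kpi_name)

final_list = [{name: kpis_by_domain[domain_id]} for domain_id, name in domains]
print(query_count, final_list)
```

The query count stays at two no matter how many domains exist, whereas the loop in the question issues one extra query per domain.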
Furthermore, please do not use .values(). You can serialize data to JSON with Django's serializer framework; by making use of .values() you "erode" the model layer. See the Serializing Django objects section of the documentation for more information.
Just wanted to add that you are asking for a solution to the "classic" N+1 queries problem. Here you can read something about it and also find examples of the prefetch_related method advised in Willem's answer.
Another thing worth mentioning is that you probably aren't supposed to build this dict with final_list.append({domain: domain_kpis}); instead you may want to map some field(s) from Domains to some field(s) from Kpis. If so, you can specify the exact fields you'd like to have prefetched using Prefetch:
domains_list = Domains.objects.prefetch_related(
    # Keep the foreign key in only() so the prefetch can match Kpis to Domains without extra queries
    Prefetch('kpis_set', queryset=Kpis.objects.only('domain_id', 'some_field_you_want_to_have'))
)
final_list = []
for domain in domains_list:
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain.some_field: [kpi.some_field_you_want_to_have for kpi in domain_kpis]})
This should give another performance boost on large tables.

Minimizing Flagging System DB Cost

I am trying to figure out a way to make a flagging system that does not require a new row to be inserted into the database (Postgres) every time a user flags a video (extra context is given around the code below). The fields I would like to have are 'Description', 'Timestamp' and 'Flag Choice'.
So I was wondering if this would work: I make a Flag model and make the 'Flag Choices' (Gore, Excessive Violence, etc.) their own PositiveIntegerFields, incrementing them accordingly. Then I combine the id of the post, the description of why the user flagged it, and the timestamp into ONE TextField, separating new entries with commas (in the User model instead of the Flag model, so I know who flagged which post). Will that one TextField eventually become too big? IMPORTANT: every time a flag is reviewed and closed, it is deleted from that field (context below).
Context: the Flag model will have a post_id field along with Excessive Violence, Gore, etc., which are PositiveIntegerFields that are incremented every time someone submits a flag. The User model will then have ONE field containing something like the following.
(Commas represent the split of the fields 'post_id', 'description' and 'timestamp' in the database)
5, "Another flag from the same user in the same TextField.", 2019:9:15
# New Entry
...
Then, to get a flag from that one field, I would use a regular expression in combination with a view (which receives a specific video as an argument from a flag management page) to get the post_id, description and timestamp from the TextField (recording the positions for slicing). After the flag status becomes "Closed", the function deletes that slice (starting with the post_id, ending with the timestamp, slicing at the commas).
Will this work? The end result SHOULD be: when a post gets flagged, a new Flag instance is made. At the same time (if this is the first flag from the user or the first flag for the post), a 'flag_info' field is created in the user model and the post_id, description and timestamp are entered into it. If that same user flags another video, a new instance is created for that specific post in the Flag model and the flag choice (Gore, Excessive Violence, etc.) is incremented. At the same time the post_id, description and timestamp are appended to the same field in the form "post_id; description; timestamp,". To grab a specific flag, I would use a regular expression (and further processing on the moderation page) to parse the post_id (used to view the specific post, which will be returned by a different function), description and timestamp.
Forgive me if this is difficult to understand; I'm still trying to figure this idea out myself.
I haven't found anything about this through Google or any other search engine.
Flag model
class Flag(models.Model):
    FLAG_CHOICES = (
        ('Sexually Explicit Content', 'Sexually Explicit Content'),
        ('Child Abuse', 'Child Abuse'),  # High priority, auto send to admin, ban if fake flag
        ('Promotes Definition Terrorism', 'Promotes Definition Terrorism'),  # High priority, auto send to admin, ban if fake flag
        ('Gore, Self Harm, Extreme Violence', 'Gore, Self Harm, Extreme Violence'),
        ('Spam/Misleading/Click-Bait', 'Spam/Misleading/Click-Bait'),
        ('Calling For Mass Flag', 'Calling For Mass Flag'),
        ('Doxing', 'Doxing'),
        ('Animal Abuse', 'Animal Abuse'),
        ('Threatening Behaviour', 'Threatening Behavior'),
        ('Calls To Action', 'Calls To Action')
    )
    STATUS_OPTIONS = (
        ('Open', 'Open'),
        ('Being Reviewed', 'Being Reviewed'),
        ('Pending', 'Pending'),
        ('Closed', 'Closed'),
    )
    objects = models.Manager()
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE, null=True)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    # Make positive integer fields for flag_choices so we can increment count instead of making a new instance every time
    sexually_explicit_content = models.PositiveIntegerField(null=True)
    child_abuse = models.PositiveIntegerField(null=True)
    promotes_terrorism = models.PositiveIntegerField(null=True)
    gore_harm_violence = models.PositiveIntegerField(null=True)
    spam_clickbait = models.PositiveIntegerField(null=True)
    mass_flag = models.PositiveIntegerField(null=True)
    doxing = models.PositiveIntegerField(null=True)
    animal_abuse = models.PositiveIntegerField(null=True)
    threating_behaviour = models.PositiveIntegerField(null=True)
    calls_to_action = models.PositiveIntegerField(null=True)
    sexualizing_children = models.PositiveIntegerField(null=True)
    # Increment the above fields when a flag with corresponding offence is submitted
    who_flagged = models.TextField(null=True)  # This will allow me to get all users who flagged a specific post (split by commas, then loop through the resulting list of user ids; on each iteration I would be able to grab the user model for further operations)
    flagged_date = models.DateTimeField(auto_now_add=True, null=True)
    flag_choices = models.CharField(choices=FLAG_CHOICES, max_length=100, null=True)  # Required choices of offences
    status = models.CharField(choices=STATUS_OPTIONS, default='Open', max_length=50, null=True)
    def get_rendered_html(self):
        template_name = 'vids/templates/vids/moderation.html'
        return render_to_string(template_name, {'object': self.content_object})
User Model or Custom User Profile model
class CustomUser(models.Model):
    ...
    reported = models.TextField()  # This will hold all the information about the user's flags
    # Meaning the following things will be in the same 'box' (flag_info) in the DB, and will look like this:
" post_id = 4; description = 'There was something in the background against the rules.'; timestamp = 2019:9:25,"
Then when the same user flags another video, something like the following would be appended to the 'flag_info' field...
All of this will be one big long string.
post_id = 24; description = "There was something in the background that showed my email."; timestamp = 2019:10:25,'
# To get flag_info from a user, I would do the following in a view
def get_flag(user, post_id):
    # user is the user model instance that we need to pull from
    # post_id is so I can use regex to pull the slice
    # This is really simplified since it would take a while to write the whole thing
    info = user.flag_info
    entries = info.split(",")
    for entry in entries:
        if entry.strip().startswith("post_id = %s;" % post_id):
            pass  # do something with it
    # Alternatively I could do this
    for entry in entries:
        fields = entry.split(';')
        # position 0 is the post_id, position 1 is the description and position 2 is the timestamp; here I would do further processing
To keep track of who flagged what, I would add a TextField to the Flag model; every time a user flags a post, their user_id gets recorded in that TextField. When we need to review the flags, I would split 'who_flagged' by commas and then use the 'get_flag' function, which would extract the fields I need for processing.
Since I don't have thousands of videos/users, I can't test if the field will eventually become too large.
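The parsing step described above can be tried out without a database. The sketch below assumes the entry format shown in the examples (`post_id = N; description = '...'; timestamp = Y:M:D,` with the description in single or double quotes); `FlagEntry`, `ENTRY_RE` and `parse_flags` are hypothetical names, and this `get_flag` takes the raw string rather than a user object.

```python
import re
from typing import List, NamedTuple, Optional


class FlagEntry(NamedTuple):
    post_id: int
    description: str
    timestamp: str


# One entry: post_id = 4; description = '...'; timestamp = 2019:9:25,
ENTRY_RE = re.compile(
    r"post_id\s*=\s*(?P<post_id>\d+)\s*;\s*"
    r"description\s*=\s*(?P<quote>['\"])(?P<description>.*?)(?P=quote)\s*;\s*"
    r"timestamp\s*=\s*(?P<timestamp>[\d:]+)\s*,"
)


def parse_flags(flag_info: str) -> List[FlagEntry]:
    """Parse every entry found in the combined text field."""
    return [
        FlagEntry(int(m["post_id"]), m["description"], m["timestamp"])
        for m in ENTRY_RE.finditer(flag_info)
    ]


def get_flag(flag_info: str, post_id: int) -> Optional[FlagEntry]:
    """Return the first entry for post_id, or None if the user never flagged it."""
    for entry in parse_flags(flag_info):
        if entry.post_id == post_id:
            return entry
    return None


flag_info = (
    "post_id = 4; description = 'There was something in the background against the rules.'; "
    "timestamp = 2019:9:25,"
    ' post_id = 24; description = "It showed my email, which is private."; timestamp = 2019:10:25,'
)

print(get_flag(flag_info, 24))
```

Two things show up even in this small version: a plain `split(",")` would cut the second description in half because it contains a comma, and every lookup rescans the whole string, so the cost of reading one flag grows with the size of the field.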

Query intermediate through fields in django

I have a simple Relation model, where a user can follow a tag just like stackoverflow.
class Relation(models.Model):
    user = AutoOneToOneField(User)
    follows_tag = models.ManyToManyField(Tag, blank=True, null=True, through='TagRelation')
class TagRelation(models.Model):
    user = models.ForeignKey(Relation, on_delete=models.CASCADE)
    following_tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
    pub_date = models.DateTimeField(default=timezone.now)
    class Meta:
        unique_together = ['user', 'following_tag']
Now, to get the results of all the tags a user is following:
kakar = CustomUser.objects.get(email="kakar@gmail.com")
tags_following = kakar.relation.follows_tag.all()
This is fine.
But to access intermediate fields I have to go through a long chain of other queries. Suppose I want to display when the user started following a tag; I would have to do something like this:
kakar = CustomUser.objects.get(email="kakar@gmail.com")
kakar_relation = Relation.objects.get(user=kakar)
t1 = kakar.relation.follows_tag.all()[0]
kakar_t1_relation = TagRelation.objects.get(user=kakar_relation, following_tag=t1)
kakar_t1_relation.pub_date
As you can see, just to get the date I have to run several queries. Is this the only way to get intermediate values, or can this be optimized? Also, I am not sure whether this model design is the way to go, so if you have any recommendation or advice I would be very grateful. Thank you.
You need to use a double underscore ( __ ) to follow ForeignKey relations in lookups,
like this:
user_tags = TagRelation.objects.filter(user__user__email="kakar@gmail.com").values("following_tag__name", "pub_date")
If you need the name of the tag, you can use following_tag__name in the query, and if you need the id you can use following_tag__id.
Then iterate over the result of the above queryset, like this:
for item in user_tags:
    print(item['following_tag__name'])
    print(item['pub_date'])
One more thing: values() returns a queryset that yields dictionaries, which you can iterate as shown above; if you use values_list() in place of values(), it yields tuples instead. The Django QuerySet API reference covers both.
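Each double underscore in such a lookup corresponds roughly to one join. As an illustration only, the sketch below runs a comparable join with the standard-library sqlite3 module; the table names (`app_user`, `relation`, `tag`, `tagrelation`), the sample rows and the e-mail addresses are assumptions and do not reflect the schema Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts by column name
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE relation (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagrelation (
        id INTEGER PRIMARY KEY, user_id INTEGER, following_tag_id INTEGER, pub_date TEXT
    );
    INSERT INTO app_user VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO relation VALUES (1, 1), (2, 2);
    INSERT INTO tag VALUES (1, 'django'), (2, 'python');
    INSERT INTO tagrelation VALUES
        (1, 1, 1, '2015-01-01'), (2, 1, 2, '2015-02-01'), (3, 2, 1, '2015-03-01');
""")

# One statement: tagrelation -> relation -> app_user for the filter,
# and tagrelation -> tag for the selected name.
rows = conn.execute(
    """
    SELECT tag.name AS following_tag__name, tagrelation.pub_date AS pub_date
    FROM tagrelation
    JOIN relation ON relation.id = tagrelation.user_id
    JOIN app_user ON app_user.id = relation.user_id
    JOIN tag ON tag.id = tagrelation.following_tag_id
    WHERE app_user.email = ?
    ORDER BY tagrelation.pub_date
    """,
    ("a@example.com",),
).fetchall()

user_tags = [dict(row) for row in rows]
for item in user_tags:
    print(item["following_tag__name"], item["pub_date"])
```

The result has the same shape as the values() output: a sequence of dictionaries keyed by the requested names, obtained with a single query instead of the chain of lookups in the question.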

Django - Displaying result information while optimizing database queries with models that multiple foreign key relationships

So I'm trying to put together a web application, and I am currently having trouble building a results page for each user.
Here is what my models look like:
class Fault(models.Model):
    name = models.CharField(max_length=255)
    severity = models.PositiveSmallIntegerField(default=0)
    description = models.CharField(max_length=1024, null=False, blank=False)
    recommendation = models.CharField(max_length=1024, null=False, blank=False)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    ...
class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto)
    fault = models.ForeignKey(Fault)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    objects = FaultInstanceManager()
    ...
class Auto(models.Model):
    label = models.CharField(max_length=255)
    model = models.CharField(max_length=255)
    make = models.CharField(max_length=255)
    year = models.IntegerField(max_length=4)
    user = models.ForeignKey(AUTH_USER_MODEL)
    ...
I don't know if my model relationships are ideal, but they made sense in my head. Each user can have multiple Auto objects associated with them, and each Auto can have multiple FaultInstance objects associated with it.
On the results page, I want to list all the FaultInstances that a user has across their Autos. Under each listed FaultInstance I will have a list of all the autos the user owns that have the fault, with their information (here is roughly what I had in mind).
All FaultInstance Listing Ordered by Severity (large number to low number)
FaultInstance:
FaultDescription:
FaultRecommendation:
ListofAutosWithFault:
AutoLabel AutoModel AutoYear ...
AutoLabel AutoModel AutoYear ...
Obviously, doing things the correct way means I want to do as much of the list creation as possible on the Python/Django side and avoid any logic or processing in the template. I am able to create a list per severity with a model manager, as seen here:
class FaultInstanceManager(models.Manager):
    def get_faults_by_user_severity(self, user, severity):
        faults = defaultdict(list)
        qs_faultinst = self.model.objects.select_related().filter(
            auto__user=user, fault__severity=severity
        ).order_by('auto__make')
        for result in qs_faultinst:
            faults[result.fault].append(result)
        faults.default_factory = None
        return faults
I still need to specify each severity, but I guess if I only have 5 severity levels I can create a list for each severity level and pass each one to the template. Any suggestions for this are appreciated. However, that's not my problem. My stopping point right now is that I want to create a summary table at the top of the report which gives the user a breakdown of fault instances per make|model|year. I can't think of the proper query or data structure to pass on to the template.
Summary (table of all the FaultInstances with the following column headers):
FaultInstance Make|Model|Year NumberOfAutosAffected
This will let me see metrics for a make, a model or a year (the example below separates faults by make). I'm listing FaultInstances because I'm only listing Faults that are connected to a user.
For example:
Bad Starter Nissan 1
Bad Taillight Honda 2
Bad Taillight Nissan 1
And I am such a perfectionist that I want to do this while optimizing database queries. If I can create a data structure in my original query that is easily parsed in the template and still gives me both of these sections in my report (maybe a defaultdict of a defaultdict(list)), that's what I want to do. Thanks for the help, and hopefully my question is thorough and makes sense.
It makes sense to use related names because they simplify your queries. Like this:
class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto, related_name='fault_instances')
    fault = models.ForeignKey(Fault, related_name='fault_instances')
    ...
class Auto(models.Model):
    user = models.ForeignKey(AUTH_USER_MODEL, related_name='autos')
In this case you can start from the user's autos:
qs_faultinst = FaultInstance.objects.filter(auto__in=user.autos.all(), fault__severity=severity).order_by('auto__make')
instead of:
qs_faultinst = self.model.objects.select_related().filter(
    auto__user=user, fault__severity=severity
).order_by('auto__make')
I can't quite figure out your summary table; maybe you meant:
Fault Make|Model|Year NumberOfAutosAffected
In this case you can use aggregation. But grouping would still be slow if you have a lot of data. One easy solution is to denormalize the data by creating an extra model and a few signals to update it, or you can use a cache.
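As a rough picture of what that aggregation has to compute, here is a plain-Python sketch that groups by fault and make and counts distinct autos. The tuples are made-up sample data shaped like the example in the question; in a real project the grouping would be pushed to the database rather than done over in-memory rows.

```python
from collections import Counter

# (fault name, auto id, make) for each fault instance; sample data only.
fault_instances = [
    ("Bad Starter", 1, "Nissan"),
    ("Bad Taillight", 2, "Honda"),
    ("Bad Taillight", 3, "Honda"),
    ("Bad Taillight", 1, "Nissan"),
    ("Bad Taillight", 1, "Nissan"),  # same auto reported twice, must count once
]

# Deduplicate so that each auto is counted once per fault.
distinct = {(fault, make, auto_id) for fault, auto_id, make in fault_instances}

# Group by (fault, make) and count the autos in each group.
summary = Counter((fault, make) for fault, make, _ in distinct)

for (fault, make), autos_affected in sorted(summary.items()):
    print(fault, make, autos_affected)
# Bad Starter Nissan 1
# Bad Taillight Honda 2
# Bad Taillight Nissan 1
```

Swapping `make` for a model name or year in the grouping key gives the other two breakdowns mentioned in the question.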
If you have a predefined set of severities then think about this:
class Fault(models.Model):
    SEVERITY_LOW = 0
    SEVERITY_MIDDLE = 1
    SEVERITY_HIGH = 2
    ...
    SEVERITY_CHOICES = (
        (SEVERITY_LOW, 'Low'),
        (SEVERITY_MIDDLE, 'Middle'),
        (SEVERITY_HIGH, 'High'),
        ...
    )
    ...
    severity = models.PositiveSmallIntegerField(default=SEVERITY_LOW,
                                                choices=SEVERITY_CHOICES)
    ...
In your templates you can just iterate through Fault.SEVERITY_CHOICES.
Update:
Change your models:
Extract the model field into a separate model:
class AutoModel(models.Model):
    name = models.CharField(max_length=255)
Change the model field of the Auto model:
class Auto(models.Model):
    ...
    auto_model = models.ForeignKey(AutoModel, related_name='cars')
    ...
Add a model:
class MyDenormalizedModelForReport(models.Model):
    fault = models.ForeignKey(Fault, related_name='reports')
    auto_model = models.ForeignKey(AutoModel, related_name='reports')
    year = models.IntegerField(max_length=4)
    number_of_auto_affected = models.IntegerField(default=0)
Add a signal:
from django.db.models.signals import post_save
def update_denormalized_model(sender, instance, created, **kwargs):
    # Only count newly created fault instances, not later edits
    if created:
        rep, dummy_created = MyDenormalizedModelForReport.objects.get_or_create(fault=instance.fault, auto_model=instance.auto.auto_model, year=instance.auto.year)
        rep.number_of_auto_affected += 1
        rep.save()
post_save.connect(update_denormalized_model, sender=FaultInstance)

Django: Distinct on foreign key relationship

I'm working on a Ticket/Issue-tracker in django where I need to log the status of each ticket. This is a simplification of my models.
class Ticket(models.Model):
    assigned_to = ForeignKey(User)
    comment = models.TextField(_('comment'), blank=True)
    created = models.DateTimeField(_("created at"), auto_now_add=True)
class TicketStatus(models.Model):
    STATUS_CHOICES = (
        (10, _('Open'),),
        (20, _('Other'),),
        (30, _('Closed'),),
    )
    ticket = models.ForeignKey(Ticket, verbose_name=_('ticket'))
    user = models.ForeignKey(User, verbose_name=_('user'))
    status = models.IntegerField(_('status'), choices=STATUS_CHOICES)
    date = models.DateTimeField(_("created at"), auto_now_add=True)
Now, getting the status of a ticket is easy: sort by date and retrieve the first row, like this.
ticket = Ticket.objects.get(pk=1)
ticket.ticketstatus_set.order_by('-date')[0].get_status_display()
But I also want to be able to filter on status in the admin, and that has to get the status through a Ticket queryset, which suddenly makes it more complex. How would I get a queryset with all Tickets that have a certain status?
I guess you are trying to avoid a loop (asking for each ticket's status) to filter the queryset manually. As far as I know you cannot avoid that loop. Here are some ideas:
# select_related avoids a lot of database hits when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a list with the result
ticket_array = [ts.ticket for ts in t_status]
Or, since you mention you were looking for a QuerySet, this might be what you are looking for:
# select_related avoids a lot of database hits when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a QuerySet with the result
tickets = Ticket.objects.filter(pk__in=[ts.ticket.pk for ts in t_status])
Note that both snippets match tickets that have ever had this status, not only tickets whose latest status matches.
However, the problem might be in the way you are modeling the data. What you called TicketStatus is more like a TicketStatusLog, because you want to keep track of the user who changed the status and the date.
Therefore, a reasonable approach is to add a 'current_status' field to the Ticket model that is updated each time a new TicketStatus is created. This way (1) you don't have to sort a table each time you ask for a ticket's status and (2) you can simply do something like Ticket.objects.filter(current_status=ID_STATUS) for what I think you are asking.
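If adding a current_status column is not an option, one alternative that stays inside the database is a correlated subquery that picks each ticket's most recent status row. The sketch below shows the idea with the standard-library sqlite3 module; the table names, sample rows and the `tickets_with_current_status` helper are assumptions for illustration. Django's Subquery and OuterRef expressions can build a query of the same shape, though that variant is not shown or tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, comment TEXT);
    CREATE TABLE ticketstatus (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, status INTEGER, date TEXT
    );
    INSERT INTO ticket VALUES (1, 'printer'), (2, 'login'), (3, 'no status yet');
    INSERT INTO ticketstatus VALUES
        (1, 1, 10, '2020-01-01'),   -- ticket 1 opened
        (2, 1, 30, '2020-01-05'),   -- ticket 1 closed later
        (3, 2, 10, '2020-01-02');   -- ticket 2 opened
""")

# For every ticket, the inner SELECT returns its most recent status;
# the outer WHERE keeps the ticket only if that status matches.
LATEST_STATUS_SQL = """
    SELECT ticket.id
    FROM ticket
    WHERE (
        SELECT ticketstatus.status
        FROM ticketstatus
        WHERE ticketstatus.ticket_id = ticket.id
        ORDER BY ticketstatus.date DESC, ticketstatus.id DESC
        LIMIT 1
    ) = ?
    ORDER BY ticket.id
"""


def tickets_with_current_status(status):
    """Return ids of tickets whose latest status equals `status`."""
    return [row[0] for row in conn.execute(LATEST_STATUS_SQL, (status,))]


print(tickets_with_current_status(10))  # [2]
print(tickets_with_current_status(30))  # [1]
```

Ticket 1 was open once but is closed now, so it only appears under status 30. Ticket 3 has no status rows, the subquery yields NULL for it, and it matches nothing.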