Minimizing Flagging System DB Cost

I am trying to figure out a way to build a flagging system that does not require a new row to be inserted into the database (Postgres) every time a user flags a video (extra context is given around the code below). The fields I would like to have are 'Description', 'Timestamp' and 'Flag Choice'.
So I was wondering whether this would work: I make a Flag model in which each of the 'Flag Choices' (Gore, Excessive Violence, etc.) is its own PositiveIntegerField, and I increment those fields accordingly. I then combine the id of the post, the description of why the user flagged the post, and the timestamp into ONE TextField, separating new entries with commas. That field lives in the User model instead of the Flag model, so I know who flagged which post. Will that one TextField eventually become too big? IMPORTANT: every time a flag is reviewed and closed, it is deleted from said field (context below).
Context: the Flag model will have a post_id field along with Excessive Violence, Gore, etc., which are PositiveIntegerFields that are incremented every time someone submits a flag. The User model will have ONE field containing something like the following.
(Commas represent the split of the fields of 'post_id', 'description' and 'timestamp' in the database)
5, "Another flag from the same user in the same TextField.", 2019:9:15
# New Entry
...
Then, to get a flag out of that one field, I would use a regular expression in combination with a view (which receives a specific video as an argument from a flag management page) to get the post_id, description and timestamp from the TextField, recording the positions for slicing. After the flag status is "Closed", the function will delete that slice (starting with the post_id, ending with the timestamp, slicing at the commas).
Will this work? The end result SHOULD be: when a post gets flagged, a new Flag instance is made. At the same time (if this is the first flag from the user or the first flag for the post), a 'flag_info' field is created in the User model, and the post_id, description, and timestamp are entered into it. If that same user flags another video, a new Flag instance is created for that specific post and the flag choice (Gore, Excessive Violence, etc.) is incremented. At the same time the post_id, description, and timestamp are appended to the same field in the form "post_id; description; timestamp,". To grab a specific flag, I would use a regular expression (and further processing on the moderation page) to parse out the post_id (used to view the specific post, which will be returned by a different function), the description, and the timestamp.
Forgive me if this is difficult to understand, I'm still trying to figure this idea out myself.
I haven't found anything about this through google nor any other search engine.
Flag model
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
from django.template.loader import render_to_string


class Flag(models.Model):
    FLAG_CHOICES = (
        ('Sexually Explicit Content', 'Sexually Explicit Content'),
        ('Child Abuse', 'Child Abuse'),  # High priority, auto send to admin, ban if fake flag
        ('Promotes Definition Terrorism', 'Promotes Definition Terrorism'),  # High priority, auto send to admin, ban if fake flag
        ('Gore, Self Harm, Extreme Violence', 'Gore, Self Harm, Extreme Violence'),
        ('Spam/Misleading/Click-Bait', 'Spam/Misleading/Click-Bait'),
        ('Calling For Mass Flag', 'Calling For Mass Flag'),
        ('Doxing', 'Doxing'),
        ('Animal Abuse', 'Animal Abuse'),
        ('Threatening Behaviour', 'Threatening Behavior'),
        ('Calls To Action', 'Calls To Action'),
    )
    STATUS_OPTIONS = (
        ('Open', 'Open'),
        ('Being Reviewed', 'Being Reviewed'),
        ('Pending', 'Pending'),
        ('Closed', 'Closed'),
    )
    objects = models.Manager()
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE, null=True)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    # Make positive integer fields for flag_choices so we can increment a count
    # instead of making a new instance every time
    sexually_explicit_content = models.PositiveIntegerField(null=True)
    child_abuse = models.PositiveIntegerField(null=True)
    promotes_terrorism = models.PositiveIntegerField(null=True)
    gore_harm_violence = models.PositiveIntegerField(null=True)
    spam_clickbait = models.PositiveIntegerField(null=True)
    mass_flag = models.PositiveIntegerField(null=True)
    doxing = models.PositiveIntegerField(null=True)
    animal_abuse = models.PositiveIntegerField(null=True)
    threating_behaviour = models.PositiveIntegerField(null=True)
    calls_to_action = models.PositiveIntegerField(null=True)
    sexualizing_children = models.PositiveIntegerField(null=True)
    # Increment the above fields when a flag with the corresponding offence is submitted
    # who_flagged lets me get all users who flagged a specific post: split it by
    # commas, loop through the resulting list of user ids, and on each iteration
    # grab the user model for further operations
    who_flagged = models.TextField(null=True)
    flagged_date = models.DateTimeField(auto_now_add=True, null=True)
    flag_choices = models.CharField(choices=FLAG_CHOICES, max_length=100, null=True)  # Required choices of offences
    status = models.CharField(choices=STATUS_OPTIONS, default='Open', max_length=50, null=True)

    def get_rendered_html(self):
        template_name = 'vids/templates/vids/moderation.html'
        return render_to_string(template_name, {'object': self.content_object})
User Model or Custom User Profile model
class CustomUser(models.Model):
    ...
    flag_info = models.TextField()  # This will hold all the information about the user's flags
    # Meaning the following things will be in the same 'box' (flag_info) in the DB... and will look like this...
    # " post_id = 4; description = 'There was something in the background against the rules.'; timestamp = 2019:9:25,"
Then when the same user flags another video, something like the following would be appended to the 'flag_info' field...
post_id = 24; description = "There was something in the background that showed my email."; timestamp = 2019:10:25,
All of this will be one big long string.
# To get flag_info from a user, I would do the following in a view
def get_flag(user, post_id):
    # user is the user model instance that we need to pull from
    # post_id identifies which entry to pull out of the string
    # This is really simplified since it would take a while to write the whole thing
    info = user.flag_info
    entries = info.split(",")
    for entry in entries:
        # Compare the whole "post_id = N;" prefix, not just the first character
        if entry.strip().startswith("post_id = %s;" % post_id):
            # do something with it
            pass
    # Alternatively I could do this
    for entry in entries:
        parts = entry.split(';')
        # position 0 is the post_id, position 1 is the description and
        # position 2 is the timestamp... Here I would do further processing
        pass
To keep track of who flagged what, I would add a TextField to the Flag model (who_flagged); every time a user flags a post, their user_id gets recorded in it. When we need to review the flags, I would split 'who_flagged' by commas and call the 'get_flag' function for each user, which would extract the fields I need for processing.
Since I don't have thousands of videos/users, I can't test if the field will eventually become too large.
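The parse-and-delete step described above can be sketched in plain Python. This is only an illustration under assumptions: ENTRY_RE and remove_flag are hypothetical names, and the pattern assumes the "post_id = N; description = '...'; timestamp = Y:M:D," layout shown earlier. It uses a regular expression rather than split(",") because a description may itself contain commas.

```python
import re

# Hypothetical pattern for one entry in the flag_info string, following the
# "post_id = N; description = '...'; timestamp = Y:M:D," layout shown above.
ENTRY_RE = re.compile(
    r"post_id\s*=\s*(?P<post_id>\d+)\s*;\s*"
    r"description\s*=\s*(?P<quote>['\"])(?P<description>.*?)(?P=quote)\s*;\s*"
    r"timestamp\s*=\s*(?P<timestamp>[\d:]+)\s*,?"
)


def get_flag(flag_info, post_id):
    """Return (description, timestamp) for post_id, or None if absent."""
    for match in ENTRY_RE.finditer(flag_info):
        if int(match.group("post_id")) == post_id:
            return match.group("description"), match.group("timestamp")
    return None


def remove_flag(flag_info, post_id):
    """Return flag_info with the entry for post_id cut out."""
    for match in ENTRY_RE.finditer(flag_info):
        if int(match.group("post_id")) == post_id:
            return (flag_info[:match.start()] + flag_info[match.end():]).strip()
    return flag_info


# Sample data: two entries, the second with a comma inside its description.
flag_info = (
    "post_id = 4; description = 'There was something in the background "
    "against the rules.'; timestamp = 2019:9:25,"
    " post_id = 24; description = \"It showed my email, twice.\"; "
    "timestamp = 2019:10:25,"
)
```

Every lookup and every deletion scans the whole string, so the work grows with the number of open flags per user. PostgreSQL's text type accepts very long values, so the practical cost is more likely to be this parsing than the field size.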

Related

Django query based on through table

I have 4 models: Content, Filter, ContentFilter and User.
A user can view contents.
A content can be restricted using filters so that some users can't see it.
Here are the models.
class Content(models.Model):
    title = models.CharField(max_length=120)
    text = models.TextField()
    filters = models.ManyToManyField(to="Filter", verbose_name=_('filter'), blank=True, related_name="filtered_content", through='ContentFilter')

class Filter(models.Model):
    name = models.CharField(max_length=255, verbose_name=_('name'), unique=True)
    added_user = models.ManyToManyField(to=User, related_name="added_user", blank=True)
    ignored_user = models.ManyToManyField(to=User, related_name="ignored_user", blank=True)
    charge_status = models.BooleanField(blank=True, verbose_name=_('charge status'))

class ContentFilter(models.Model):
    content = models.ForeignKey(Content, on_delete=models.CASCADE)
    filter = models.ForeignKey(Filter, on_delete=models.CASCADE)
    manual_order = models.IntegerField(verbose_name=_('manual order'), default=0)
    access = models.BooleanField(_('has access'))
For example, suppose 5 contents exist (1, 2, 3, 4, 5) and 2 users exist (x and y).
A filter can be created with x as an ignored user.
Contents 1, 2 and 3 are related to that filter.
So now x sees 4 and 5, while y sees 1, 2, 3, 4 and 5.
What I'm doing now is finding which filters are related to the requesting user,
then querying the through table (ContentFilter) to find which contents the user can't see, and excluding them from all of the contents (this helps with large joins).
filters = Filter.objects.filter(Q(added_user=user) | Q(ignored_user=user))
# Collect the ids of the related contents, not the ids of the through-table rows
excluded_contents = list(ContentFilter.objects.filter(filter__in=filters).values_list('content_id', flat=True))
contents = Content.objects.exclude(id__in=excluded_contents)
Problem
I want filters to have an order, and to filter a queryset based on the top-most ContentFilter for each user.
For example, content 1 can be blocked for all users by one filter (filter x, whose ignored users include all the users), with a manual_order of 0 in ContentFilter.
Then a second filter lets all users who have a charge status of True see this content (filter y, whose added users include all the users and whose charge status is True), with a manual_order of 1 in ContentFilter.
I think I could do it with a for loop that checks every content and picks its top-most ContentFilter among the filters that include the user, but that is both time and resource consuming.
I would prefer not to use raw SQL, but I'm not sure whether there is a way to do it with the Django ORM.

I managed to solve this using Subquery.
First I build the queryset of filters that the user is part of.
filters = Filter.objects.filter(Q(added_user=user) | Q(ignored_user=user))
Then I create a subquery that annotates each content with an access value (if any of those filters is applied to it).
current_used_filters = ContentFilter.objects.filter(filter__in=filters, content=OuterRef('pk')).order_by('-manual_order')
blocked_content_list = Content.objects.annotate(
    access=Subquery(current_used_filters.values('access')[:1])
).filter(access=False).values_list('id', flat=True)
One caveat: a content that has none of these filters attached gets no access value (the annotation is NULL), so it would be left out if I filtered on access=True.
So instead I select the contents whose access value is False,
which means the content's highest manual_order filter blocks it for this specific user.
That gives me a list of content IDs that I can exclude from all contents.
So it would be:
contents = Content.objects.exclude(id__in=blocked_content_list)
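To make the "top-most filter wins" rule concrete, here is a plain-Python sketch of the same selection, independent of the ORM. The function name blocked_content_ids and the tuple layout are assumptions made for illustration; the rows stand in for ContentFilter records already restricted to the user's filters.

```python
def blocked_content_ids(content_filter_rows):
    """Return ids of contents whose top-most matching filter denies access.

    content_filter_rows: iterable of (content_id, manual_order, access)
    tuples, already restricted to the filters that involve the user.
    """
    top = {}  # content_id -> (manual_order, access) of the highest-order row
    for content_id, manual_order, access in content_filter_rows:
        if content_id not in top or manual_order > top[content_id][0]:
            top[content_id] = (manual_order, access)
    return sorted(cid for cid, (_, access) in top.items() if not access)


rows = [
    (1, 0, False),  # filter x blocks content 1 ...
    (1, 1, True),   # ... but filter y, with a higher manual_order, allows it
    (2, 0, False),  # content 2 is only blocked
    (3, 0, True),   # content 3 is explicitly allowed
]
all_content_ids = [1, 2, 3, 4, 5]
blocked = blocked_content_ids(rows)
# Contents 4 and 5 have no filter rows at all, so they stay visible.
visible = [cid for cid in all_content_ids if cid not in blocked]
```

Contents without any matching row never enter the dictionary, which mirrors why the query selects access=False and excludes those ids instead of selecting access=True.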

Checking for overlapping TimeField ranges

I have this model:
class Task(models.Model):
    class Meta:
        unique_together = ("campaign_id", "task_start", "task_end", "task_day")
    campaign_id = models.ForeignKey(Campaign, on_delete=models.DO_NOTHING)
    playlist_id = models.ForeignKey(PlayList, on_delete=models.DO_NOTHING)
    task_id = models.AutoField(primary_key=True, auto_created=True)
    task_start = models.TimeField()
    task_end = models.TimeField()
    task_day = models.TextField()
I need to write a validation test that checks if a newly created task time range overlaps with an existing one in the database.
For example:
A task with ID 1 already starts at 5:00 PM and ends at 5:15 PM on a Saturday. A new task cannot be created between the first task's start and end time. Where should I write this test, and what is the most efficient way to do it? I also use Django REST Framework serializers.

When you receive the form data from the user, you can:
1. Check the fields are consistent: user task_start < user task_end, and warn the user if not.
2. Query (SELECT) the database to retrieve all existing tasks which intersect the user time:
   - Order the records by task_start (ORDER BY),
   - Select only records which validate your criterion, a.k.a.:
     task_start <= user task_start <= task_end, or,
     task_start <= user task_end <= task_end.
   - Warn the user if at least one record is found.
3. Everything is OK:
   - Construct a Task instance,
   - Store it in database.
   - Return success.
Implementation details:
task_start and task_end could be indexed in your database to improve selection time.
I saw that you also have a task_day field (which is a TEXT).
You should really consider using UTC DATETIME fields instead of TEXT, because you need to compare date AND time (and not only time): consider a task which starts at 23:30 and finishes at 00:45 the day after…
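As a rough sketch of the check itself, in plain Python with no database: two ranges overlap when each one starts no later than the other ends. This single test also catches an existing task that lies entirely inside the new range, which the two conditions listed above would miss. The names overlaps and find_conflicts and the tuple layout are assumptions for illustration, and the sketch keeps the question's per-day TimeField model, so it does not handle tasks that cross midnight.

```python
from datetime import time


def overlaps(start_a, end_a, start_b, end_b):
    """True if the ranges [start_a, end_a] and [start_b, end_b] share any time."""
    return start_a <= end_b and start_b <= end_a


def find_conflicts(new_start, new_end, new_day, existing):
    """Return ids of existing tasks on the same day that overlap the new range.

    existing: iterable of (task_id, task_start, task_end, task_day) tuples.
    """
    if new_start >= new_end:
        raise ValueError("task_start must be earlier than task_end")
    return [
        task_id
        for task_id, start, end, day in existing
        if day == new_day and overlaps(start, end, new_start, new_end)
    ]


existing = [
    (1, time(17, 0), time(17, 15), "Saturday"),
    (2, time(18, 0), time(18, 30), "Saturday"),
    (3, time(17, 0), time(17, 15), "Sunday"),
]
```

With the ORM, the same condition would presumably translate to a filter such as task_day=day, task_start__lte=new_end, task_end__gte=new_start followed by .exists().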

This is how I solved it. It's not optimal by far, but I'm limited to python 2.7 and Django 1.11 and I'm also a beginner.
# Method on the serializer class
def validate(self, data):
    errors = {}
    task_start = data.get('task_start')
    task_end = data.get('task_end')
    # Parentheses let the expression continue onto the next line
    time_filter = (Q(task_start__range=[task_start, task_end])
                   | Q(task_end__range=[task_start, task_end]))
    filter_check = Task.objects.filter(time_filter).exists()
    if task_start > task_end:
        errors['error'] = u'End time cannot be earlier than start time!'
        raise serializers.ValidationError(errors)
    elif filter_check:
        errors['errors'] = u'Overlapping tasks'
        raise serializers.ValidationError(errors)
    else:
        pass
    return data

Is there a way to rewrite this apparently simple Django snippet so that it doesn't hit the database so much?

I have a class called ToggleProperty. I use it to store information about whether a user toggled some properties on an object. Examples of properties are "like", "bookmark" and "follow".
class ToggleProperty(models.Model):
    # "like", "bookmark", "follow" etc
    property_type = CharField()
    # The user who toggled the property
    user = ForeignKey(User)
    # The object upon which the user is toggling the property, e.g. "user likes image"
    object_id = models.TextField()
    content_type = models.ForeignKey(ContentType)
    content_object = generic.GenericForeignKey('content_type', 'object_id')
Now, I'd like to get a list of users that are followed by a certain other user, let's call him Tom.
I can't just query on ToggleProperty, because that would give me ToggleProperties, not Users!
So I do this:
# First get the ContentType for user, we'll need it
user_ct = ContentType.objects.get_for_model(User)
# Now get the users that Tom follows
followed_by_tom = [
    user_ct.get_object_for_this_type(id=x.object_id) for x in
    ToggleProperty.objects.filter(
        property_type="follow",
        user=tom,
        content_type=ContentType.objects.get_for_model(User))
]
The problem with this is that it hits the database in my view, and I don't like that.
If this wasn't ugly enough, hear me out. I'm actually interested in the images uploaded by the users that Tom follows, so I can show Tom all the images by the people he follows.
So to the code above, I add this:
images = Image.objects.filter(user__in = followed_by_tom)
This ends up performing over 400 queries, and taking over a second to process. There has to be a better way, could you please show me the path?

This piece:
followed_by_tom = [
    user_ct.get_object_for_this_type(id=x.object_id) for x in
    ToggleProperty.objects.filter(
        property_type="follow",
        user=tom,
        content_type=ContentType.objects.get_for_model(User))
]
gets all of the followed User instances in N queries, when it could be done in a single query. In fact you don't need the instances themselves, only their ids, to get the Images with the IN query. So you can remove the extra queries in the loop via:
followed_by_tom = ToggleProperty.objects.filter(
    property_type="follow",
    user=tom,
    content_type=ContentType.objects.get_for_model(User)
).values_list('object_id', flat=True)
images = Image.objects.filter(user__in=followed_by_tom)
Since followed_by_tom is never evaluated on its own, the ORM should execute this as a single query with a sub-select.
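For illustration, this is roughly the shape of the single query with a sub-select, reproduced with the standard-library sqlite3 module instead of the ORM. The table and column names, the sample rows and the content type id 7 are all made up for this sketch; the SQL that Django actually generates will differ. The CAST is there because object_id is stored as text, and some databases may require an explicit cast when comparing it with an integer user id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toggle_property (
        user_id INTEGER, property_type TEXT,
        content_type_id INTEGER, object_id TEXT
    );
    CREATE TABLE image (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO toggle_property VALUES
        (1, 'follow', 7, '2'), (1, 'follow', 7, '3'),
        (1, 'like', 9, '10'), (2, 'follow', 7, '1');
    INSERT INTO image VALUES (10, 2), (11, 3), (12, 4), (13, 2);
""")

USER_CONTENT_TYPE_ID = 7  # hypothetical ContentType id for User
TOM_ID = 1

# One round trip: the inner SELECT finds who Tom follows, the outer SELECT
# fetches the images uploaded by those users.
rows = conn.execute(
    """
    SELECT id FROM image
    WHERE user_id IN (
        SELECT CAST(object_id AS INTEGER) FROM toggle_property
        WHERE property_type = 'follow'
          AND user_id = ?
          AND content_type_id = ?
    )
    ORDER BY id
    """,
    (TOM_ID, USER_CONTENT_TYPE_ID),
).fetchall()
image_ids = [row[0] for row in rows]
```

Here Tom follows users 2 and 3, so the query returns their images in one statement rather than one lookup per followed user.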

Multiple Form with Single Submit Button

I'm currently working on a Django project. I need to filter the data stored in the database based on the user's input in a form (in the template).
In the form, the user either enters a value in each field or leaves it blank. So I first have to find the (valid) user input and then run the appropriate query to display the data matching that input. The final result should be displayed in a table in the template.
As I'm new to Django: how should I pass the data and run a query that filters on multiple fields? Any help or links related to this type of problem would be appreciated. (So far I have only been able to filter the database using a single form field, and I have no idea how to solve this.)
Model of my temp project is as below.
class exReporter(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()
    gender = models.CharField(max_length=1)
    age = models.IntegerField()
    label = models.IntegerField()

There are a number of approaches you can take, but here is one solution you can use that involves chaining together the appropriate filters based on the form's posted data:
*Note: To conform to Python's naming convention, rename exReporter class to ExReporter.
# views.py
def process_ex_reporter_form(request):
    if request.method == "POST":
        # ExReporterForm implementation details not included.
        ex_reporter_form = ExReporterForm(request.POST)
        if ex_reporter_form.is_valid():
            # If form field has no data, cleaned data should be None.
            gender = ex_reporter_form.cleaned_data['gender']
            age_start = ex_reporter_form.cleaned_data['age_start']
            age_end = ex_reporter_form.cleaned_data['age_end']
            aggregation_group = ex_reporter_form.cleaned_data['aggregation_group']
            aggregation_id = ex_reporter_form.cleaned_data['aggregation_id']
            ex_reporters = ExReporter.objects.get_ex_reporters(gender, age_start,
                age_end, aggregation_group, aggregation_id)
        else:
            # Pass back form for correction.
            pass
    else:
        # Pass new form to user.
        pass

# models.py
class ExReporterManager(models.Manager):
    def get_ex_reporters(self, gender, age_start, age_end, aggregation_group,
                         aggregation_id):
        # get_queryset() is the current name; older Django used get_query_set()
        ex_reporters = super(ExReporterManager, self).get_queryset().all()
        # Even though the filters are being applied in separate statements,
        # database will only be hit once. Do not test `if ex_reporters:` here:
        # that would evaluate the queryset and run an extra query.
        if gender:
            ex_reporters = ex_reporters.filter(gender=gender)
        if age_start:
            ex_reporters = ex_reporters.filter(age__gt=age_start)
        if age_end:
            ex_reporters = ex_reporters.filter(age__lt=age_end)
        # Apply further filter logic for aggregation types supported.
        return ex_reporters
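An alternative way to express the same idea is to collect only the non-blank form values into a dictionary of lookups and pass it to a single filter call. The helper below is a hypothetical sketch (build_filter_kwargs is not part of Django); it only builds the dictionary, so it runs without a database. It compares against None and the empty string explicitly, so a legitimate value of 0 is not skipped the way it would be with a plain truthiness test.

```python
def build_filter_kwargs(gender=None, age_start=None, age_end=None):
    """Map optional form values to ORM-style lookups, skipping blank ones."""
    lookups = {
        "gender": gender,
        "age__gt": age_start,
        "age__lt": age_end,
    }
    # Compare against None and "" explicitly so a legitimate 0 is kept.
    return {key: value for key, value in lookups.items() if value not in (None, "")}


kwargs = build_filter_kwargs(gender="F", age_start=0, age_end=None)
```

The result could then be used as ExReporter.objects.filter(**kwargs); with an empty dictionary that call simply returns every row.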

Django: Distinct on foreign key relationship

I'm working on a Ticket/Issue-tracker in django where I need to log the status of each ticket. This is a simplification of my models.
class Ticket(models.Model):
    assigned_to = ForeignKey(User)
    comment = models.TextField(_('comment'), blank=True)
    created = models.DateTimeField(_("created at"), auto_now_add=True)

class TicketStatus(models.Model):
    STATUS_CHOICES = (
        (10, _('Open'),),
        (20, _('Other'),),
        (30, _('Closed'),),
    )
    ticket = models.ForeignKey(Ticket, verbose_name=_('ticket'))
    user = models.ForeignKey(User, verbose_name=_('user'))
    status = models.IntegerField(_('status'), choices=STATUS_CHOICES)
    date = models.DateTimeField(_("created at"), auto_now_add=True)
Now, getting the status of a ticket is easy: sort by date and retrieve the first row, like this.
ticket = Ticket.objects.get(pk=1)
ticket.ticketstatus_set.order_by('-date')[0].get_status_display()
But I also want to be able to filter on status in the Admin, and that has to get the status through a Ticket queryset, which suddenly makes it more complex. How would I get a queryset with all Tickets with a certain status?

I guess you are trying to avoid a loop (asking for each ticket's status) to filter the queryset manually. As far as I know you cannot avoid that loop. Here are some ideas:
# select_related avoids a lot of hits in the database when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a list with the result
ticket_array = [ts.ticket for ts in t_status]
Or, since you mention you were looking for a QuerySet, this might be what you are looking for:
# select_related avoids a lot of hits in the database when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a QuerySet with the result
tickets = Ticket.objects.filter(pk__in=[ts.ticket.pk for ts in t_status])
However, the problem might be in the way you are modeling the data. What you called TicketStatus is more like a TicketStatusLog, because you want to keep track of the user who changed the status and the date of the change.
Therefore, the reasonable approach is to add a 'current_status' field to the Ticket model that is updated each time a new TicketStatus is created. In this way (1) you don't have to order a table each time you ask for a ticket and (2) you would simply do something like Ticket.objects.filter(current_status=ID_STATUS) for what I think you are asking.
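To spell out what such a current_status field would cache, here is a small plain-Python sketch over a status log: the current status of a ticket is the status of its most recent log row. Filtering the log by status alone would also match tickets that merely had that status at some point in the past. The function name current_statuses and the tuple layout are assumptions for illustration.

```python
from datetime import datetime


def current_statuses(status_log):
    """Return {ticket_id: status} using the most recent log row per ticket.

    status_log: iterable of (ticket_id, status, date) tuples.
    """
    latest = {}  # ticket_id -> (date, status)
    for ticket_id, status, date in status_log:
        if ticket_id not in latest or date > latest[ticket_id][0]:
            latest[ticket_id] = (date, status)
    return {ticket_id: status for ticket_id, (_, status) in latest.items()}


OPEN, OTHER, CLOSED = 10, 20, 30  # values from STATUS_CHOICES

status_log = [
    (1, OPEN, datetime(2019, 1, 1)),
    (1, CLOSED, datetime(2019, 1, 5)),
    (2, OPEN, datetime(2019, 1, 2)),
    (3, OPEN, datetime(2019, 1, 3)),
    (3, OTHER, datetime(2019, 1, 4)),
]
current = current_statuses(status_log)
# Tickets 1 and 3 were open once, but only ticket 2 is open now.
open_tickets = sorted(t for t, s in current.items() if s == OPEN)
```

Keeping current_status on Ticket amounts to storing this dictionary's value on each row at write time, so the Admin filter becomes a plain equality lookup.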
