django generic relation delete performance - django

I'm working on a project with two genericrelation in a model. I discover that the relations are useless and moreover therea 3 million records that we don't need anymore. Is there any way do delete it fast?
Remove the field on migrations has no effect because is generic.
So I tried
import time
from django.contrib.contenttypes.models import ContentType
from app.core import models as m
# UserInformation has a GenericRelation with Address
c = m.UserInformation.objects.first()
c_type = ContentType.objects.get_for_model(c)
# get all the models records generic related with UserInformation
query = m.Address.objects.filter(content_type_id=c_type.id)
start = time.time()
i=0
stop_iteration = 10
for user in query:
i += 1
user.delete()
if i == stop_iteration:
break
end = time.time()
seconds = end - start
print('Execution of %s deletes: %3d seconds' % (stop_iteration, seconds))
The result:
Execution of 10 deletes: 34 seconds
This means that it will takes 37 days to delete ~1million records
Is there any way to do that quicker?

A generic relation is defined by a content_type and an object_id. If you know the content_type you can find all object_id values and delete them in one query. I don't know the fields in your model but it should be something like this.
# get all related object ids
object_ids = m.Address.objects.filter(content_type_id=c_type.id)\
.values_list('object_id', flat=True)
# delete them in one query
YourModel.objects.filter(id__in=object_ids).delete()

Related

Django distinct related querying

I have two models:
Model A is an AbstractUserModel and Model B
class ModelB:
user = ForeignKey(User, related_name='modelsb')
timestamp = DateTimeField(auto_now_add=True)
What I want to find is how many users have at least one ModelB object created at least in 3 of the 7 past days.
So far, I have found a way to do it but I know for sure there is a better one and that is why I am posting this question.
I basically split the query into 2 parts.
Part1:
I added a foo method inside the User Model that checks if a user meets the above conditions
def foo(self):
past_limit = starting_date - timedelta(days=7)
return self.modelsb.filter(timestamp__gte=past_limit).order_by('timestamp__day').distinct('timestamp__day').count() > 2
Part 2:
In the Custom User Manager, I find the users that have more than 2 modelsb objects in the last 7 days and iterate through them applying the foo method for each one of them.
By doing this I narrow down the iterations of the required for loop. (basically its a filter function but you get the point)
def boo(self):
past_limit = timezone.now() - timedelta(days=7)
candidates = super().get_queryset().annotate(rc=Count('modelsb', filter=Q(modelsb__timestamp__gte=past_limit))).filter(rc__gt=2)
return list(filter(lambda x: x.foo(), candidates))
However, I want to know if there is a more efficient way to do this, that is without the for loop.
You can use conditional annotation.
I haven't been able to test this query, but something like this should work:
from django.db.models import Q, Count
past_limit = starting_date - timedelta(days=7)
users = User.objects.annotate(
modelsb_in_last_seven_days=Count('modelsb__timestap__day',
filter=Q(modelsb__timestamp__gte=past_limit),
distinct=True))
.filter(modelsb_in_last_seven_days__gte = 3)
EDIT:
This solution did not work, because the distinct option does specify what field makes an entry distinct.
I did some experimenting on my own Django instance, and found a way to make this work using SubQuery. The way this works is that we generate a subquery where we make the distinction ourself.
counted_modelb = ModelB.objects
.filter(user=OuterRef('pk'), timestamp__gte=past_limit)
.values('timestamp__day')
.distinct()
.annotate(count=Count('timestamp__day'))
.values('count')
query = User.objects
.annotate(modelsb_in_last_seven_days=Subquery(counted_modelb, output_field=IntegerField()))
.filter(modelsb_in_last_seven_days__gt = 2)
This annotates each row in the queryset with the count of all distinct days in modelb for the user, with a date greater than the selected day.
In the subquery I use values('timestamp__day') to make sure I can do distinct() (Because a combination of distinct('timestamp__day') and annotate() is unsupported.)

django annotate with queryset

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.
class Survey(Model):
name = CharField()
description = TextField()
interval = DurationField()
users = ManyToManyField(User, related_name='registered_surveys')
...
class SurveyRun(Model):
''' A users answers for 1 taken survey '''
user = ForeignKey(User, related_name='runs')
survey = ForeignKey(Survey, related_name='runs')
created = models.DateTimeField(auto_now_add=True)
submitted = models.DateTimeField(null=True, blank=True)
# answers = ReverseForeignKey...
So with the models above a user should be alerted to take survey A next on this date:
A.interval + SurveyRun.objects.filter(
user=user,
survey=A
).latest('submitted').submitted
I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to this criteria:
For each survey the user is registered:
if no runs exist for that user-survey combo then create the first run for that user-survey combination and alert the user
if there are runs for that survey and none are open (an open run has been created but not submitted so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user
Ideally I could create a manager method which would annotate with a surveys_due field like:
users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)
Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers.
And I could issue alerts like this:
for user in users_with_surveys_due.all():
for survey in user.surveys_due:
new_run = SurveyRun.objects.create(
user=user,
survey=survey
)
alert_user(user, run)
However I would settle for a boolean flag annotation on the User object indicating one of the registered_surveys needs to create a new run.
How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection objects, like a reverse FK?
UPDATE:
For clarity here is my current task in python:
def make_new_runs_and_alert_users():
runs = []
Srun = apps.get_model('surveys', 'SurveyRun')
for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
for srvy in user.registered_surveys.all():
runs_for_srvy = user.runs.filter(survey=srvy)
# no runs exist for this registered survey, create first run
if not runs_for_srvy.exists():
runs.append(Srun(user=user, survey=srvy))
...
# check this survey has no open runs
elif not runs_for_srvy.filter(submitted=None).exists():
latest = runs_for_srvy.latest('submitted')
if (latest.submitted + qnr.interval) <= timezone.now():
runs.append(Srun(user=user, survey=srvy))
Srun.objects.bulk_create(runs)
UPDATE #2:
In attempting to use Dirk's solution I have this simple example:
In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>
The user has one open run (submitted=None) for the Test survey and is registered to one survey (Test). He/She should not be flagged for a new run seeing as there is an un-submitted run outstanding for the only survey he/she is registered for. So I create a function encapsulating the Dirk's solution called get_users_with_runs_due:
In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test#gmail.com>]> . # <-- should be an empty queryset
In [107]: for user in _:
print(user.email, i.has_survey_due)
test#gmail.com True # <-- should be false
UPDATE #3:
In my previous update I had made some changes to the logic to properly match what I wanted but neglected to mention or show the changes. Here is the query function below with comments by the changes:
def get_users_with_runs_due():
today = timezone.now()
survey_runs = SurveyRun.objects.filter(
survey=OuterRef('pk'),
user=OuterRef(OuterRef('pk'))
).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
surveys = Survey.objects.filter(
users=OuterRef('pk')
).annotate(
latest_submission_date=Subquery(
survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
)
).annotate(
has_survey_runs=Exists(survey_runs)
).annotate(
has_pending_runs=Exists(pending_survey_runs)
).filter(
Q(has_survey_runs=False) | # either has no runs for this survey or
( # has no pending runs and submission date meets criteria
Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
)
)
return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
UPDATE #4:
I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.
def annotate_surveys_for_user(user):
today = timezone.now()
survey_runs = SurveyRun.objects.filter(
survey=OuterRef('pk'),
user=user
).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted=None)
return Survey.objects.filter(
users=user
).annotate(
latest_submission_date=Subquery(
survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
)
).annotate(
has_survey_runs=Exists(survey_runs)
).annotate(
has_pending_runs=Exists(pending_survey_runs)
)
This worked as expected. Where the annotations were accurate and filtering with:
result.filter(
Q(has_survey_runs=False) |
(
Q(has_pending_runs=False) &
Q(latest_submission_date__lte=today - F('interval'))
)
)
produced the desired results: An empty queryset where the user should not have any runs due and vice-versa. Why is this not working when making it the subquery and querying from the User model?
To annotate users with whether or not they have a survey due, I'd suggest to use a Subquery expression:
from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone
today = timezone.now()
survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
surveys = Survey.objects.filter(users=OuterRef('pk'))
.annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
.annotate(has_survey_runs=Exists(survey_runs))
.annotate(has_pending_runs=Exists(pending_survey_runs))
.filter(Q(has_survey_runs=False) | Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))
User.objects.annotate(has_survey_due=Exists(surveys))
.filter(has_survey_due=True)
I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset, values must be field equivalents. Also you cannot use a Subquery as queryset parameter to Prefetch, unfortunately. But since you're using PostgreSQL you could use ArrayField to list the ids of the surveys in a wrapped value, but I haven't found a way to do that, as you can't use aggregate inside a Subquery.

How can I query django models using modulo?

I want to send an email to users who haven't activated their accounts every 120 days. I'm using a DateTimeField for their created attribute.
How can I retrieve a queryset of users for whom created % 120 == 0?
Here's what I'm trying, using annotate and F objects:
members = Member.objects.annotate(
days_old=(datetime.datetime.now() - F('created'))
)
members = members.annotate(modulo_days=F('days_old') % 120)
members = members.filter(modulo_days=0)
...but this returns the errors:
TypeError: expected string or buffer
ProgrammingError: operator does not exist: interval % integer
How can I retrieve this queryset looking for the modulo of a timestamp on a Django model?
Another way of doing a queryset that could work for you:
from datetime import timedelta
from datetime import datetime
to_compare_datetime = datetime.now() - timedelta(days=180)
members = Member.objects.filter(account_activated=False, created__year=to_compare_datetime.year, created__month=to_compare_datetime.month, created__day=to_compare_datetime.day)
I'm supposing that your Member model has a field account_activated, and that the created field is a DateTimeField. Hope this can help you :)

Custom model method in Django

I'm developing a web app in Django that manages chores on a reoccurring weekly basis. These are the models I've come up with so far. Chores need to be able to be assigned multiple weekdays and times. So the chore of laundry could be Sunday # 8:00 am and Wednesday # 5:30 pm. I first want to confirm the models below are the best way to represent this. Secondly, I'm a little confused about model relationships and custom model methods. Since these chores are on a reoccurring basis, I need to be able to check if there has been a CompletedEvent in this week. Since this is row level functionality, that would be a model method correct? Based on the models below, how would I check for this? It has me scratching my head.
models.py:
from django.db import models
from datetime import date
class ChoreManager(models.Manager):
def by_day(self, day_name):
return self.filter(scheduledday__day_name = day_name)
def today(self):
todays_day_name = date.today().strftime('%A')
return self.filter(scheduledday__day_name = todays_day_name)
class Chore(models.Model):
objects = ChoreManager()
name = models.CharField(max_length=50)
notes = models.TextField()
class Meta:
ordering = ['scheduledday__time']
class ScheduledDay(models.Model):
day_name = models.CharField(max_length=8)
time = models.TimeField()
chore = models.ForeignKey('Chore')
class CompletedEvent(models.Model):
date_completed = DateTimeField(auto_now_add=True)
chore = models.ForeignKey('Chore')
Then all you need to do is:
monday_of_week = some_date - datetime.timedetla(days=some_date.weekday())
end_of_week = date + datetime.timedelta(days=7)
chore = Chore.objects.get(name='The chore your looking for')
ScheduledDay.objects.filter(completed_date__gte=monday_of_week,
completed_date__lt=end_of_week,
chore=chore)
A neater (and faster) option is to use Bitmasks!
Think of the days of the week you want a chore to be repeated on as a binary number—a bit for each day. For example, if you wanted a chore repeated every Tuesday, Friday and Sunday then you would get the binary number 1010010 (or 82 in decimal):
S S F T W T M
1 0 1 0 0 1 0 = 1010010
Days are reversed for sake of illustration
And to check if a chore should be done today, simply get the number of that day and do an &:
from datetime import datetime as dt
if dt.today().weekday() & 0b1010100:
print("Do chores!")
Models
Your models.py would look a bit like this:
from django.contrib.auth.models import User
from django.db import models
from django.utils.functional import cached_property
class Chore(models.Model):
name = models.CharField(max_length=128)
notes = models.TextField()
class ChoreUser(models.Model):
chore_detail = models.ForeignKey('ChoreDetail')
user = models.ForeignKey('ChoreDetail')
completed_time = models.DateTimeField(null=True, blank=True)
class ChoreDetail(models.Model):
chore = models.ForeignKey('Chore')
chore_users = models.ManyToManyField('User', through=ChoreUser)
time = models.DateTimeField()
date_begin = models.DateField()
date_end = models.DateField()
schedule = models.IntegerField(help_text="Bitmask of Weekdays")
#cached_property
def happens_today(self):
return bool(dt.today().weekday() & self.weekly_schedule)
This schema has a M2M relationship between a User and a Chore's Schedule. So you can extend your idea, like record the duration of the chore (if you want to), or even have many users participating in the same chore.
And to answer your question, if you'd like to get the list of completed events this week, you could could put this in a Model Manager for ChoreUser:
from datetime import datetime as dt, timedelta
week_start = dt.today() - timedelta(days=dt.weekday())
week_end = week_start + timedelta(days=6)
chore_users = ChoreUser.objects.filter(completed_time__range=(week_start, week_end))
Now you have all the information you need in a single DB call:
user = chore_users[0].user
time = chore_users[0].chore_detail.time
name = chore_users[0].chore_detail.chore.name
happens_today = chore_users[0].chore_detail.happens_today
You could also get all the completed chores for a user easily:
some_user.choreuser_set.filter(completed_time__range=(week_start, week_end))

How to obtain and/or save the queryset criteria to the DB?

I would like to save a queryset criteria to the DB for reuse.
So, if I have a queryset like:
Client.objects.filter(state='AL')
# I'm simplifying the problem for readability. In reality I could have
# a very complex queryset, with multiple filters, excludes and even Q() objects.
I would like to save to the DB not the results of the queryset (i.e. the individual client records that have a state field matching 'AL'); but the queryset itself (i.e. the criteria used in filtering the Client model).
The ultimate goal is to have a "saved filter" that can be read from the DB and used by multiple django applications.
At first I thought I could serialize the queryset and save that. But serializing a queryset actually executes the query - and then I end up with a static list of clients in Alabama at the time of serialization. I want the list to be dynamic (i.e. each time I read the queryset from the DB it should execute and retrieve the most current list of clients in Alabama).
Edit: Alternatively, is it possible to obtain a list of filters applied to a queryset?
Something like:
qs = Client.objects.filter(state='AL')
filters = qs.getFilters()
print filters
{ 'state': 'AL' }
You can do as jcd says, storing the sql.
You can also store the conditions.
In [44]: q=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth")
In [45]: c={'name__startswith':'Can add'}
In [46]: Permission.objects.filter(q).filter(**c)
Out[46]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In [48]: q2=Q( Q(content_type__model="User") | Q(content_type__model="Group"),content_type__app_label="auth", name__startswith='Can add')
In [49]: Permission.objects.filter(q2)
Out[49]: [<Permission: auth | group | Can add group>, <Permission: auth | user | Can add user>]
In that example you see that the conditions are the objects c and q (although they can be joined in one object, q2). You can then serialize these objects and store them on the database as strings.
--edit--
If you need to have all the conditions on a single database record, you can store them in a dictionary
{'filter_conditions': (cond_1, cond_2, cond_3), 'exclude_conditions': (cond_4, cond_5)}
and then serialize the dictionary.
You can store the sql generated by the query using the queryset's _as_sql() method. The method takes a database connection as an argument, so you'd do:
from app.models import MyModel
from django.db import connection
qs = MyModel.filter(pk__gt=56, published_date__lt=datetime.now())
store_query(qs._as_sql(connection))
You can use http://github.com/denz/django-stored-queryset for that
You can pickle the Query object (not the QuerySet):
>>> import pickle
>>> query = pickle.loads(s) # Assuming 's' is the pickled string.
>>> qs = MyModel.objects.all()
>>> qs.query = query # Restore the original 'query'.
Docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#pickling-querysets
But: You can’t share pickles between versions
you can create your own model to store your queries.
First field can contains fk to ContentTypes
Second field can be just text field with your query etc.
And after that you can use Q object to set queryset for your model.
The current answer was unclear to me as I don't have much experience with pickle. In 2022, I've found that turning a dict into JSON worked well. I'll show you what I did below. I believe pickling still works, so at the end I will show some more thoughts there.
models.py - example database structure
class Transaction(models.Model):
id = models.CharField(max_length=24, primary_key=True)
date = models.DateField(null=False)
amount = models.IntegerField(null=False)
info = models.CharField()
account = models.ForiegnKey(Account, on_delete=models.SET_NULL, null=True)
category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=False, default=None)
class Account(models.Model):
name = models.CharField()
email = models.EmailField()
class Category(models.Model):
name = models.CharField(unique=True)
class Rule(models.Model):
category = models.ForeignKey(Category, on_delete=models.SET_NULL, blank=False, null=True, default=None)
criteria = models.JSONField(default=dict) # this will hold our query
My models store financial transactions, the category the transaction fits into (e.g., salaried income, 1099 income, office expenses, labor expenses, etc...), and a rule to save a query to automatically categorize future transactions without having to remember the query every year when doing taxes.
I know, for example, that all my transactions with my consulting clients should be marked as 1099 income. So I want to create a rule for clients that will grab each monthly transaction and mark it as 1099 income.
Making the query the old-fashioned way
>>> from transactions.models import Category, Rule, Transaction
>>>
>>> client1_transactions = Transaction.objects.filter(account__name="Client One")
<QuerySet [<Transaction: Transaction object (1111111)>, <Transaction: Transaction object (1111112)>, <Transaction: Transaction object (1111113)...]>
>>> client1_transactions.count()
12
Twelve transactions, one for each month. Beautiful.
But how do we save this to the database?
Save query to database in JSONField
We now have Django 4.0 and a bunch of support for JSONField.
I've been able to grab the filtering values out of a form POST request, then add them in view logic.
urls.py
from transactions import views
app_name = "transactions"
urlpatterns = [
path("categorize", views.categorize, name="categorize"),
path("", views.list, name="list"),
]
transactions/list.html
<form action="{% url 'transactions:categorize' %}" method="POST">
{% csrf_token %}
<label for="info">Info field contains...</label>
<input id="info" type="text" name="info">
<label for="account">Account name contains...</label>
<input id="account" type="text" name="account">
<label for="category">New category should be...</label>
<input id="category" type="text" name="category">
<button type="submit">Make a Rule</button>
</form>
views.py
def categorize(request):
# get POST data from our form
info = request.POST.get("info", "")
account = request.POST.get("account", "")
category = request.POST.get("category", "")
# set up query
query = {}
if info:
query["info__icontains"] = info
if account:
query["account__name__icontains"] = account
# update the database
category_obj, _ = Category.objects.get_or_create(name=category)
transactions = Transaction.objects.filter(**query).order_by("-date")
Rule.objects.get_or_create(category=category_obj, criteria=query)
transactions.update(category=category_obj)
# render the template
return render(
request,
"transactions/list.html",
{
"transactions": transactions.select_related("account"),
},
)
That's pretty much it!
My example here is a little contrived, so please forgive any errors.
How to do it with pickle
I actually lied before. I have a little experience with pickle and I do like it, but I am not sure on how to save it to the database. My guess is that you'd then save the pickled string to a BinaryField.
Perhaps something like this:
>>> # imports
>>> import pickle # standard library
>>> from transactions.models import Category, Rule, Transaction # my own stuff
>>>
>>> # create the query
>>> qs_to_save = Transaction.objects.filter(account__name="Client 1")
>>> qs_to_save.count()
12
>>>
>>> # create the pickle
>>> saved_pickle = pickle.dumps(qs_to_save.query)
>>> type(saved_pickle)
<class 'bytes'>
>>>
>>> # save to database
>>> # make sure `criteria = models.BinaryField()` above in models.py
>>> # I'm unsure about this
>>> test_category, _ = Category.objects.get_or_create(name="Test Category")
>>> test_rule = Rule.objects.create(category=test_category, criteria=saved_pickle)
>>>
>>> # remake queryset at a later date
>>> new_qs = Transaction.objects.all()
>>> new_qs.query = pickle.loads(test_rule.criteria)
>>> new_qs.count()
12
Going even further beyond
I found a way to make this all work with my htmx live search, allowing me to see the results of my query on the front end of my site before saving.
This answer is already too long, so here's a link to a post if you care about that: Saving a Django Query to the Database.