django annotate with queryset - django

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.
class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()
    users = ManyToManyField(User, related_name='registered_surveys')
    ...

class SurveyRun(Model):
    ''' A user's answers for one taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...
So with the models above a user should be alerted to take survey A next on this date:
A.interval + SurveyRun.objects.filter(
    user=user,
    survey=A
).latest('submitted').submitted
I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to this criteria:
For each survey the user is registered for:
- if no runs exist for that user-survey combo, create the first run for that user-survey combination and alert the user
- if there are runs for that survey and none are open (an open run has been created but not submitted, so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user
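The two rules above can be sketched as a plain Python predicate for a single user-survey pair. This is only an illustration of the criteria, not the database query being asked for; the function name `is_run_due` and its arguments are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def is_run_due(submitted_dates: Sequence[Optional[datetime]],
               interval: timedelta,
               now: datetime) -> bool:
    """Return True if a new run should be created for one user-survey pair.

    `submitted_dates` holds the `submitted` value of every existing run;
    None marks an open (created but not yet submitted) run.
    """
    # Rule 1: no runs exist yet, so the first run is due.
    if not submitted_dates:
        return True
    # An open run blocks the creation of a new one.
    if any(d is None for d in submitted_dates):
        return False
    # Rule 2: latest submission plus the survey interval has passed.
    return max(submitted_dates) + interval <= now
```

The query being asked for would need to express these same three branches in SQL.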
Ideally I could create a manager method which would annotate with a surveys_due field like:
users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)
Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers.
And I could issue alerts like this:
for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, new_run)
However I would settle for a boolean flag annotation on the User object indicating one of the registered_surveys needs to create a new run.
How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection of objects, like a reverse FK?
UPDATE:
For clarity here is my current task in python:
from django.apps import apps
from django.contrib.auth import get_user_model
from django.utils import timezone

def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...
            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + srvy.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)
UPDATE #2:
In attempting to use Dirk's solution I have this simple example:
In [1]: test_user.runs.values_list('survey__name', 'submitted')
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)
Out[2]: <SurveyQuerySet ['Test']>
The user has one open run (submitted=None) for the Test survey and is registered for one survey (Test). He/she should not be flagged for a new run, since there is an unsubmitted run outstanding for the only survey he/she is registered for. So I created a function encapsulating Dirk's solution, called get_users_with_runs_due:
In [10]: get_users_with_runs_due()
Out[10]: <UserQuerySet [<User: test@gmail.com>]>  # <-- should be an empty queryset
In [107]: for user in _:
     ...:     print(user.email, user.has_survey_due)
test@gmail.com True  # <-- should be False
UPDATE #3:
In my previous update I had made some changes to the logic to properly match what I wanted but neglected to mention or show the changes. Here is the query function below with comments by the changes:
def get_users_with_runs_due():
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted__isnull=True)
    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) |  # either has no runs for this survey or
        (  # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )
    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)
UPDATE #4:
I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.
def annotate_surveys_for_user(user):
    today = timezone.now()
    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')
    pending_survey_runs = survey_runs.filter(submitted=None)
    return Survey.objects.filter(
        users=user
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    )
This worked as expected: the annotations were accurate, and filtering with:
result.filter(
    Q(has_survey_runs=False) |
    (
        Q(has_pending_runs=False) &
        Q(latest_submission_date__lte=today - F('interval'))
    )
)
produced the desired results: an empty queryset where the user should not have any runs due, and vice versa. Why is this not working when making it the subquery and querying from the User model?

To annotate users with whether or not they have a survey due, I'd suggest using a Subquery expression:
from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone

today = timezone.now()
survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')
pending_survey_runs = survey_runs.filter(submitted__isnull=True)
# Parentheses allow the chained calls to span several lines.
surveys = (
    Survey.objects.filter(users=OuterRef('pk'))
    .annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
    .annotate(has_survey_runs=Exists(survey_runs))
    .annotate(has_pending_runs=Exists(pending_survey_runs))
    .filter(Q(has_survey_runs=False) | Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))
)
(
    User.objects.annotate(has_survey_due=Exists(surveys))
    .filter(has_survey_due=True)
)
I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset; annotated values must be field equivalents. Unfortunately, you also cannot use a Subquery as the queryset parameter to Prefetch. Since you're using PostgreSQL, you could use an ArrayField to list the ids of the surveys in a wrapped value, but I haven't found a way to do that, as you can't use an aggregate inside a Subquery.
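As a fallback while the annotation approach is open, one option is to fetch flat (user id, survey id) rows for the due combinations and group them in Python. The sketch below only shows the grouping step; it assumes the rows have already been fetched by some query, and the function name `group_surveys_by_user` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_surveys_by_user(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group flat (user_id, survey_id) rows into {user_id: [survey_id, ...]}."""
    grouped = defaultdict(list)
    for user_id, survey_id in pairs:
        # Row order is preserved within each user's list.
        grouped[user_id].append(survey_id)
    return dict(grouped)
```

This moves only the final grouping out of the database; deciding which pairs are due would still be done by the query.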

Related

Race condition when two different users inserting new records to database in Django

I run into a race condition when creating a new instance of the Order model.
There is a daily_id field that starts from one every day for each category, which means every category has its own daily id sequence.
class Order(models.Model):
    daily_id = models.SmallIntegerField(default=0)
    category = models.ForeignKey(Category, on_delete=models.PROTECT, related_name="orders")
    declare_time = models.DateField()
    ...
daily_id field of new record is being calculated using this method:
def get_daily_id(category, declare_time):
    try:
        last_order = Order.objects.filter(declare_time=declare_time,
                                          category=category).latest('daily_id')
        return last_order.daily_id + 1
    except Order.DoesNotExist:
        # If no order has been registered in declare_time date.
        return 1
The problem is that when two different users register orders in the same category at the same time, the orders are highly likely to end up with duplicate daily_id values.
I have tried the @transaction.atomic decorator on the post method of the DRF APIView, and it didn't work!
Use an auto-increment column and add a view that computes the per-day order number from it, for example:
SELECT *, ROW_NUMBER() OVER(PARTITION BY MyDayDate ORDER BY id_autoinc) AS daily_id
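The effect of ROW_NUMBER() over an auto-increment id can be sketched in plain Python. The partition here is assumed to be (category, declare_time), because the question wants one sequence per category per day; the function name `assign_daily_ids` and the row layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# (auto-increment id, category, declare_time)
OrderRow = Tuple[int, str, date]


def assign_daily_ids(rows: List[OrderRow]) -> Dict[int, int]:
    """Number rows 1, 2, 3... inside each (category, day) partition,
    ordered by the auto-increment id, like ROW_NUMBER() OVER (...)."""
    counters = defaultdict(int)
    daily_ids = {}
    # Sorting the tuples orders them by their first element, the unique id.
    for order_id, category, declare_time in sorted(rows):
        counters[(category, declare_time)] += 1
        daily_ids[order_id] = counters[(category, declare_time)]
    return daily_ids
```

Because the number is derived from the unique auto-increment id rather than stored, two concurrent inserts cannot end up with the same daily_id.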

Django ORM: Get all records and the respective last log for each record

I have two models, the simple version would be this:
class Users:
    name = models.CharField()
    birthdate = models.CharField()
    # other fields that play no role in calculations or filters, but I simply need to display

class UserLogs:
    user_id = models.ForeignKey(to='Users', related_name='user_daily_logs', on_delete=models.CASCADE)
    reference_date = models.DateField()
    hours_spent_in_chats = models.DecimalField()
    hours_spent_in_p_channels = models.DecimalField()
    hours_spent_in_challenges = models.DecimalField()
    # other fields that play no role in calculations or filters, but I simply need to display
What I need to write is a query that will return all the fields of all users, with the latest log (reference_date) for each user. So for n users and m logs, the query should return n records. It is guaranteed that each user has at least one log record.
Restrictions:
- The query needs to be written in the Django ORM.
- The query needs to start from the user model. So anything that goes like Users.objects... is ok. Anything that goes like UserLogs.objects... is not. That's because of filters and logic in the viewset, which is beyond my control.
- It has to be a single query, and no iterations in python, pandas or itertools are allowed. The queryset will be directly processed by a serializer.
- I shouldn't have to specify the names of the columns that need to be returned, one by one. The query must return all the columns from both models.
Attempt no. 1 returns only user id and the log date (for obvious reasons). However, it is the right date, but I just need to get the other columns:
test = User.objects.select_related("user_daily_logs").values("user_daily_logs__user_id").annotate(
    max_date=Max("user_daily_logs__reference_date"))
Attempt no. 2 generates an error (Cannot resolve expression type, unknown output_field):
logs = UserLogs.objects.filter(user_id=OuterRef('pk')).order_by('-reference_date')[:1]
users = Users.objects.annotate(latest_log = Subquery(logs))
This seems impossible taking into account all the restrictions.
One approach would be to use prefetch_related:
users = User.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter().order_by('-reference_date'),
        to_attr="daily_logs"
    )
)
This will do two db queries and return all logs for every user, which may or may not be a problem depending on the number of records. If you need only the logs for the current day, as the name suggests, you can add that to the filter and reduce the number of UserLogs records. Of course, you then need to take the first element from the list:
user.daily_logs[0]
For that you can create a @property on the User model, which could look roughly like this:
@property
def latest_log(self):
    # daily_logs only exists if the Prefetch above was applied; it may also be empty
    if not getattr(self, 'daily_logs', None):
        return None
    return self.daily_logs[0]

user.latest_log
You can also go a step further and try the following Subquery inside Prefetch to limit the queryset to one element, but I am not sure about the performance of this one (credit: the question "Django prefetch_related with limit").
users = User.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter(
            id__in=Subquery(
                UserLogs.objects.filter(user_id=OuterRef('user_id'))
                .order_by('-reference_date')
                .values_list('id', flat=True)[:1]
            )
        ),
        to_attr="daily_logs"
    )
)

How can I filter a Django queryset by the latest of a related model?

Imagine I have the following 2 models in a contrived example:
class User(models.Model):
    name = models.CharField()

class Login(models.Model):
    user = models.ForeignKey(User, related_name='logins')
    success = models.BooleanField()
    datetime = models.DateTimeField()

    class Meta:
        get_latest_by = 'datetime'
How can I get a queryset of Users, which only contains users whose last login was not successful.
I know the following does not work, but it illustrates what I want to get:
User.objects.filter(login__latest__success=False)
I'm guessing I can do it with Q objects, and/or Case When, and/or some other form of annotation and filtering, but I can't suss it out.
We can use a Subquery here:
from django.db.models import OuterRef, Subquery
latest_login = Subquery(Login.objects.filter(
    user=OuterRef('pk')
).order_by('-datetime').values('success')[:1])

User.objects.annotate(
    latest_login=latest_login
).filter(latest_login=False)
This will generate a query that looks like:
SELECT auth_user.*, (
    SELECT U0.success
    FROM login U0
    WHERE U0.user_id = auth_user.id
    ORDER BY U0.datetime DESC
    LIMIT 1
) AS latest_login
FROM auth_user
WHERE (
    SELECT U0.success
    FROM login U0
    WHERE U0.user_id = auth_user.id
    ORDER BY U0.datetime DESC
    LIMIT 1
) = False
So the outcome of the Subquery is the success of the latest Login object, and if that is False, we add the related User to the QuerySet.
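To see the correlated subquery at work outside Django, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the SQL above; the sample rows are invented for illustration, and success is stored as 0/1, which is an assumption of this sketch rather than what Django necessarily emits for every backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE login (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES auth_user(id),
        success INTEGER,
        datetime TEXT
    );
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO login (user_id, success, datetime) VALUES
        (1, 1, '2024-01-01 10:00:00'),
        (1, 0, '2024-01-02 10:00:00'),
        (2, 0, '2024-01-01 10:00:00'),
        (2, 1, '2024-01-03 10:00:00');
""")

# Keep only users whose most recent login row has success = 0.
failed_last = conn.execute("""
    SELECT auth_user.name
    FROM auth_user
    WHERE (
        SELECT U0.success
        FROM login U0
        WHERE U0.user_id = auth_user.id
        ORDER BY U0.datetime DESC
        LIMIT 1
    ) = 0
""").fetchall()
```

Here alice's latest login failed and bob's latest login succeeded, so only alice is selected, even though bob has an older failed login.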
You can first annotate the max dates, and then filter based on success and the max date using F expressions:
User.objects.annotate(max_date=Max('logins__datetime'))\
    .filter(logins__datetime=F('max_date'), logins__success=False)
To check the boolean, use success=False, and to get the latest record, use latest().
Your filter should look like this:
User.objects.filter(success=False).latest()

Checking for overlapping TimeField ranges

I have this model:
class Task(models.Model):
    class Meta:
        unique_together = ("campaign_id", "task_start", "task_end", "task_day")

    campaign_id = models.ForeignKey(Campaign, on_delete=models.DO_NOTHING)
    playlist_id = models.ForeignKey(PlayList, on_delete=models.DO_NOTHING)
    task_id = models.AutoField(primary_key=True, auto_created=True)
    task_start = models.TimeField()
    task_end = models.TimeField()
    task_day = models.TextField()
I need to write a validation test that checks if a newly created task time range overlaps with an existing one in the database.
For example:
A task with ID 1 already starts at 5:00 PM and ends at 5:15 PM on a Saturday. A new task cannot be created between the first task's start and end time. Where should I write this test, and what is the most efficient way to do this? I also use Django REST Framework serializers.
When you receive the form data from the user, you can:
1. Check the fields are consistent: user task_start < user task_end, and warn the user if not.
2. Query (SELECT) the database to retrieve all existing tasks which intersect the user's time range:
   - Order the records by task_start (ORDER BY),
   - Select only the records which match your criterion, i.e.:
     task_start <= user task_start <= task_end, or
     task_start <= user task_end <= task_end,
   - Warn the user if at least one record is found.
3. If everything is OK:
   - Construct a Task instance,
   - Store it in the database,
   - Return success.
Implementation details:
- task_start and task_end could be indexed in your database to improve selection time.
- I saw that you also have a task_day field (which is a TEXT).
  You should really consider using UTC DATETIME fields instead of TEXT, because you need to compare date AND time (and not only time): consider a task which starts at 23:30 and finishes at 00:45 the day after…
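The overlap criterion can be sketched as a small pure-Python predicate. Note that this is the general interval-overlap test rather than the exact endpoint checks listed above: it also catches a new task that completely encloses an existing one. It assumes both ranges lie on the same day with start < end, and it treats back-to-back tasks (one ending exactly when the other starts) as non-overlapping, which is an assumption of this sketch. The function name `overlaps` is hypothetical.

```python
from datetime import time


def overlaps(start_a: time, end_a: time, start_b: time, end_b: time) -> bool:
    """Return True if the ranges [start_a, end_a) and [start_b, end_b) intersect.

    Two ranges overlap exactly when each one starts before the other ends.
    """
    return start_a < end_b and start_b < end_a
```

For an existing task from 17:00 to 17:15, a new task from 16:00 to 18:00 overlaps even though neither of its endpoints falls inside the existing range.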
This is how I solved it. It's not optimal by far, but I'm limited to python 2.7 and Django 1.11 and I'm also a beginner.
def validate(self, data):
    errors = {}
    task_start = data.get('task_start')
    task_end = data.get('task_end')
    # Parentheses are needed so the expression can continue on the next line.
    time_filter = (Q(task_start__range=[task_start, task_end])
                   | Q(task_end__range=[task_start, task_end]))
    filter_check = Task.objects.filter(time_filter).exists()
    if task_start > task_end:
        errors['error'] = u'End time cannot be earlier than start time!'
        raise serializers.ValidationError(errors)
    elif filter_check:
        errors['errors'] = u'Overlapping tasks'
        raise serializers.ValidationError(errors)
    else:
        pass
    return data

Django: How to filter on inner join based on properties of second table?

My question is simple: in a Django app, I have a table Users and a table StatusUpdates. In the StatusUpdates table, I have a column user which is a foreign key pointing back to Users. How can I do a search expressing something like:
users.filter(latest_status_update.text__contains='Hello')
Edit:
Please excuse my lack of clarity. The query that I would like to make is something like "Give me all the users whose latest status update contains the text 'hello'". In Django code, I would do the following (which is really inefficient and ugly):
def get_hello_users():
    hello_users = []
    for user in User.objects.all():
        latest_status_update = StatusUpdate.objects.filter(user=user).order_by('-creation_date')[0]
        if 'Hello' in latest_status_update.text:
            hello_users.append(user)
    return hello_users
Edit 2:
I've already found the solution but since I was asked, here are the important parts of my models:
class User(models.Model):
    ...

class StatusUpdate(models.Model):
    user = models.ForeignKey(User)
    text = models.CharField(max_length=140)
    creation_date = models.DateTimeField(auto_now_add=True, editable=False)
    ....
Okay, I think I got it:
from django.db.models import Max, F
User.objects\
    .annotate(latest_status_update_id=Max('statusupdate__id'))\
    .filter(
        statusupdate__id=F('latest_status_update_id'),
        statusupdate__text__icontains='hello'
    )
For more info, see this section of the Django documentation.
Please note: I ended up changing my strategy a bit and settled on treating the highest ID as the latest update. I did this because I realized that a User could post two updates at the same time, and filtering on the latest creation date would then break my query.
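The tie problem mentioned in the note can be illustrated in plain Python: when two updates share a timestamp, comparing (creation_date, id) pairs still yields exactly one latest update per user. The row layout and the function name `users_whose_latest_contains` are hypothetical and only mirror the fields of the StatusUpdate model.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# (update id, user id, creation_date, text)
Update = Tuple[int, int, datetime, str]


def users_whose_latest_contains(updates: Iterable[Update], needle: str) -> List[int]:
    """Return sorted ids of users whose latest update contains `needle`
    (case-insensitive). Ties on creation_date are broken by the higher id."""
    latest: Dict[int, Update] = {}
    for upd in updates:
        upd_id, user_id, created, _text = upd
        current = latest.get(user_id)
        if current is None or (created, upd_id) > (current[2], current[0]):
            latest[user_id] = upd
    return sorted(
        user_id for user_id, upd in latest.items()
        if needle.lower() in upd[3].lower()
    )
```

With two updates posted at the same instant, only the one with the higher id counts as the latest, which matches the "highest ID means latest" strategy.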
latest_status_updates = [
    update for update in (
        user.statusupdate_set.order_by('-creation_date').first()
        for user in User.objects.all()
    )
    # first() returns None for users without any status update
    if update is not None and 'hello' in update.text
]
users = list(set(status_update.user for status_update in latest_status_updates))
EDIT:
Now I first get all LATEST status updates of each user into a list which is then filtered by the text field found in StatusUpdate class. In the second line, I extract users out of the filtered status updates and then produce a unique list of users.
I hope this helps!
Not sure I understand, are you trying to do something like
(StatusUpdate
    .objects
    .select_related("user")
    .filter(text__contains="hello")
    .order_by("-updated")
    .first())
This will return the StatusUpdate that was modified last (if you have a field called updated that stores the time of the last modification) and contains "hello" in the text field. If none of the StatusUpdates contains that string, it will return None.
Then you can do:
latest = (StatusUpdate
    .objects
    .select_related("user")
    .filter(text__contains="hello")
    .order_by("-updated")
    .first())
# then, if you need the user too
if latest is not None:
    user = latest.user  # does not hit the DB again, since select_related already fetched it
If this isn't what you needed, please provide more details (models) and clarify your need.