cross instance calculations in django queryset - django

I'm not sure whether (a) this is doable and (b) I'm formulating the task correctly. Perhaps the right way is to refactor the DB design, but I would appreciate any opinion on that.
I have a model in a Django app where I track the times a user enters and exits a certain page (via either form submission or just closing the browser window). I do the tracking using Django Channels, but that does not matter in this case.
The model looks like:
class TimeStamp(models.Model):
    class Meta:
        # must name an existing field; the model has no 'enter_time'
        get_latest_by = 'timestamp'

    page_name = models.CharField(max_length=1000)
    participant = models.ForeignKey(to=Participant, related_name='timestamps')
    timestamp = models.DateTimeField()
    enter_exit_type = models.CharField(max_length=1000, choices=ENTEREXITTYPES)
What I need to do is calculate how much time a user spends on this page. So I need to loop through all TimeStamp records for the specific user and calculate the time difference between each 'enter' record and the following 'exit' record.
So the db data may look like:
id  timestamp  enter_exit_type
1   20:12:12   enter
2   20:12:13   exit
3   20:18:12   enter
4   20:21:12   exit
5   20:41:12   enter
So what is the right way to produce a resulting queryset that looks like this:
id  time_spent_sec
1   0:01
2   3:00
The last 'enter' record is ignored because there is no corresponding 'exit' record.
Record 1 in the resulting queryset is the difference between the timestamps of ids 2 and 1. Record 2 is the difference between the timestamps of ids 4 and 3.
I can just loop through the records, look for the nearest 'exit' record and calculate the difference, but I was wondering whether there is a simpler solution.

It's possible:
1) Use the approach here to group by user if you want to get answer for all users in one query.
2) Filter out the last unclosed entry, i.e. a trailing record with enter_exit_type == 'enter'.
3) .annotate(timestamp_with_sign=Case(When(enter_exit_type='exit', then=F('timestamp') * -1), default=F('timestamp')))
4) Sum() by the timestamp_with_sign field. With the signs above the sum comes out as enter minus exit, so negate it (or flip the signs in step 3) to get a positive duration.
I'm not sure that F('timestamp') would work as-is; you may need to find a way to convert it to Unix time first.
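For comparison, the plain-Python loop mentioned in the question can be sketched as below. It assumes the rows for one participant and one page are already sorted by timestamp; `pair_visits`, `at` and the tuple layout are hypothetical names for illustration, not part of the original model.

```python
from datetime import datetime, timedelta


def pair_visits(events):
    """Pair each 'enter' with the next 'exit' and return the durations.

    `events` is an iterable of (timestamp, kind) tuples sorted by timestamp,
    where kind is 'enter' or 'exit'. An 'enter' without a later 'exit'
    is ignored, as is an 'exit' without a preceding 'enter'.
    """
    durations = []
    entered_at = None
    for timestamp, kind in events:
        if kind == 'enter':
            # A repeated 'enter' restarts the visit from the newer timestamp.
            entered_at = timestamp
        elif kind == 'exit' and entered_at is not None:
            durations.append(timestamp - entered_at)
            entered_at = None
    return durations


def at(clock):
    """Build a datetime on an arbitrary fixed day from an HH:MM:SS string."""
    return datetime.strptime('2020-01-01 ' + clock, '%Y-%m-%d %H:%M:%S')


sample = [
    (at('20:12:12'), 'enter'),
    (at('20:12:13'), 'exit'),
    (at('20:18:12'), 'enter'),
    (at('20:21:12'), 'exit'),
    (at('20:41:12'), 'enter'),  # no matching 'exit', so it is dropped
]
durations = pair_visits(sample)
total = sum(durations, timedelta())
```

With the sample rows from the question this yields one duration of 1 second and one of 3 minutes; the trailing 'enter' contributes nothing.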

This model structure may not be sufficient for your requirement, so I would suggest changing your model to:
class TimeStamp(models.Model):
    class Meta:
        get_latest_by = 'enter'

    page_name = models.CharField(max_length=1000)
    participant = models.ForeignKey(Participant, related_name='timestamps')
    enter = models.DateTimeField()
    exit = models.DateTimeField(null=True, blank=True)
Then you can get the data with:
from django.db.models import F, Sum, ExpressionWrapper, DurationField

TimeStamp.objects.values('participant').annotate(
    sum_time_diff=Sum(
        ExpressionWrapper(F('exit') - F('enter'), output_field=DurationField())
    )
)
The response will look like:
<QuerySet [{'participant': 1, 'sum_time_diff': datetime.timedelta(0, 7)}, {'participant': 2, 'sum_time_diff': datetime.timedelta(0, 2)}]>

Related

Django query to return percentage of a users with a post

Two models Users (built-in) and Posts:
class Post(models.Model):
    post_date = models.DateTimeField(default=timezone.now)
    user = models.ForeignKey(User, on_delete=models.CASCADE, null=True, related_name='user_post')
    post = models.CharField(max_length=100)
I want to have an API endpoint that returns the percentage of users that have posted. Basically I want SUM(unique users who have posted) / total_users
I have been playing around with annotate and aggregate, but I keep getting the sum of posts for each user, or the sum of users per post (which is one...). How can I get the number of unique users who have posted, divide that by the total user count and return the result?
I feel like I am missing something silly but my brain has gone to mush staring at this.
class PostParticipationAPIView(generics.ListAPIView):
    queryset = Post.objects.all()
    serializer_class = PostSerializer

    def get_queryset(self):
        start_date = self.request.query_params.get('start_date')
        end_date = self.request.query_params.get('end_date')
        # How can I take something like this, divide it by User.objects.all().count() * 100, and assign it to something to return as the queryset?
        queryset = Post.objects.filter(post_date__gte=start_date, post_date__lte=end_date).distinct('user').count()
        return queryset
My goal is to end up with the endpoint like:
{
total_participation: 97.3
}
Thanks for any guidance.
BCBB
EDIT
OK, I am still struggling a bit. I tried to create a serializer that just had a decimal field for participation_percentage like:
percentage_participation = serializers.DecimalField(max_digits=5, decimal_places=2, max_value=100, min_value=0)
Then I calculate in the view, but I get an error:
Got AttributeError when attempting to get a value for field percentage_participation on serializer ParticipationSerializer.
The serializer field might be named incorrectly and not match any attribute or key on the str instance.
Original exception text was: 'str' object has no attribute 'percentage_participation'.
Error was the same if I made it a CharField (in case there was some string coercion?).
So then I tried to move it to a SerializerMethodField and put all the calculation logic in there. This calculated fine, but I still had to provide a queryset in the view. If I provided a model queryset, it just returned the percentage once per row (say Posts.objects.all() had a total of 100 posts, it returned the percentage 100 times).
So then I tried to override the get_queryset in the view, but I HAVE to return something. If I just return { "meh", "hello" } then I return the percentage from the SerializerMethodField one time and the end result is exactly what I want.
I just have no idea as to WHY or how to do this correctly.
Thanks for your help.
EDIT #2
OK so I realized why I was only getting one, it was iterating over the string I returned, which was one character. When I returned "meh" it gave me three of the percentage, iterating over each character in the string...
I am not understanding from playing around, reading the docs, or using GoogleFu how to do this properly. I just want to be able to perform some kind of summary logic on records from the DB - how can I do this properly?!?!
Thank you for all your time.
BCBB
Something like this should work:
# get total user count
total_users = User.objects.count()
# get unique set of users with a post (distinct on a field name is PostgreSQL-only)
total_users_who_posted = Post.objects.filter(...).distinct("user").count()
# calculate percentage
percentage = {
    "total_participation": (total_users_who_posted * 100) / total_users
}
# beware of division by zero when total_users is 0
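A minimal sketch of the calculation with the division-by-zero guard spelled out. `participation_percentage` is a hypothetical helper; in the view, the two counts would come from the ORM calls above. Because the result is a single dict rather than a queryset, it would probably be returned from a plain DRF `APIView.get()` via `Response(...)` instead of from `get_queryset()`, which is expected to return something iterable of model instances.

```python
def participation_percentage(users_who_posted, total_users, ndigits=1):
    """Return the share of users who posted, as a percentage.

    Returns 0.0 when there are no users at all, to avoid dividing by zero.
    """
    if total_users == 0:
        return 0.0
    return round(users_who_posted * 100 / total_users, ndigits)


# Example counts are made up; 73 of 75 users have posted.
payload = {"total_participation": participation_percentage(73, 75)}
```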
I don't think it is possible to do this completely in Django's ORM, but you can use the ORM to get both user counts (with posts and total):
from django.db.models import BooleanField, Case, Count, Q, Value, When

counts = (
    User.objects
    .annotate(posted=Case(
        When(user_post__isnull=False, then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    ))
    .values('posted')
    # distinct=True because the join yields one row per post, not one per user
    .aggregate(
        posted_users=Count('pk', distinct=True, filter=Q(posted=True)),
        total_users=Count('pk', distinct=True, filter=Q(posted__isnull=False)),
    )
)
# This will result in a dict containing the following:
# counts = {'posted_users': ...,
#           'total_users': ...}

How to get time slot in django for doctor appointment

I have two models Schedule and Appointment.
How can I split a doctor's working hours into 15-minute time slots? I am drawing a blank on this.
Models:
class Schedule(models.Model):
    doctor = models.ForeignKey(Doctor)
    open = models.TimeField()
    close = models.TimeField()

class Appointment(models.Model):
    patient = models.ForeignKey(Patient)
    doctor = models.ForeignKey(Doctor)
    date = models.DateField()
    time_slot = models.TimeField()
Based on the discussion we had in the comments, I will not provide the exact code (as you have not written anything yet), but I will explain the different approaches I can think of right now.
Scheduler approach
First, convert the time slots into numbers: 10:00 becomes 1, 10:15 becomes 2, and so on until the closing time (6 pm in your case), and store these as an array in the timeslot field. Every time someone books a slot, remove its number from the array. If someone then tries to book the same slot, you see that the number is no longer available and refuse the booking; alternatively, deactivate the slot for the user every time the page is reloaded. The drawback is that every day you have to restore the array to its initial state before 10:00 (you might need a scheduler such as django-celery-beat).
More generic way
Here, make timeslot in the Appointment table a single number (not an array), following the same pattern as above: 10:00 becomes 1, 10:15 becomes 2, and so on. When the page first loads, query all appointments with this doctor for the day; initially there are none, so you show every time slot as available. When a patient books a slot, create an Appointment entry with the patient, doctor, date and timeslot. You can hard-code the slot numbers on the frontend, for example as Bootstrap cards that each show one 15-minute slot and carry its number; since the numbering is known (10:00 -> 1), the backend receives the slot number and reserves it for the patient. On later loads, query the doctor's appointments for the day and hide the slots that are already booked; the Appointment table tells you which ones.
These are the two ways I can think of right now; I will add more as they occur to me.
This should give you a direction for now at least.
Ask for details in the comments and I will update the answer accordingly.
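The slot numbering described above can be sketched in plain Python. This is only an illustration under assumptions: `build_slots` and `available_slots` are hypothetical helpers, the 10:00 to 18:00 schedule comes from the answer's example, and the booked numbers would in practice come from a query on Appointment for the given doctor and date.

```python
from datetime import date, datetime, time, timedelta


def build_slots(open_time, close_time, minutes=15):
    """Return {slot_number: start_time} for every full slot in the schedule."""
    slots = {}
    # time objects cannot be added to timedeltas, so anchor them to a date.
    current = datetime.combine(date.min, open_time)
    end = datetime.combine(date.min, close_time)
    step = timedelta(minutes=minutes)
    number = 1
    while current + step <= end:
        slots[number] = current.time()
        current += step
        number += 1
    return slots


def available_slots(slots, booked_numbers):
    """Drop the slot numbers that already have an appointment."""
    return {n: start for n, start in slots.items() if n not in booked_numbers}


slots = build_slots(time(10, 0), time(18, 0))
free = available_slots(slots, booked_numbers={1, 3})
```

Here slot 1 starts at 10:00 and slot 2 at 10:15; after booking slots 1 and 3, only the remaining numbers are offered to the next patient.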
Here is the solution for that problem.
from django.db import models


class Appointment(models.Model):
    """Contains info about appointment"""

    class Meta:
        unique_together = ('doctor', 'date', 'timeslot')

    TIMESLOT_LIST = (
        (0, '09:00 – 09:30'),
        (1, '10:00 – 10:30'),
        (2, '11:00 – 11:30'),
        (3, '12:00 – 12:30'),
        (4, '13:00 – 13:30'),
        (5, '14:00 – 14:30'),
        (6, '15:00 – 15:30'),
        (7, '16:00 – 16:30'),
        (8, '17:00 – 17:30'),
    )

    doctor = models.ForeignKey('Doctor', on_delete=models.CASCADE)
    date = models.DateField(help_text="YYYY-MM-DD")
    timeslot = models.IntegerField(choices=TIMESLOT_LIST)
    patient_name = models.CharField(max_length=60)

    def __str__(self):
        return '{} {} {}. Patient: {}'.format(self.date, self.time, self.doctor, self.patient_name)

    @property
    def time(self):
        # The choice keys double as indexes into TIMESLOT_LIST
        return self.TIMESLOT_LIST[self.timeslot][1]


class Doctor(models.Model):
    """Stores info about doctor"""

    first_name = models.CharField(max_length=20)
    last_name = models.CharField(max_length=20)
    middle_name = models.CharField(max_length=20)
    specialty = models.CharField(max_length=20)

    def __str__(self):
        return '{} {}'.format(self.specialty, self.short_name)

    @property
    def short_name(self):
        return '{} {}.{}.'.format(self.last_name.title(), self.first_name[0].upper(), self.middle_name[0].upper())

Django ORM: Get all records and the respective last log for each record

I have two models, the simple version would be this:
class Users(models.Model):
    name = models.CharField()
    birthdate = models.CharField()
    # other fields that play no role in calculations or filters, but I simply need to display

class UserLogs(models.Model):
    user_id = models.ForeignKey(to='Users', related_name='user_daily_logs', on_delete=models.CASCADE)
    reference_date = models.DateField()
    hours_spent_in_chats = models.DecimalField()
    hours_spent_in_p_channels = models.DecimalField()
    hours_spent_in_challenges = models.DecimalField()
    # other fields that play no role in calculations or filters, but I simply need to display
What I need to write is a query that will return all the fields of all users, with the latest log (reference_date) for each user. So for n users and m logs, the query should return n records. It is guaranteed that each user has at least one log record.
Restrictions:
the query needs to be written in django orm
the query needs to start from the user model. So Anything that goes like Users.objects... is ok. Anything that goes like UserLogs.objects... is not. That's because of filters and logic in the viewset, which is beyond my control
It has to be a single query, and no iterations in python, pandas or itertools are allowed. The Queryset will be directly processed by a serializer.
I shouldn't have to specify the names of the columns that need to be returned, one by one. The query must return all the columns from both models
Attempt no. 1 returns only the user id and the log date (for obvious reasons). It is the right date, but I also need the other columns:
test = User.objects.select_related("user_daily_logs").values("user_daily_logs__user_id").annotate(
    max_date=Max("user_daily_logs__reference_date"))
Attempt no. 2 generates an error (Cannot resolve expression type, unknown output_field):
logs = UserLogs.objects.filter(user_id=OuterRef('pk')).order_by('-reference_date')[:1]
users = Users.objects.annotate(latest_log=Subquery(logs))
This seems impossible taking into account all the restrictions.
One approach would be to use prefetch_related:
users = User.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter().order_by('-reference_date'),
        to_attr="daily_logs"
    )
)
This will do two DB queries and return all logs for every user, which may or may not be a problem depending on the number of records. If you only need logs for the current day, as the name suggests, you can add that to the filter and reduce the number of UserLogs records. Of course, you then need to take the first element from the list:
user.daily_logs[0]
For that you can create a @property on the User model, which could look roughly like this:
@property
def latest_log(self):
    # daily_logs only exists when the Prefetch above was applied,
    # and it is an empty list for a user without logs
    if not getattr(self, 'daily_logs', None):
        return None
    return self.daily_logs[0]
The latest log is then available as user.latest_log.
You can also go a step further and try the following Subquery inside Prefetch to limit the queryset to one element, but I am not sure about the performance of this one (credits: Django prefetch_related with limit).
users = User.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter(
            id__in=Subquery(
                UserLogs.objects
                .filter(user_id=OuterRef('user_id'))
                .order_by('-reference_date')
                .values_list('id', flat=True)[:1]
            )
        ),
        to_attr="latest_log"
    )
)
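At the SQL level, the Subquery/OuterRef filter above roughly amounts to a correlated subquery that picks the newest log id per user. A minimal sketch with the standard-library `sqlite3` module; the table names, column names and sample rows are hypothetical stand-ins for the two models, not what Django would actually generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_logs (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        reference_date TEXT,
        hours_spent_in_chats REAL
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO user_logs VALUES
        (1, 1, '2021-01-01', 1.0),
        (2, 1, '2021-01-03', 2.5),
        (3, 2, '2021-01-02', 0.5);
""")

# One row per user: the inner query selects the id of that user's newest log.
latest_logs = conn.execute("""
    SELECT u.id, u.name, l.reference_date, l.hours_spent_in_chats
    FROM users AS u
    JOIN user_logs AS l ON l.user_id = u.id
    WHERE l.id = (
        SELECT inner_l.id
        FROM user_logs AS inner_l
        WHERE inner_l.user_id = u.id
        ORDER BY inner_l.reference_date DESC
        LIMIT 1
    )
    ORDER BY u.id
""").fetchall()
conn.close()
```

For n users with at least one log each, the query returns n rows, which is the shape the question asks for.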

Query latest votes per user per post in Django

I have a voting system where a user can vote up or down on a post. The votes will be used in a calculation, so I need to store them in a log format, i.e. I am saving each vote as a row in its own table.
Something like this:
class PointLog(models.Model):
    post = models.ForeignKey(Post, db_index=True)
    points = models.IntegerField(db_index=True)
    user = models.ForeignKey(User, blank=True, null=True)
    time = models.DateTimeField(auto_now_add=True, db_index=True)
    data = models.IntegerField()  # -1, 0 or 1
Now I need to display 20 Posts, together with the last vote the user did.
I am using django-rest-framework, so I can use a serializer field that looks like this; uservote = serializers.SerializerMethodField(), together with a function like:
def get_uservote(self, obj):
    user = self.context['request'].user
    vote = PointLog.objects.only('data').filter(user=user, post=obj).last()
    return vote.data if vote else 0
But that will run one DB query per post, and I hope there is a better solution.
I could avoid a DB query each time get_uservote is run by saving the queryset in self.context, so that part is covered.
But how can I write a query that, given a list of items, returns the latest matching row for each from another table?
A start would be PointLog.objects.filter(user=user, post__in=posts), but what next? Is this even possible using raw SQL in 1 query?
Update 1:
PointLog.objects.filter(...).order_by('post__id').distinct('post__id') would kinda do it, except that I don't think I am guaranteed to get the newest vote. If I use order_by('pk') (or 'time'), I can't use distinct('post__id') as I will get an sql error (ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions)
I found a solution that only added 1 extra query.
The full get_uservote that works would be;
def get_uservote(self, obj):
    user = self.context['request'].user
    # On Django 1.10+ is_authenticated is a property; drop the parentheses there
    if user.is_authenticated():
        # Test for the key itself so an empty result is cached too
        if 'get_uservote_data' not in self.context:
            votes = (PointLog.objects
                     .filter(user=user, post__in=self.instance)
                     .order_by('post__id', '-pk')
                     .distinct('post__id'))
            self.context['get_uservote_data'] = {i.post_id: i.data for i in votes}
        return self.context['get_uservote_data'].get(obj.id, 0)
    else:
        return 0  # Return 0 as "not voted" if not logged in
Note that we are doing a couple of tricks here;
Storing the data in self.context['get_uservote_data'] so we only have to run the query one time.
Using i.post_id as the key for our data, and not i.post.id. Asking for i.post.id here would trigger 1 query per item.
self.instance is the queryset with all the posts we want to filter on.
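Passing field names to distinct() is only supported on PostgreSQL. On other backends, a comparable lookup table can be built in Python from the filtered rows; a sketch, where `latest_vote_per_post` and the `(post_id, pk, data)` tuple layout are hypothetical and a higher pk is assumed to mean a newer vote:

```python
def latest_vote_per_post(rows):
    """Map post_id -> data of the newest vote.

    `rows` is an iterable of (post_id, pk, data) tuples in any order;
    a higher pk is treated as a newer vote.
    """
    latest = {}
    for post_id, pk, data in sorted(rows, key=lambda r: (r[0], -r[1])):
        # Rows are newest-first within each post, so keep only the first seen.
        latest.setdefault(post_id, data)
    return latest


votes = [
    (1, 10, 1),
    (1, 12, -1),  # newer vote on post 1 overrides pk 10
    (2, 11, 0),
    (4, 13, 1),
]
lookup = latest_vote_per_post(votes)
uservote_for_post_3 = lookup.get(3, 0)  # no vote recorded, so default to 0
```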
The ORM call in get_uservote results in a query that looks like this:
SELECT DISTINCT ON ("post_pointlog"."post_id") "post_pointlog"."id",
"post_pointlog"."post_id",
"post_pointlog"."data"
FROM "post_pointlog"
WHERE ( "post_pointlog"."user_id" = 20
AND "post_pointlog"."post_id" IN ( 1, 2, 3, 4 ) )
ORDER BY "post_pointlog"."post_id" ASC,
"post_pointlog"."id" DESC

Django: Distinct on foreign key relationship

I'm working on a Ticket/Issue-tracker in django where I need to log the status of each ticket. This is a simplification of my models.
class Ticket(models.Model):
    assigned_to = ForeignKey(User)
    comment = models.TextField(_('comment'), blank=True)
    created = models.DateTimeField(_("created at"), auto_now_add=True)

class TicketStatus(models.Model):
    STATUS_CHOICES = (
        (10, _('Open'),),
        (20, _('Other'),),
        (30, _('Closed'),),
    )
    ticket = models.ForeignKey(Ticket, verbose_name=_('ticket'))
    user = models.ForeignKey(User, verbose_name=_('user'))
    status = models.IntegerField(_('status'), choices=STATUS_CHOICES)
    date = models.DateTimeField(_("created at"), auto_now_add=True)
Now, getting the status of a ticket is easy: sort by date and retrieve the first row, like this.
ticket = Ticket.objects.get(pk=1)
ticket.ticketstatus_set.order_by('-date')[0].get_status_display()
But I also want to be able to filter on status in the Admin, and there the status has to be reached through a Ticket queryset, which suddenly makes it more complex. How would I get a queryset with all Tickets with a certain status?
I guess you are trying to avoid a loop (asking each ticket for its status) to filter the queryset manually. As far as I know you cannot avoid that loop. Here are some ideas:
# select_related avoids a lot of database hits inside the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a list with the result
ticket_array = [ts.ticket for ts in t_status]
Or, since you mention you were looking for a QuerySet, this might be what you are looking for:
# select_related avoids a lot of database hits inside the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a QuerySet with the result
tickets = Ticket.objects.filter(pk__in=[ts.ticket.pk for ts in t_status])
However, the problem might be in the way you are modeling the data. What you called TicketStatus is more like a TicketStatusLog, because you want to keep track of the user who changed the status and the date.
Therefore, the reasonable approach is to add a 'current_status' field to the Ticket model that is updated each time a new TicketStatus is created. This way (1) you don't have to sort a table each time you ask for a ticket's status and (2) you would simply do something like Ticket.objects.filter(current_status=ID_STATUS) for what I think you are asking.
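The idea of keeping 'current_status' in sync with the log can be sketched without a database. Plain dataclasses stand in for the two models here, and `set_status` is a hypothetical method; in Django the same update would probably live in an overridden TicketStatus.save() or a post_save signal handler.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StatusEntry:
    status: int
    date: datetime


@dataclass
class Ticket:
    comment: str = ""
    current_status: int = 10  # 10 = Open, matching STATUS_CHOICES
    log: list = field(default_factory=list)

    def set_status(self, status, when=None):
        """Append a log entry and keep current_status in sync with it."""
        self.log.append(StatusEntry(status, when or datetime.now()))
        self.current_status = status


tickets = [Ticket("printer"), Ticket("vpn"), Ticket("email")]
tickets[0].set_status(20)
tickets[0].set_status(30)
tickets[1].set_status(30)

# Equivalent in spirit to Ticket.objects.filter(current_status=30)
closed = [t.comment for t in tickets if t.current_status == 30]
```

The full history stays in the log, while filtering only ever looks at the single denormalized field.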