Query latest votes per user per post in Django - django

I have a voting system where a user can vote up or down on a post. The votes will be used in a calculation, so I need to store them in a log format, i.e. I am saving each vote in its own table.
Something like this:
class PointLog(models.Model):
    post = models.ForeignKey(Post, db_index=True)
    points = models.IntegerField(db_index=True)
    user = models.ForeignKey(User, blank=True, null=True)
    time = models.DateTimeField(auto_now_add=True, db_index=True)
    data = models.IntegerField()  # -1, 0 or 1
Now I need to display 20 Posts, each together with the user's latest vote on it.
I am using django-rest-framework, so I can use a serializer field that looks like this: uservote = serializers.SerializerMethodField(), together with a function like:
def get_uservote(self, obj):
    user = self.context['request'].user
    vote = PointLog.objects.only('data').filter(user=user, post=obj).last()
    return vote.data if vote else 0
But that runs one database query per post, and I hope there is a better solution.
I could avoid a query on each call to get_uservote by caching the result in self.context, so that part is covered.
But how can I write a query that, given a list of items, returns the latest row for each of them from another table?
A start would be PointLog.objects.filter(user=user, post__in=posts), but what next? Is this even possible using raw SQL in 1 query?
Update 1:
PointLog.objects.filter(...).order_by('post__id').distinct('post__id') would kinda do it, except that I don't think I am guaranteed to get the newest vote. If I use order_by('pk') (or 'time'), I can't use distinct('post__id') as I will get an sql error (ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions)
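On the raw SQL part of the question: DISTINCT ON is specific to PostgreSQL, but the same "latest row per post" result can be fetched in one query with a grouped MAX(id) subquery. The following is only a sketch using the standard-library sqlite3 module; the table name pointlog, the sample rows and the helper latest_votes are made up for illustration and are not the asker's schema.

```python
import sqlite3

# Hypothetical stand-in for the PointLog table, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pointlog ("
    "id INTEGER PRIMARY KEY, post_id INTEGER, user_id INTEGER, data INTEGER)"
)
conn.executemany(
    "INSERT INTO pointlog VALUES (?, ?, ?, ?)",
    [
        (1, 1, 20, 1),   # user 20 votes up on post 1
        (2, 1, 20, -1),  # ... then changes the vote to down
        (3, 2, 20, 1),
        (4, 3, 21, 1),   # a different user
        (5, 2, 20, 0),   # user 20 withdraws the vote on post 2
    ],
)


def latest_votes(conn, user_id, post_ids):
    """Return {post_id: data} for the newest vote of user_id on each post."""
    if not post_ids:
        return {}
    placeholders = ",".join("?" for _ in post_ids)
    sql = f"""
        SELECT p.post_id, p.data
        FROM pointlog AS p
        JOIN (SELECT post_id, MAX(id) AS max_id
              FROM pointlog
              WHERE user_id = ? AND post_id IN ({placeholders})
              GROUP BY post_id) AS latest
          ON p.id = latest.max_id
    """
    return dict(conn.execute(sql, [user_id, *post_ids]).fetchall())


votes = latest_votes(conn, 20, [1, 2, 3, 4])
print(votes.get(1, 0), votes.get(2, 0), votes.get(3, 0))
```

Posts without a vote are simply missing from the dict, so a lookup with a default of 0 treats them as "not voted".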

I found a solution that only added 1 extra query.
The full get_uservote that works would be;
def get_uservote(self, obj):
    user = self.context['request'].user
    if user.is_authenticated():  # a property (no parentheses) on Django 1.10+
        # Run the query once and cache it, even when the result is empty
        if 'get_uservote_data' not in self.context:
            self.context['get_uservote_data'] = {
                i.post_id: i.data
                for i in PointLog.objects.filter(user=user, post__in=self.instance)
                                         .order_by('post__id', '-pk')
                                         .distinct('post__id')
            }
        return self.context['get_uservote_data'].get(obj.id, 0)
    else:
        return 0  # Return 0 as "not voted" if not logged in
Note that we are doing a couple of tricks here:
Storing the data in self.context['get_uservote_data'] so we only have to run the query one time.
Using i.post_id as the key for our data, and not i.post.id. Asking for i.post.id here would trigger 1 query per item.
self.instance is the queryset with all the posts we want to filter on.
This all resulted in a query that looks like this:
SELECT DISTINCT ON ("post_pointlog"."post_id") "post_pointlog"."id",
       "post_pointlog"."post_id",
       "post_pointlog"."data"
FROM "post_pointlog"
WHERE ("post_pointlog"."user_id" = 20
       AND "post_pointlog"."post_id" IN (1, 2, 3, 4))
ORDER BY "post_pointlog"."post_id" ASC,
         "post_pointlog"."id" DESC

Related

Django query to return percentage of users with a post

Two models Users (built-in) and Posts:
class Post(models.Model):
    post_date = models.DateTimeField(default=timezone.now)
    user = models.ForeignKey(User, on_delete=models.CASCADE, null=True, related_name='user_post')
    post = models.CharField(max_length=100)
I want to have an API endpoint that returns the percentage of users that have posted. Basically I want SUM(unique users who have posted) / total_users
I have been trying to play around with annotate and aggregate, but I am getting the sum of posts for each users, or the sum of users per post (which is one...). How can I get the sum of posts returned with unique users, divide that by user.count and return?
I feel like I am missing something silly but my brain has gone to mush staring at this.
class PostParticipationAPIView(generics.ListAPIView):
    queryset = Post.objects.all()
    serializer_class = PostSerializer
    def get_queryset(self):
        start_date = self.request.query_params.get('start_date')
        end_date = self.request.query_params.get('end_date')
        # How can I take something like this, divide it by User.objects.all().count() * 100, and assign it to something to return as the queryset?
        queryset = Post.objects.filter(post_date__gte=start_date, post_date__lte=end_date).distinct('user').count()
        return queryset
My goal is to end up with the endpoint like:
{
    total_participation: 97.3
}
Thanks for any guidance.
BCBB
EDIT
OK, I am still struggling a bit. I tried to create a serializer that just had a decimal field for participation_percentage like:
percentage_participation = serializers.DecimalField(max_digits=5, decimal_places=2, max_value=100, min_value=0)
Then I calculate in the view, but I get an error:
Got AttributeError when attempting to get a value for field percentage_participation on serializer ParticipationSerializer.
The serializer field might be named incorrectly and not match any attribute or key on the str instance.
Original exception text was: 'str' object has no attribute 'percentage_participation'.
Error was the same if I made it a CharField (in case there was some string coercion?).
So then I tried to move it to a SerializerMethodField and put all the calculation logic in there. This calculated fine, but I still had to provide a queryset in the view. If I provided a model queryset, it just returned the percentage once per row (say Post.objects.all() had a total of 100 posts, it returned the percentage 100 times).
So then I tried to override the get_queryset in the view, but I HAVE to return something. If I just return { "meh", "hello" } then I return the percentage from the SerializerMethodField one time and the end result is exactly what I want.
I just have no idea as to WHY or how to do this correctly.
Thanks for your help.
EDIT #2
OK so I realized why I was only getting one, it was iterating over the string I returned, which was one character. When I returned "meh" it gave me three of the percentage, iterating over each character in the string...
I am not understanding from playing around, reading the docs, or using GoogleFu how to do this properly. I just want to be able to perform some kind of summary logic on records from the DB - how can I do this properly?!?!
Thank you for all your time.
BCBB
Something like this should work:
# get total user count
total_users = User.objects.count()
# get unique set of users with post
total_users_who_posted = Post.objects.filter(...).distinct("user").count()
# calculate percentage
percentage = {
    "total_participation": (total_users_who_posted * 100) / total_users
}
# beware of division by zero when there are no users
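The repeated percentages described in the question come from the list view serializing every item of whatever get_queryset returns, including each character of a string. One way around that is to compute the summary yourself and return it as a single dict, for example from a plain DRF APIView wrapped in a Response. The calculation itself, with the zero-user guard, might look like the sketch below; participation_percentage is a hypothetical helper name and the two counts are assumed to come from queries like the ones above.

```python
def participation_percentage(users_who_posted, total_users):
    """Percentage of users with at least one post, rounded to one decimal."""
    if total_users == 0:
        # No users at all: report 0 instead of raising ZeroDivisionError.
        return 0.0
    return round(users_who_posted * 100 / total_users, 1)


# Hypothetical counts standing in for the two ORM queries.
payload = {"total_participation": participation_percentage(73, 75)}
print(payload)
```

Returning one dict like payload gives a single JSON object rather than one entry per row.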
I don't think it is possible to do this entirely with Django's ORM, but you can use the ORM to get the user counts (with posts and total):
from django.db.models import BooleanField, Case, Count, Q, Value, When
counts = (User
          .objects
          .annotate(posted=Case(When(user_post__isnull=False,
                                     then=Value(True)),
                                default=Value(False),
                                output_field=BooleanField()))
          .values('posted')
          # distinct=True because the join on user_post repeats users with several posts
          .aggregate(posted_users=Count('pk', filter=Q(posted=True), distinct=True),
                     total_users=Count('pk', filter=Q(posted__isnull=False), distinct=True)))
# This will result in a dict containing the following:
# counts = {'posted_users': ...,
#           'total_users': ....}

cross instance calculations in django queryset

I'm not sure (a) whether this is doable and (b) whether I am formulating the task correctly. Perhaps the right way is to refactor the db design, but I would appreciate any opinion on that.
I have a model in a Django app where I track the times a user enters and exits a certain page (via either form submission or just closing the browser window). I do the tracking using Django Channels, but that does not matter in this case.
The model looks like:
class TimeStamp(models.Model):
    class Meta:
        get_latest_by = 'enter_time'
    page_name = models.CharField(max_length=1000)
    participant = models.ForeignKey(to=Participant, related_name='timestamps')
    timestamp = models.DateTimeField()
    enter_exit_type = models.CharField(max_length=1000, choices=ENTEREXITTYPES,)
What I need to do is to calculate how much time a user spends on this page. So I need to loop through all records of Timestamp for the specific user, and calculate time difference between records of 'enter' and 'exit' types records.
So the db data may look like:
id   timestamp   enter_exit_type
1    20:12:12    enter
2    20:12:13    exit
3    20:18:12    enter
4    20:21:12    exit
5    20:41:12    enter
5 20:41:12 enter
So what is the right way to produce a resulting queryset that looks like:
id   time_spent (min:sec)
1    0:01
2    3:00
The last 'enter' record is ignored because there is no corresponding 'exit' record.
The record 1 in resulting queryset is difference between timestamps in ids 2 and 1. The record 2 in resulting queryset is difference between timestamps in ids 4 and 3.
I can just loop through the records, looking for the nearest 'exit' record and calculate it but I was thinking if there is a simpler solution?
It's possible:
1) Use the approach here to group by user if you want to get the answer for all users in one query.
2) Filter out the last unclosed entry with enter_exit_type == 'enter'.
3) .annotate(timestamp_with_sign=Case(When(enter_exit_type='enter', then=F('timestamp') * -1), default=F('timestamp')))
4) Sum() by the timestamp_with_sign field, which gives the total of (exit - enter).
I'm not sure that F('timestamp') would work; you may need to find a way to convert it to Unix time.
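For comparison, the plain loop mentioned in the question can be sketched in pure Python. This assumes the records have already been fetched for one participant and sorted by timestamp; the function name time_spent and the (timestamp, type) tuple layout are illustrative, not part of the original model.

```python
from datetime import datetime, timedelta


def time_spent(events):
    """Pair each 'enter' with the next 'exit' and return the durations.

    events: iterable of (timestamp, enter_exit_type) sorted by timestamp.
    An 'enter' without a following 'exit' is ignored.
    """
    durations = []
    entered_at = None
    for timestamp, kind in events:
        if kind == "enter":
            entered_at = timestamp
        elif kind == "exit" and entered_at is not None:
            durations.append(timestamp - entered_at)
            entered_at = None
    return durations


# The sample rows from the question, on an arbitrary date.
events = [
    (datetime(2020, 1, 1, 20, 12, 12), "enter"),
    (datetime(2020, 1, 1, 20, 12, 13), "exit"),
    (datetime(2020, 1, 1, 20, 18, 12), "enter"),
    (datetime(2020, 1, 1, 20, 21, 12), "exit"),
    (datetime(2020, 1, 1, 20, 41, 12), "enter"),
]
durations = time_spent(events)
total = sum(durations, timedelta())
print(durations, total)
```

The sum of these durations is the same quantity the signed-timestamp Sum() above is meant to produce.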
This model structure may not be sufficient for your requirement, so I would suggest changing your model to:
class TimeStamp(models.Model):
    class Meta:
        get_latest_by = 'enter_time'
    page_name = models.CharField(max_length=1000)
    participant = models.ForeignKey(Participant, related_name='timestamps')
    enter = models.DateTimeField()
    exit = models.DateTimeField(null=True, blank=True)
Then you can get the data with:
from django.db.models import F, Sum, ExpressionWrapper, DurationField
TimeStamp.objects.values('participant').annotate(
    sum_time_diff=Sum(
        ExpressionWrapper(F('exit') - F('enter'), output_field=DurationField())
    )
)
The response will look like:
<QuerySet [{'participant': 1, 'sum_time_diff': datetime.timedelta(0, 7)}, {'participant': 2, 'sum_time_diff': datetime.timedelta(0, 2)}]>

django queryset filter foreign key select first record

I have a History model like below
class History(models.Model):
    class Meta:
        app_label = 'subscription'
        ordering = ['-start_datetime']
    subscription = models.ForeignKey(Subscription, related_name='history')
    FREE = 'free'
    Premium = 'premium'
    SUBSCRIPTION_TYPE_CHOICES = ((FREE, 'Free'), (Premium, 'Premium'),)
    name = models.CharField(max_length=32, choices=SUBSCRIPTION_TYPE_CHOICES, default=FREE)
    start_datetime = models.DateTimeField(db_index=True)
    end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
    cancelled_datetime = models.DateTimeField(blank=True, null=True)
Now I have a queryset filter like the one below:
users = get_user_model().objects.all()
queryset = users.exclude(subscription__history__end_datetime__lt=timezone.now())
The issue is that the exclude above checks end_datetime against all the history rows of a particular subscription, but I only want to compare it with the first history row.
Below is what a particular history object looks like. So I want to write a queryset filter which does the datetime comparison on the first row only.
You could use a Model Manager method for this. The documentation isn't all that descriptive, but you could do something along the lines of:
class SubscriptionManager(models.Manager):
    def my_filter(self):
        # You'd want to make this a smaller query most likely
        subscriptions = Subscription.objects.all()
        results = []
        for subscription in subscriptions:
            # first() may return None when there is no history yet
            sub_history = subscription.history_set.first()
            if sub_history and sub_history.end_datetime > timezone.now():
                results.append(subscription)
        return results
class History(models.Model):
    subscription = models.ForeignKey(Subscription)
    end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
    objects = SubscriptionManager()
Then: queryset = History.objects.my_filter()
Not a copy-pastable answer (note that it returns a list, not a QuerySet), but it shows the use of Managers. Given the specificity of what you're looking for, I don't think there's a way to get it just via the plain filter() and exclude().
Without knowing what your end goal here is, it's hard to say whether this is feasible, but have you considered adding a property to the subscription model that indicates whatever you're looking for? For example, if you're trying to get everyone who has a subscription that's ending:
class Subscription(models.Model):
    @property
    def ending(self):
        return self.end_datetime > timezone.now()
Then in your code, check the property in Python, since a property cannot be used as a filter() lookup: ending = [s for s in Subscription.objects.all() if s.ending]
I have tried all kinds of Django expressions (aggregate, query, conditional) but was unable to solve the problem, so I went with RawSQL and it solved the problem.
I used the SQL below to select the first row and then compare the end_datetime:
SELECT (end_datetime > %s OR end_datetime IS NULL) AS result
FROM subscription_history
ORDER BY start_datetime DESC
LIMIT 1;
I will accept my own answer if no solution using queryset filter chaining is found in the next 2 days.
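To apply the same "first row only" comparison to every subscription at once, the row lookup can be correlated per subscription. The sketch below uses the standard-library sqlite3 module with a made-up history table and made-up rows; active_subscription_ids is a hypothetical helper, and ISO date strings stand in for real datetimes so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "id INTEGER PRIMARY KEY, subscription_id INTEGER, "
    "start_datetime TEXT, end_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01", "2020-02-01"),  # older row, already ended
        (2, 1, "2020-02-01", None),          # newest row, open-ended
        (3, 2, "2020-01-01", "2030-01-01"),  # older row, ends far in the future
        (4, 2, "2020-03-01", "2020-04-01"),  # newest row, ended
    ],
)


def active_subscription_ids(conn, now):
    """Subscriptions whose newest history row has not ended at `now`."""
    sql = """
        SELECT h.subscription_id
        FROM history AS h
        WHERE h.start_datetime = (
                SELECT MAX(h2.start_datetime)
                FROM history AS h2
                WHERE h2.subscription_id = h.subscription_id)
          AND (h.end_datetime IS NULL OR h.end_datetime > ?)
        ORDER BY h.subscription_id
    """
    return [row[0] for row in conn.execute(sql, (now,))]


active = active_subscription_ids(conn, "2021-01-01")
print(active)
```

Subscription 2 is left out even though one of its older rows ends in 2030, because only its newest row is compared.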

Django: Distinct on foreign key relationship

I'm working on a Ticket/Issue-tracker in django where I need to log the status of each ticket. This is a simplification of my models.
class Ticket(models.Model):
    assigned_to = ForeignKey(User)
    comment = models.TextField(_('comment'), blank=True)
    created = models.DateTimeField(_("created at"), auto_now_add=True)
class TicketStatus(models.Model):
    STATUS_CHOICES = (
        (10, _('Open'),),
        (20, _('Other'),),
        (30, _('Closed'),),
    )
    ticket = models.ForeignKey(Ticket, verbose_name=_('ticket'))
    user = models.ForeignKey(User, verbose_name=_('user'))
    status = models.IntegerField(_('status'), choices=STATUS_CHOICES)
    date = models.DateTimeField(_("created at"), auto_now_add=True)
Now, getting the status of a ticket is easy: sort by date and retrieve the first row, like this.
ticket = Ticket.objects.get(pk=1)
ticket.ticketstatus_set.order_by('-date')[0].get_status_display()
But I also want to be able to filter on status in the Admin, and the filter has to get the status through a Ticket queryset, which suddenly makes it more complex. How would I get a queryset with all Tickets with a certain status?
I guess you are trying to avoid a loop (asking each ticket for its status) to filter the queryset manually. As far as I know you cannot avoid that loop. Here are some ideas:
# select_related avoids a lot of hits in the database when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a list with the result
ticket_array = [ts.ticket for ts in t_status]
Or, since you mention you were looking for a QuerySet, this might be what you are looking for:
# select_related avoids a lot of hits in the database when entering the loop
t_status = TicketStatus.objects.select_related('ticket').filter(status=ID_STATUS)
# this is a QuerySet with the result
tickets = Ticket.objects.filter(pk__in=[ts.ticket.pk for ts in t_status])
However, the problem might be in the way you are modeling the data. What you called TicketStatus is more like a TicketStatusLog, because you want to keep track of who changed the status and when.
Therefore, the reasonable approach is to add a field 'current_status' to the Ticket model that is updated each time a new TicketStatus is created. This way (1) you don't have to sort a table each time you ask for a ticket's status and (2) you can simply do something like Ticket.objects.filter(current_status=ID_STATUS) for what I think you are asking.

Django model manager live-object issues

I am using Django. I am having a few issues with caching of QuerySets for news/category models:
class Category(models.Model):
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique=True)
class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .filter(published__lte=datetime.datetime.now())
class Article(models.Model):
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique=True)
    story = models.TextField()
    author = models.CharField(max_length=60, blank=True)
    published = models.DateTimeField(
        help_text=_('Set to a date in the future to publish later.'))
    created = models.DateTimeField(auto_now_add=True, editable=False)
    updated = models.DateTimeField(auto_now=True, editable=False)
    live = PublishedArticlesManager()
    objects = models.Manager()
Note - I have removed some fields to save on complexity...
There are a few (related) issues with the above.
Firstly, when I query for LIVE objects in my view via Article.live.all() if I refresh the page repeatedly I can see (in MYSQL logs) the same database query being made with exactly the same date in the where clause - ie - the datetime.datetime.now() is being evaluated at compile time rather than runtime. I need the date to be evaluated at runtime.
Secondly, when I use the articles_set method on the Category object this appears to work correctly - the datetime used in the query changes each time the query is run - again I can see this in the logs. However, I am not quite sure why this works, since I don't have anything in my code to say that the articles_set query should return LIVE entries only!?
Finally, why is none of this being cached?
Any ideas how to make the correct time be used consistently? Can someone please explain why the latter setup appears to work?
Thanks
Jay
P.S - database queries below, note the date variations.
SELECT LIVE ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
SELECT LIVE ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:21:33' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:26:06' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
You should check out conditional view processing.
from django.views.decorators.http import condition
def latest_entry(request, article_id):
    return Article.objects.latest("updated").updated
@condition(last_modified_func=latest_entry)
def view_article(request, article_id):
    pass  # your view code here
This should let the browser reuse its cached copy of the page (via a 304 Not Modified response) rather than reloading a new version every time.
I suspect that if you want now() to be evaluated at runtime, you should use raw SQL. I think this will solve the compile-time/runtime issue.
class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .raw("SELECT * FROM news_article WHERE published <= CURRENT_TIMESTAMP")
Note that this returns a RawQuerySet, which may differ a bit from a normal QuerySet.
I have now fixed this issue. It appears the problem was that the queryset returned by Article.live.all() was being cached in my urls.py! I was using function-based generic-views:
url(r'^all/$', object_list, {
    'queryset': Article.live.all(),
}, 'news_all'),
I have now changed this to use the class-based approach, as advised in the latest Django documentation:
url(r'^all/$', ListView.as_view(
    model=Article,
), name="news_all"),
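This now works as expected: by specifying the model attribute rather than the queryset attribute, the QuerySet is created at runtime (on each request) instead of once at compile time, when urls.py is imported. With only model set, ListView falls back to the model's default manager, which is live here because it is declared first.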
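The underlying effect is not specific to Django: an expression placed in a module or class body is evaluated once, when that code is first executed, while an expression inside a method is evaluated on every call. The sketch below illustrates this with a counter standing in for datetime.datetime.now(); the names now, EagerConfig and LazyConfig are made up for the illustration.

```python
import itertools

# Stand-in for datetime.datetime.now(): every call returns a larger value.
_clock = itertools.count(1)


def now():
    return next(_clock)


class EagerConfig:
    # Evaluated once, when the class body runs (compare: a queryset built in urls.py).
    cutoff = now()


class LazyConfig:
    def get_cutoff(self):
        # Evaluated on every call (compare: a queryset built per request).
        return now()


eager = (EagerConfig.cutoff, EagerConfig.cutoff)
lazy = (LazyConfig().get_cutoff(), LazyConfig().get_cutoff())
print(eager, lazy)
```

The eager value never changes between reads, which matches the identical timestamps seen in the logged queries, while the lazy value advances on each call.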