quizgame randomize getting duplicates in production - django

one way that'd probably work "well enough" is (assuming you can afford to do a count): pick a random indexed column to order by. Order the whole queryset by that. Pick a range between the top and the bottom of the resultset (eg: 1234:1254) and take 1 random result from there. At 50K rows it's ~probably~ a blip in terms of query time (though tbh so might rand() be, at that), however I am trying to translate it into my own codebase as far as I am creating the poc for production code, and I know order_by("?") will kill my db
#api_view(['GET', 'POST'])
def questions_view(request):
if request.method == 'GET':
questions = Question.objects.all().order_by('?').first()
serializer = QuestionListPageSerializer(questions)
return Response(serializer.data)

This may be more cost-efficient:
import random
question = Question.objects.all()[random.randint(0, Question.objects.all().count()-1)]

I think you should keep the questions that users answered and the answer to it. assume that the model is called AnsweredQuestion and it has fk to both user and question. you should just get random questions between the question that is not in relation with user in AnsweredQuestion.

Related

Django - iterating the elements of a dictionary and saving them generates problems at the level of connections to the database?

I'm working with a survey app, so I need to save all the answers a user gives in the database. The way I'm doing it is this:
for key, value in request.POST.items():
if key != 'csrfmiddlewaretoken': # I don't want to save the token info
item = Item.objects.get(pk=key) # I get the question(item) I want to save
if item == None:
return render(request, "survey/error.html")
Answer.objects.create(item= item, answer=value, user = request.user)
Taking into account that django by default closes connections to the database (i.e. does not use persistent connections). My question is:
In case the dictionary has for example the answer to 60 questions (so it will iterate 60 times), would it open and close the connections 60 times, or does it only do it once?
Is there a better way to save POST information manually? (without using django forms, since for various reasons I currently need to do it manually)
This definitely is not a good way to store Answers in bulk, since:
you each time fetch the Item object for every single question;
your code does not handle the case correctly where an item is missing: in that case it will raise an exception, and the Django middleware will (likely) render a 500 page; and
it will make several calls to create all these objects.
We can create objects in bulk to reduce the number of queries. Typically we will create all elements with a single query, although depending on the database and the amount of data, it might take a limited number of queries.
We furtermore do not need to fetch the related Item objects, at all, we can just set the item_id field instead, the "twin" of the item ForeignKey field, like:
from django.db import IntegrityError
try:
answers = [
Answer(item_id=key, answer=value, user=request.user)
for key, value in request.POST.items()
if key != 'csrfmiddlewaretoken'
]
Answer.objects.bulk_create(answers)
except IntegrityError:
return render(request, 'survey/error.html')
The bulk_create will thus insert all the objects in a small number of queries and thus significantly reduce the time of the request.
Note however that bulk_create has some limitations (listed on the documentation page). It might be useful to read those carefully and take them into account. Although I think in the given case, these are not relevant, it is always better to know the limitations of the tools you are using.

Multiple vs one big context_processor

I have multiple context processor and in each I have to request the user. Each of these look like this:
def UploadForm(request):
user = request.user
Uplo = UploadForm(request.POST or None, initial={user})
return {'Uplo': Uplo}
I saw that this is not efficient since im requesting the user multiple times, so I thought about writing one big context processor where I define all the Forms at once.
def AllForms(request):
user = request.user
Uplo = UploadForm(request.POST or None, initial={user...})
SetForm = SetForm(request.POST or None, initial={user...})
...
return {'Uplo': Uplo,'SetForm': SetForm}
Can anybody tell me if I gain here anything? What is the common standard for context processors? I could not find anything on SO.
Getting user from request is not a big thing. It is o(1) operation.
However if the multiple context processors are not doing different thing and can be don at one time, it should be better to create one big context processor as you say it. The reason being you have to get in and out of the function multiple times in same request.
Anyway if you want definitive difference, you can just print time in multiple and clubbed context processors.
And yes, if you are hitting the database every time, you should club them and optimise the number of times you have to hit the db.

Ordering dictionary in context Django

I looked at this thread, with an example of sorting dictionaries.
I have a dictionary of programme objects where the key is a programme object and the value is a lookup of the number of related Project objects.
def DepartmentDetail(request, pk):
department = Department.objects.get(pk=pk)
programmes = Programme.objects.all().filter(department=department).exclude(active=False).order_by('long_name')
combi = {}
for p in programmes:
prj = Project.objects.all().filter(programme=p)
combi[p] = str(len(prj))
return render(request, 'sysadmin/department.html',{'department': department, 'programmes': programmes, 'combi': sorted(combi.items())})
In the model, Programme returns a string 'long_name', so I believe that I am trying to sort a string key and a string value.
In the template I get to the keys and values as so,
{% for programme, n in combi %}
This gives me the error..
unorderable types: Programme() < Programme()
I don't really understand the error, in the python 3 documentation it states that the sorted() method accepts any iterable - So why does this happen?
I'm looking at collections.OrderedDict to solve the problem, but I want to know why this doesn't work.
Thanx.
Databases with index on columns are really good at sorting. There's almost never a need to sort on the client side. You can almost always do it in the server. The funny part it you apparently know how to do it too.
....exclude(active=False).order_by('long_name') # <--- this
Guess what, your data is already sorted there isn't a need to sort it again inside python!!
But there is a much bigger issue in your code. You are fetching a set of Project items and then looping through that set to retrieve them all over again one by one. So if you have 200 Project items you are doing 200 queries when one query does the job just as well. Just add select_related or prefetch_related depending on which direction you have the relationship.
Your code ideally should be something like this
department = Department.objects.get(pk=pk)
programmes = Programme.objects.all().filter(department=department).exclude(active=False).order_by('long_name')
return render(request, 'sysadmin/department.html',{'department': department, 'programmes': programmes,})
As far as I can see combi just contains duplicated data. The same thing can be accessed from programmes eg. programme.project_set.all()
(again this depends on which direction you have the relationship, your models are not shown)
Recommended reading: https://docs.djangoproject.com/en/1.10/ref/models/fields/#django.db.models.ForeignKey
The issue is that sorted expects a way to be able to order the items and by default there isn't any way to know how to order your objects. You can provide a key
sorted(combi.items(), key=lambda i: i.long_name)

I'm confused about how distinct() works with Django queries

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

Django most efficient way to do this?

I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.