Multiple vs one big context_processor - django

I have multiple context processors, and in each one I have to get the user from the request. Each of them looks like this:
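def upload_form(request):
    # Named differently from the UploadForm class to avoid shadowing it.
    user = request.user
    # initial expects a dict; 'user' is an assumed field name
    uplo = UploadForm(request.POST or None, initial={'user': user})
    return {'Uplo': uplo}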
I saw that this is not efficient, since I'm accessing the user multiple times, so I thought about writing one big context processor where I define all the forms at once:
def all_forms(request):
    user = request.user
    uplo = UploadForm(request.POST or None, initial={user...})
    set_form = SetForm(request.POST or None, initial={user...})
    ...
    return {'Uplo': uplo, 'SetForm': set_form}
Can anybody tell me whether I gain anything here? What is the common standard for context processors? I could not find anything on SO.

Getting the user from the request is not a big deal. It is an O(1) operation: the authentication middleware loads the user lazily on first access and caches it on the request, so later accesses in the same request do not hit the database again.
However, if the context processors are not doing different things and the work can be done in one place, it is better to create one big context processor as you suggest, because otherwise Django has to enter and leave a separate function for each one on every request.
Anyway, if you want a definitive answer, you can simply time the separate and the combined context processors.
And yes, if each processor hits the database, you should combine them and minimise the number of database queries.
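To follow the timing suggestion above, here is a minimal pure-Python sketch that needs no Django install. The request object (a types.SimpleNamespace), build_form(), and the three processor functions are hypothetical stand-ins for your real processors and forms; run_separate() mimics how Django merges each processor's returned dict into the template context. Numbers from a toy like this say little about a real project, so time your actual processors the same way.

```python
import timeit
from types import SimpleNamespace

# Hypothetical stand-in for HttpRequest: only the attributes the processors read.
request = SimpleNamespace(user="alice", POST={})


def build_form(name, data, user):
    # Hypothetical stand-in for instantiating UploadForm / SetForm.
    return {"form": name, "data": data or None, "user": user}


def upload_form_processor(request):
    return {"Uplo": build_form("UploadForm", request.POST, request.user)}


def set_form_processor(request):
    return {"SetForm": build_form("SetForm", request.POST, request.user)}


def all_forms_processor(request):
    user = request.user
    return {
        "Uplo": build_form("UploadForm", request.POST, user),
        "SetForm": build_form("SetForm", request.POST, user),
    }


def run_separate():
    # Django calls each registered processor and merges the dicts, roughly like this.
    context = {}
    for processor in (upload_form_processor, set_form_processor):
        context.update(processor(request))
    return context


def run_combined():
    return all_forms_processor(request)


if __name__ == "__main__":
    n = 10_000
    print("separate:", timeit.timeit(run_separate, number=n))
    print("combined:", timeit.timeit(run_combined, number=n))
```

Both versions build the same context; the only difference is the extra function call per processor, which is usually negligible next to a database query.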

Related

Duplicate data occurs when order_by ('?') and pagination is applied

from rest_framework.decorators import action

# method on the ViewSet
@action(detail=False, methods=["get"])
def home_list(self, request):
    data = extra_models.objects.order_by("?")
    print(data)
    paginator = self.paginator
    results = paginator.paginate_queryset(data, request)
    serializer = self.get_serializer(results, many=True)
    return self.get_paginated_response(serializer.data)
What I want is for the extra_models objects to come out in random order, without duplicates, every time the home_list API is called.
They should also be paginated 10 at a time (using the pagination option in settings.py).
The current problem is that the first 10 appear in random order, but when I request the next 10, items from the first page are mixed in.
In other words, data is duplicated.
Duplicates do not occur within the same page.
When I move to the next page, data from the previous page is mixed in.
When I print(data) or print(serializer.data) inside the view, I see no duplicates.
However, duplicates appear from /home_list?page=2 onward when calling the actual API.
Which part should I check?
You should expect this behaviour when you're dealing with .order_by("?").
Every request runs the query again, and order_by("?") is translated into the database's random-ordering function, so the rows are reshuffled on every request; nothing from the previous request or page is preserved.
You are doing nothing wrong here; the only reason this happens is order_by("?"). The API is stateless, which means that when you call page=2 the server does not know which data it sent for page=1, so it returns a fresh random slice for page=2.
The only solution is to give the queryset a deterministic order, for example ascending or descending on a column.
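A small pure-Python simulation of why this happens. ITEMS stands in for the table's primary keys and PAGE_SIZE for the pagination setting; neither name comes from the question. page_random_each_request() reshuffles on every call, like order_by("?"), while page_deterministic() uses the same order every time. page_seeded() is a hypothetical variant in which the client sends the same seed with every page request, so the order is random but repeatable; done in Python like this it loads every id, so it only suits small tables.

```python
import random

ITEMS = list(range(1, 31))  # stand-in for 30 primary keys
PAGE_SIZE = 10              # stand-in for PAGE_SIZE in settings.py


def _slice(rows, page):
    """Return the rows for a 1-based page number."""
    start = (page - 1) * PAGE_SIZE
    return rows[start:start + PAGE_SIZE]


def page_random_each_request(page):
    # Like order_by("?"): each request gets a brand-new shuffle,
    # so page 2 is cut from a different order than page 1 was.
    rows = ITEMS[:]
    random.shuffle(rows)
    return _slice(rows, page)


def page_deterministic(page):
    # Like order_by("id"): the same order on every request.
    return _slice(sorted(ITEMS), page)


def page_seeded(page, seed):
    # Hypothetical: the client passes the same seed on every page request,
    # so every request reproduces the same shuffled order.
    rows = sorted(ITEMS)
    random.Random(seed).shuffle(rows)
    return _slice(rows, page)


if __name__ == "__main__":
    first, second = page_random_each_request(1), page_random_each_request(2)
    print("overlap with per-request shuffle:", set(first) & set(second))
    first, second = page_seeded(1, 42), page_seeded(2, 42)
    print("overlap with seeded shuffle:", set(first) & set(second))  # always empty
```

With the per-request shuffle, pages 1 and 2 usually share some ids; with either deterministic order, the pages partition the table.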

quizgame randomize getting duplicates in production

"One way that'd probably work 'well enough' (assuming you can afford to do a count): pick a random indexed column to order by, order the whole queryset by that, pick a range between the top and the bottom of the result set (e.g. 1234:1254), and take one random result from there. At 50K rows it's probably a blip in terms of query time (though, to be honest, so might RAND() be)." However, I am struggling to translate this into my own codebase. I am creating the proof of concept for production code, and I know order_by("?") will kill my DB.
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['GET', 'POST'])
def questions_view(request):
    if request.method == 'GET':
        question = Question.objects.order_by('?').first()
        serializer = QuestionListPageSerializer(question)
        return Response(serializer.data)
This may be more cost-efficient:
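import random

count = Question.objects.count()
# randrange(count) picks an offset in [0, count); guard against an empty table
question = Question.objects.all()[random.randrange(count)] if count else None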
I think you should store the questions each user has answered, along with their answers. Assume the model is called AnsweredQuestion and has foreign keys to both the user and the question. Then just pick a random question from those that have no AnsweredQuestion row for that user.
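A pure-Python sketch of that suggestion: drop the questions the user has already answered, then pick one at a random offset. The list of ids and the answered_ids set stand in for the Question and the hypothetical AnsweredQuestion tables. The ORM lines in the docstring are an untested assumption that relies on AnsweredQuestion having a foreign key named question and keeping the default reverse query name.

```python
import random


def pick_unanswered(question_ids, answered_ids, rng=random):
    """Return a random id from question_ids that is not in answered_ids, or None.

    Rough ORM equivalent (untested assumption, see the lead-in):
        remaining = Question.objects.exclude(answeredquestion__user=user)
        count = remaining.count()
        question = remaining[rng.randrange(count)] if count else None
    """
    remaining = [qid for qid in question_ids if qid not in answered_ids]
    if not remaining:
        return None  # the user has answered every question
    # randrange(n) picks an offset in [0, n), like slicing a queryset at [offset]
    return remaining[rng.randrange(len(remaining))]
```

Picking by offset avoids order_by("?") entirely; the cost moves to the count and the OFFSET, which is usually cheaper than sorting the whole table randomly.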

intelligent methodology for filtering server side

My lack of a CS background and my inexperience are really showing at the moment. I've never really handled filtering results server side, and I suspect this is not the right way to go about it. I'm using Django.
First, I assume that I can keep it DRYer by keeping this validation in my form definitions. Next, I was concerned about my chained filter statements. How important is it to use Q complex lookups as opposed to chaining filters at this point? I'm just building a prototype and I assume that I'll eventually have to go for a search solution more powerful than full text search.
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
import datetime
import re

from django.shortcuts import render

from .models import Post  # assuming Post is defined in this app's models.py


def search(request):
    if request.method == 'GET' and request.GET.get('region', ''):
        neighborhoods = request.GET.getlist('region')
        min_cost = request.GET.get('min_cost', '0')
        min_cost = re.sub(r'[,]', '', min_cost)  # remove any ','
        if not min_cost or re.search(r'[^\d]', min_cost):
            min_cost = 0
        else:
            min_cost = int(min_cost)
        max_cost = request.GET.get('max_cost', '')
        max_cost = re.sub(r'[,]', '', max_cost)  # remove any ','
        if not max_cost or re.search(r'[^\d]', max_cost):
            max_cost = 100000
        else:
            max_cost = int(max_cost)
        date_min = request.GET.get('from', '')
        date_max = request.GET.get('to', '')
        if not date_min:
            date_min = datetime.date.today().isoformat()
        if not date_max:
            date_max = '2013-03-18'
        rooms = request.GET.get('rooms', 0)
        if not rooms:
            rooms = 0
        workers = request.GET.get('workers', 0)
        if not workers:
            workers = 0
        # I should probably use Q objects here for complex lookups
        posts = Post.objects.filter(region__in=neighborhoods).filter(cost__gt=min_cost).filter(cost__lt=max_cost).filter(availability__gt=date_min).filter(availability__lt=date_max).filter(rooms=rooms).filter(workers=workers)
        # return HttpResponse('%s' % posts)
        return render(request, "website/search.html", {'posts': posts})
First, I assume that I can keep it DRYer by keeping this validation in my form definitions.
Yes, I'd put this in a form, since it looks like you are using one to display the fields anyway. You can also put a lot of your formatting logic, such as the date handling, directly in clean_FIELD methods so the normalized values end up in the cleaned_data dict. The only issue is that the output is actually modified, so your users will see the change from 1,000 to 1000. Either way, I would put this logic in a form method.
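# makes the view clean.
if form.is_valid():
    posts = form.get_posts(request)  # get_posts(): a method you would define on the form
    return render(request, "website/search.html", {'posts': posts})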
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
Q objects are only for complex lookups. I don't see a need for them here.
I also don't see why you need to chain the filters. At first I wondered whether these are m2m fields, but lookups like __gt/__lt on ordinary columns give the same result whether they are chained or combined in one filter() call; chaining only changes the meaning when filtering across multi-valued relationships.
# this is more readable / concise.
# I'd combine as many of your queries as you can just for readability.
posts = Post.objects.filter(
    region__in=region,
    cost__gte=min_cost,
    # etc
)
Now, if you want optional arguments, my suggestion is to use a dictionary of keyword arguments so that you can dynamically populate the kwargs.
keyword_arguments = {
    'region__in': region,
    'cost__gte': min_cost,
    'cost__lt': max_cost,
    'availability__gt': date_min,
    'availability__lt': date_max,
}
if request.GET.get('rooms'):
    keyword_arguments['rooms'] = request.GET['rooms']
if request.GET.get('workers'):
    keyword_arguments['workers'] = request.GET['workers']
posts = Post.objects.filter(**keyword_arguments)
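The same idea can be pulled into a plain function that builds the kwargs dict, which also makes the cost parsing and the optional rooms/workers lookups testable without a database. This is a sketch: parse_cost() and build_post_filters() are hypothetical helpers, params is a plain dict standing in for request.GET, and the rooms/workers values are assumed to be numeric strings. The date bounds could be handled the same way.

```python
import re


def parse_cost(raw, default):
    """Strip thousands separators; return `default` for empty or non-numeric input."""
    cleaned = re.sub(r",", "", raw or "")
    return int(cleaned) if cleaned.isdecimal() else default


def build_post_filters(params, regions):
    """Build keyword arguments for Post.objects.filter(**...).

    params:  plain dict standing in for request.GET (one value per key)
    regions: list standing in for request.GET.getlist('region')
    """
    filters = {
        "region__in": regions,
        "cost__gte": parse_cost(params.get("min_cost"), 0),
        "cost__lt": parse_cost(params.get("max_cost"), 100000),
    }
    # Optional select inputs: add the lookup only when the user picked a value,
    # so an unselected field does not filter on 0.
    for field in ("rooms", "workers"):
        if params.get(field):
            filters[field] = int(params[field])
    return filters
```

In a view this might be called as Post.objects.filter(**build_post_filters(request.GET.dict(), request.GET.getlist('region'))).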

Django most efficient way to do this?

I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply checking whether the kwargs dict contains one of the filter types, and if so I filter the result on that value (I just wrote this code, so apologies for any errors, but you should get the point):
def get_search_object(**kwargs):
    q = Entry.objects.all()
    if kwargs.__contains__('the_key1'):
        q = q.filter(column1=kwargs['the_key1'])
    if kwargs.__contains__('the_key2'):
        q = q.filter(column2=kwargs['the_key2'])
    return q.distinct()
Now, according to the Django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), this is fine, in that the DB will not be hit until the queryset is evaluated. Lately, though, I have heard that this is not the most efficient way to do it and that one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names, you should be able to simplify it quite a bit, since what you're doing is deconstructing the kwargs and rebuilding them manually, but only for specific keywords.
def get_search_object(**kwargs):
    entries = Entry.objects.filter(**kwargs)
    return entries.distinct()
The main difference is that this doesn't enforce that the keys are actual columns, and it badly needs some exception handling (an unknown key raises FieldError when the filter is built). If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
    valid_fields = ['the_key1', 'the_key2']
    filter_dict = {}
    for key in kwargs:
        if key in valid_fields:
            filter_dict[key] = kwargs[key]
    entries = Entry.objects.filter(**filter_dict)
    return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
    valid_fields = [field.name for field in Entry._meta.fields]
    filter_dict = {}
    for key in kwargs:
        if key in valid_fields:
            filter_dict[key] = kwargs[key]
    entries = Entry.objects.filter(**filter_dict)
    return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.

Checking for empty queryset in Django

What is the recommended idiom for checking whether a query returned any results?
Example:
orgs = Organisation.objects.filter(name__iexact = 'Fjuk inc')
# If any results
# Do this with the results without querying again.
# Else, do something else...
I suppose there are several different ways of checking this, but I'd like to know how an experienced Django user would do it.
Most examples in the docs just ignore the case where nothing was found...
if not orgs:
    # The Queryset is empty ...
    pass
else:
    # The Queryset has results ...
    pass
Since version 1.2, Django has the QuerySet.exists() method, which is the most efficient way to check:
if orgs.exists():
    # Do this...
    pass
else:
    # Do that...
    pass
But if you are going to evaluate the QuerySet anyway, it's better to use:
if orgs:
    ...
For more information read QuerySet.exists() documentation.
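The idioms differ in what they ask the database for. This sketch uses the standard-library sqlite3 module with hand-written SQL that only approximates what Django generates; the organisation table and the helper names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisation (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO organisation (name) VALUES (?)",
    [("Fjuk inc",), ("Acme",), ("Initech",)],
)

# Case-insensitive match, roughly like name__iexact.
MATCH = "FROM organisation WHERE lower(name) = lower(?)"


def exists_style(name):
    # ~ orgs.exists(): ask for at most one row and no column data.
    return conn.execute(f"SELECT 1 {MATCH} LIMIT 1", (name,)).fetchone() is not None


def count_style(name):
    # ~ orgs.count(): the database counts every matching row.
    return conn.execute(f"SELECT COUNT(*) {MATCH}", (name,)).fetchone()[0]


def first_style(name):
    # ~ orgs.first(): fetch one full row, ordered by primary key.
    return conn.execute(f"SELECT id, name {MATCH} ORDER BY id LIMIT 1", (name,)).fetchone()


def bool_style(name):
    # ~ `if orgs:`: every matching row is fetched (Django then caches them).
    return conn.execute(f"SELECT id, name {MATCH}", (name,)).fetchall()
```

Only the last form leaves you holding the rows, which is why `if orgs:` is the better choice when you are going to iterate over orgs anyway.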
To check the emptiness of a queryset:
if orgs.exists():
    # Do something
    pass
or you can check for the first item in the queryset; first() returns None if there isn't one:
if orgs.first():
    # Do something
    pass
If you have a huge number of objects, this can (at times) be much faster:
try:
    orgs[0]
    # If you get here, it exists...
except IndexError:
    # Doesn't exist!
    pass
On a project I'm working on with a huge database, not orgs is 400+ ms and orgs.count() is 250ms. In my most common use cases (those where there are results), this technique often gets that down to under 20ms. (One case I found, it was 6.)
Could be much longer, of course, depending on how far the database has to look to find a result. Or even faster, if it finds one quickly; YMMV.
EDIT: This will often be slower than orgs.count() if no result is found, particularly if the condition you're filtering on is a rare one; as a result, it's particularly useful in view functions where you need to make sure the object exists or raise Http404. (Where, one would hope, people are asking for URLs that exist more often than not.)
The most efficient way (before Django 1.2) is this:
if orgs.count() == 0:
    # no results
    pass
else:
    # alright! let's continue...
    pass
I disagree with the predicate
if not orgs:
It should be
if not orgs.count():
I was having the same issue with a fairly large result set (~150k results). QuerySet's truth test evaluates the whole queryset, fetching every row into a list before the check is made. In my case, execution time went down by three orders of magnitude.
You could also use this:
if(not(orgs)):
    # if orgs is empty
    pass
else:
    # if orgs is not empty
    pass