Duplicate data occurs when order_by ('?') and pagination is applied

Duplicate data occurs when order_by ('?') and pagination is applied - django

#action(detail=False, methods=["get"])
def home_list(self, request):
data = extra_models.objects.order_by("?")
print(data)
paginator = self.paginator
results = paginator.paginate_queryset(data, request)
serializer = self.get_serializer(results, many=True)
return self.get_paginated_response(serializer.data)
What I want to do is, I want the data of extra_models (objects) to come out randomly without duplication every time the home_list API is called.
However, I want to come out randomly but cut it out in 10 units. (settings.py pagination option applied)
The current problem is that the first 10 appear randomly, but when the next 10 appear, the first ones are also mixed.
In other words, data is being duplicated.
Duplicates do not occur within the same page.
If you move to the next page, data from the previous page is mixed.
Even if you try print(data) or print(serializer.data) in the middle, duplicate data is not delivered.
However, data duplication occurs from /home_list?page=2 when calling the actual API.
Which part should I check?

You should expect this behaviour when you're dealing with .order_by("?").
Whenever a request hits in server's end, Django shuffles the objects and also Django doesn't preserve the previous request or page

You are doing nothing wrong here, The only reason why this is happening is because of order_by("?"). The API is stateless which means on the second API call when you call for Page=2 then it does not know which data is sent for page=1 and returns random data for page=2.
The only solution is to order your data by ASC or DESC

Related

Django Forms and query limit of ModelChoiceField queryset

Creating a form inheriting the "forms.ModelForm" class:
number = forms.ModelChoiceField(queryset=Number.objects.order_by('?')[:6], empty_label=None)
The result is a Choice form is created, and random numbers limited to 6 entries will appear. But data indicating the limit is not accepted.
if you make the code like this without a limit:
number = forms.ModelChoiceField(queryset=Number.objects.order_by('?'), empty_label=None)
Then all records appear in the form, and the form is validated everything is fine.
P.S
SELECT "number"."number" FROM "number" ORDER BY RAND() ASC LIMIT 6
When requesting a limit, the log shows that it works perfectly with LIMIT
I need help please

For some reason, the form makes 2 requests to the database, and validation fails because the server has a set of the 1st request and the client is rendered from the 2nd request.
Solved the issue by using the object call function.

Django - iterating the elements of a dictionary and saving them generates problems at the level of connections to the database?

I'm working with a survey app, so I need to save all the answers a user gives in the database. The way I'm doing it is this:
for key, value in request.POST.items():
if key != 'csrfmiddlewaretoken': # I don't want to save the token info
item = Item.objects.get(pk=key) # I get the question(item) I want to save
if item == None:
return render(request, "survey/error.html")
Answer.objects.create(item= item, answer=value, user = request.user)
Taking into account that django by default closes connections to the database (i.e. does not use persistent connections). My question is:
In case the dictionary has for example the answer to 60 questions (so it will iterate 60 times), would it open and close the connections 60 times, or does it only do it once?
Is there a better way to save POST information manually? (without using django forms, since for various reasons I currently need to do it manually)

This definitely is not a good way to store Answers in bulk, since:
you each time fetch the Item object for every single question;
your code does not handle the case correctly where an item is missing: in that case it will raise an exception, and the Django middleware will (likely) render a 500 page; and
it will make several calls to create all these objects.
We can create objects in bulk to reduce the number of queries. Typically we will create all elements with a single query, although depending on the database and the amount of data, it might take a limited number of queries.
We furtermore do not need to fetch the related Item objects, at all, we can just set the item_id field instead, the "twin" of the item ForeignKey field, like:
from django.db import IntegrityError
try:
answers = [
Answer(item_id=key, answer=value, user=request.user)
for key, value in request.POST.items()
if key != 'csrfmiddlewaretoken'
]
Answer.objects.bulk_create(answers)
except IntegrityError:
return render(request, 'survey/error.html')
The bulk_create will thus insert all the objects in a small number of queries and thus significantly reduce the time of the request.
Note however that bulk_create has some limitations (listed on the documentation page). It might be useful to read those carefully and take them into account. Although I think in the given case, these are not relevant, it is always better to know the limitations of the tools you are using.

I'm confused about how distinct() works with Django queries

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.

The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.

You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!

checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

Django formset - show extra fields only when no initial data set?

I have a page that displays multiple Formsets, each of which has a prefix. The formsets are created using formset_factory the default options, including extra=1. Rows can be added or deleted with JavaScript.
If the user is adding new data, one blank row shows up. Perfect.
If the user has added data but form validation failed, in which case the formset is populated with POST data using MyFormset(data, prefix='o1-formsetname') etc., only the data that they have entered shows up. Again, perfect. (the o1 etc. are dynamically generated, each o corresponds to an "option", and each "option" may have multiple formsets).
However if the user is editing existing data, in which case the view populates the formset using MyFormset(initial=somedata, prefix='o1-formsetname') where somedata is a list of dicts of data that came from a model in the database, an extra blank row is inserted after this data. I don't want a blank row to appear unless the user explicitly adds one using the JavaScript.
Is there any simple way to prevent the formset from showing an extra row if the initial data is set? The reason I'm using initial in the third example is that if I just passed the data in using MyFormset(somedata, prefix='o1-formsetname') I'd have to do an extra step of reformatting all the data into a POSTdata style dict including prefixes for each field, for example o1-formsetname-1-price: x etc., as well as calculating the management form data, which adds a whole load of complication.
One solution could be to intercept the formset before it's sent to the template and manually remove the row, but the extra_forms attribute doesn't seem to be writeable and setting extra to 0 doesn't make any difference. I could also have the JavaScript detect this case and remove the row. However I can't help but think I'm missing something obvious since the behaviour I want is what would seem to be sensible expected behaviour to me.
Thanks.

Use the max_num keyword argument to formset_factory:
MyFormset = formset_factory([...], extra=1, max_num=1)
For more details, check out limiting the maximum number of forms.
One hitch: presumably you want to be able to process more than one blank form. This isn't too hard; just make sure that you don't use the max_num keyword argument in the POST processing side.

I've come up with a solution that works with Django 1.1. I created a subclass of BaseFormSet that overrides the total_form_count method such that, if initial forms exist, the total does not include extra forms. Bit of a hack perhaps, and maybe there's a better solution that I couldn't find, but it works.
class SensibleFormset(BaseFormSet):
def total_form_count(self):
"""Returns the total number of forms in this FormSet."""
if self.data or self.files:
return self.management_form.cleaned_data[TOTAL_FORM_COUNT]
else:
if self.initial_form_count() > 0:
total_forms = self.initial_form_count()
else:
total_forms = self.initial_form_count() + self.extra
if total_forms > self.max_num > 0:
total_forms = self.max_num
return total_forms

intelligent methodology for filtering server side

My lack of CS and inexperience is really coming to the forefront in this moment. I've never really handled filtering results server side. I'm thinking that this is not the right way to go about it. I'm using Django....
First, I assume that I can keep it DRYer by keeping this validation in my form definitions. Next, I was concerned about my chained filter statements. How important is it to use Q complex lookups as opposed to chaining filters at this point? I'm just building a prototype and I assume that I'll eventually have to go for a search solution more powerful than full text search.
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
def search(request):
if request.method=='GET' and request.GET.get('region',''):
neighborhoods=request.GET.getlist('region')
min_rent=request.GET.get('min_cost','0')
min_rent=re.sub(r'[,]','',min_cost) #remove any ','
if re.search(r'[^\d]',min_cost):
min_cost=0
else:
min_cost=int(min_cost)
max_cost=request.GET.get('max_cost','0')
max_cost=re.sub(r'[,]','',max_cost) #remove any ','
if re.search(r'[^\d]',max_cost):
max_cost=100000
else:
max_cost=int(max_rent)
date_min=request.GET.get('from','')
date_max=request.GET.get('to','')
if not date_min:
date=(str(datetime.date.today()))
date_min=u'%s' %date
if not date_max:
date_max=u'2013-03-18'
rooms=request.GET.get('rooms',0)
if not rooms:
rooms=0
workers=request.GET.get('workers',0)
if not workers:
workers=0
#I should probably use Q objects here for complex lookups
posts=Post.objects.filter(region__in=region).filter(cost__gt=min_cost).filter(cost__lt=max_cost).filter(availability__gt=date_min).filter(availability__lt=date_max).filter(rooms=rooms).filter(workers=workers)
#return HttpResponse('%s' %posts)
return render_to_response("website/search.html",{'posts':posts),context_instance=RequestContext(request))

First, I assume that I can keep it
DRYer by keeping this validation in my
form definitions.
Yes, I'd put this in a form as it looks like you are using one to display the form anyways? Also, you can put a lot of your date formatting stuff right in the clean_FIELD methods to format the data in the cleaned_data dict. The only issue here is that output is actually modified so your users will see the change from 1,000 to 1000. Either way, I would put this logic in a form method.
# makes the view clean.
if form.is_valid():
form.get_posts(request)
return response
My big issue right now (besides the
length of the code and clearly the
inefficiency) is that I'm not sure how
to handle my rooms and workers inputs,
which are select forms. If the user
does not select a value, I want to
remove these filters from the process
server side. Should I just create two
separate conditional series of lookups
for these outcomes?
Q objects are only for complex lookups. I don't see a need for them here.
I also don't see why you need to chain the filters. I at first wondered if these are m2m, but these types of queries (__gt/__lt) don't behave any differently chaining as there is no overlap between the queries.
# this is more readable / concise.
# I'd combine as many of your queries as you can just for readability.
posts = Posts.objects.filter(
region__in=region,
cost__gte=min_cost,
# etc
)
Now, if you want optional arguments, my suggestion is to use a dictionary of keyword arguments so that you can dynamically populate the kwargs.
keyword_arguments = {
'region__in': region,
'cost__gte': min_cost,
'cost__lt': max_cost,
'availability__gt': date_min,
'availability__lt': date_max,
}
if request.GET.get('rooms'):
keyword_arguments['rooms'] = request.GET['rooms']
if request.GET.get('workers'):
keyword_arguments['workers'] = request.GET['workers']
posts = Posts.objects.filter(**keyword_arguments)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js