QuerySet subscripting won't work as expected - django

when got a QuerySet by using filter and it's easy to use the following code to do the change and save operation:
qs = SomeModel.objects.filter(owner_id=123)
# suppose qs has 1 or many elements
last_login_time = qs[0].last_login_time
qs[0].last_login_time = datetime.now() # I expect it can assign the new value, but it won't
assertEquals(qs[0].last_login_time, last_login_time) # YES, it doesn't change
qs[0].save() #So it won't update the old record
And after figuring this out, the following code will be used instead and it works:
qs = SomeModel.objects.filter(owner_id=123)
# suppose qs has 1 or many elements
obj = qs[0]
last_login_time = obj.last_login_time
obj.last_login_time = datetime.now() # I expect it can assign the new value, but it will
assertNotEquals(obj.last_login_time, last_login_time) # YES, it does change
obj.save() #So it will update the old record as expected
And I have met some of my friends/colleagues use the first approach to do the record updating. And IMO, it's natural and prone to use. (when you type qs[0] and type obj , they have the same type)
After reading the code(db.models.query), it can be figured out why.(when you subscript the QuerySet it will use the qs = self._clone() and assigning a value won't change at all)
Possible solutions:
make the assigning work for the
subscripting QuerySet
announce the
above first approach is wrong and
let the users know it
So I want to ask:
Is my question a real issue for django?(I'm wondering why django developer not make it work as expected)
What's your suggestion about this issue? And what's your preferred way for such an issue?

Use update for updating fields in a queryset.

I'm not really sure what you're asking here. Are you saying this is a bug? I don't think so, it's clearly defined behaviour: the queryset is lazy, but is evaluated when you iterate or slice it. Each time you do slice it, you get a new object. This is the logical consequence of the fact that slicing by itself doesn't cause the non-sliced queryset to be evaluated - if the result isn't already cached, slicing will perform a single database call with a LIMIT 1 to only get one result. Otherwise, you're left with extremely undesirable side-effects.
Now, if you think this could be better explained in the docs, you're welcome - and encouraged - to submit a bug with a patch that explains it better.

Related

I'm confused about how distinct() works with Django queries

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

manually create a Django QuerySet or rather manually add objects to a QuerySet

Basically I need a graceful way to do the following:-
obj1 = Model1.objects.select_related('model2').get(attribute1=value1)
obj2 = Model1.objects.select_related('model2').get(attribute2=value2)
model2_qs = QuerySet(model=Model2, qs_items=[obj1.model2,obj2.model2])
I may not be thinking right, but doing something like the following seems infinitely stupid to me.: -
obj1 = Model1.objects.select_related('model2').get(attribute1=value1)
model2_qs = Model2.objects.filter(pk=obj1.model2.pk)
Yes, I need to end up with a QuerySet of Model2 for later use (specifically to pass to a Django form).
In the first code block above,even if I use filter instead of get I will obviously have a QuerySet of Model1. Reverse lookups may not always be possible in my case.
If you're simply looking to create a queryset of items that you choose through some complicated process not representable in SQL you could always use the __in operator.
wanted_items = set()
for item in model1.objects.all():
if check_want_item(item):
wanted_items.add(item.pk)
return model1.objects.filter(pk__in = wanted_items)
You'll obviously have to adapt this to your situation but it should at least give you a starting point.
To manually add objects to a QuerySet, try _result_cache:
objs = ObjModel.objects.filter(...)
len(objs) #or anything that will evaluate and hit the db
objs._result_cache.append(yourObj)
PS: I did not understand (or tried to) chefsmart's question but I believe this answers to the question in title.
You can't manually add objects to a QuerySet. But why don't you put them in a list ?
obj1 = Model1.objects.select_related('model2').get(attribute1=value1)
obj2 = Model1.objects.select_related('model2').get(attribute2=value2)
model2 = list(obj1, obj2)

Django most efficient way to do this?

I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.

Get last record in a queryset

How can I retrieve the last record in a certain queryset?
Django Doc:
latest(field_name=None) returns the latest object in the table, by date, using the field_name provided as the date field.
This example returns the latest Entry in the table, according to the
pub_date field:
Entry.objects.latest('pub_date')
EDIT : You now have to use Entry.objects.latest('pub_date')
You could simply do something like this, using reverse():
queryset.reverse()[0]
Also, beware this warning from the Django documentation:
... note that reverse() should
generally only be called on a QuerySet
which has a defined ordering (e.g.,
when querying against a model which
defines a default ordering, or when
using order_by()). If no such ordering
is defined for a given QuerySet,
calling reverse() on it has no real
effect (the ordering was undefined
prior to calling reverse(), and will
remain undefined afterward).
The simplest way to do it is:
books.objects.all().last()
You also use this to get the first entry like so:
books.objects.all().first()
To get First object:
ModelName.objects.first()
To get last objects:
ModelName.objects.last()
You can use filter
ModelName.objects.filter(name='simple').first()
This works for me.
Django >= 1.6
Added QuerySet methods first() and last() which are convenience methods returning the first or last object matching the filters. Returns None if there are no objects matching.
When the queryset is already exhausted, you may do this to avoid another db hint -
last = queryset[len(queryset) - 1] if queryset else None
Don't use try...except....
Django doesn't throw IndexError in this case.
It throws AssertionError or ProgrammingError(when you run python with -O option)
You can use Model.objects.last() or Model.objects.first().
If no ordering is defined then the queryset is ordered based on the primary key. If you want ordering behaviour queryset then you can refer to the last two points.
If you are thinking to do this, Model.objects.all().last() to retrieve last and Model.objects.all().first() to retrieve first element in a queryset or using filters without a second thought. Then see some caveats below.
The important part to note here is that if you haven't included any ordering in your model the data can be in any order and you will have a random last or first element which was not expected.
Eg. Let's say you have a model named Model1 which has 2 columns id and item_count with 10 rows having id 1 to 10.[There's no ordering defined]
If you fetch Model.objects.all().last() like this, You can get any element from the list of 10 elements. Yes, It is random as there is no default ordering.
So what can be done?
You can define ordering based on any field or fields on your model. It has performance issues as well, Please check that also. Ref: Here
OR you can use order_by while fetching.
Like this: Model.objects.order_by('item_count').last()
If using django 1.6 and up, its much easier now as the new api been introduced -
Model.object.earliest()
It will give latest() with reverse direction.
p.s. - I know its old question, I posting as if going forward someone land on this question, they get to know this new feature and not end up using old method.
In a Django template I had to do something like this to get it to work with a reverse queryset:
thread.forumpost_set.all.last
Hope this helps someone looking around on this topic.
MyModel.objects.order_by('-id')[:1]
If you use ids with your models, this is the way to go to get the latest one from a qs.
obj = Foo.objects.latest('id')
You can try this:
MyModel.objects.order_by('-id')[:1]
The simplest way, without having to worry about the current ordering, is to convert the QuerySet to a list so that you can use Python's normal negative indexing. Like so:
list(User.objects.all())[-1]

django - reorder queryset after slicing it

I fetch the latest 5 rows from a Foo model which is ordered by a datetime field.
qs = Foo.objects.all()[:5]
In the following step, I want to reorder the queryset by some other criteria (actually, by the same datetime field in the opposite direction). But reordering after a slice is not permitted. reverse() undoes the first ordering, giving me a differet queryset. Is there a way to accomplish what I want without creating a list from the queryset and doing the ordering using it?
order_by gives you SQL in-database ordering. You're already using that, and then slicing on it. At that point, the results are retrieved into memory. If you want to change their order, you need to use Python in-memory sorting to do it, not the ORM in-database sorting.
In your case, Daniel has already given the best solution: since you simply want to sort by the same field, but in the other order, just reverse the list you have:
qs = Foo.objects.all()[:5]
objs = reversed(qs)
If you had wanted to sort by some other field, then you'd use the sorted() function with a custom key function:
qs = Foo.objects.all()[:5]
objs = sorted(qs, key=lambda o: o.some_other_field)
No, there's no way of doing that. order_by is an operation on the database, but when you slice a queryset it is evaluated and doesn't go back to the database after that.
Sounds like you already know the solution, though: run reversed() on the evaluated qs.
qs = reversed(Foo.objects.all()[:5])
Late answer, but this just worked for me:
import random
sorted(queryset[:10], key=lambda x: random.random())