I looked at this thread, with an example of sorting dictionaries.
I have a dictionary of programme objects where the key is a programme object and the value is a lookup of the number of related Project objects.
def DepartmentDetail(request, pk):
department = Department.objects.get(pk=pk)
programmes = Programme.objects.all().filter(department=department).exclude(active=False).order_by('long_name')
combi = {}
for p in programmes:
prj = Project.objects.all().filter(programme=p)
combi[p] = str(len(prj))
return render(request, 'sysadmin/department.html',{'department': department, 'programmes': programmes, 'combi': sorted(combi.items())})
In the model, Programme returns a string 'long_name', so I believe that I am trying to sort a string key and a string value.
In the template I get to the keys and values as so,
{% for programme, n in combi %}
This gives me the error..
unorderable types: Programme() < Programme()
I don't really understand the error, in the python 3 documentation it states that the sorted() method accepts any iterable - So why does this happen?
I'm looking at collections.OrderedDict to solve the problem, but I want to know why this doesn't work.
Thanx.
Databases with index on columns are really good at sorting. There's almost never a need to sort on the client side. You can almost always do it in the server. The funny part it you apparently know how to do it too.
....exclude(active=False).order_by('long_name') # <--- this
Guess what, your data is already sorted there isn't a need to sort it again inside python!!
But there is a much bigger issue in your code. You are fetching a set of Project items and then looping through that set to retrieve them all over again one by one. So if you have 200 Project items you are doing 200 queries when one query does the job just as well. Just add select_related or prefetch_related depending on which direction you have the relationship.
Your code ideally should be something like this
department = Department.objects.get(pk=pk)
programmes = Programme.objects.all().filter(department=department).exclude(active=False).order_by('long_name')
return render(request, 'sysadmin/department.html',{'department': department, 'programmes': programmes,})
As far as I can see combi just contains duplicated data. The same thing can be accessed from programmes eg. programme.project_set.all()
(again this depends on which direction you have the relationship, your models are not shown)
Recommended reading: https://docs.djangoproject.com/en/1.10/ref/models/fields/#django.db.models.ForeignKey
The issue is that sorted expects a way to be able to order the items and by default there isn't any way to know how to order your objects. You can provide a key
sorted(combi.items(), key=lambda i: i.long_name)
Related
Can anyone help, I want to return an ordered list based on forloop in Django using a field in the model that contains both integer and string in the format MM/1234. The loop should return the values with the least interger(1234) in ascending order in the html template.
Ideally you want to change the model to have two fields, one integer and one string, so you can code a queryset with ordering based on the integer one. You can then define a property of the model to return the self.MM+"/"+str( self.nn) composite value if you often need to use that. But if it's somebody else's database schema, this may not be an option.
In which case you'll have to convert your queryset into a list (which reads all the data rows at once) and then sort the list in Python rather than in the database. You can run out of memory or bring your server to its knees if the list contains millions of objects. count=qs.count() is a DB operation that won't.
qs = Foo.objects.filter( your_selection_criteria)
# you might want to count this before the next step, and chicken out if too many
# also this simple key function will crash if there's ever no "/" in that_field
all_obj = sorted( list( qs),
key = lambda obj: obj.that_field.split('/')[1] )
I am trying to display a map of my data based on a search. The easiest way to handle the map display would be to serialized the queryset generated by the search, and indeed this works just fine using . However, I'd really like to allow for multiple searches, with the displayed points being shown in a user chosen color. The user chosen color, obviously cannot come from the database, since it is not a property of these objects, so none of the aggregators make sense here.
I have tried simply making a utility class, since what I really need is a somewhat complex join between two model classes that then gets serialized into geojson. However, once I created that utility class, it became evident that I lost a lot of the benefits of having a queryset, especially the ability to easily serialize the data with django-geojson (or natively once I can get 1.8 to run smoothly).
Basically, I want to be able to do something like:
querySet = datumClass.objects.filter(...user submitted search parameters...).annotate(color='blue')
Is this possible at all? It seems like this would be more elegant and would work better than my current solution of a non-model utility class which has some serious serialization issues when I try to use python-geojson to serialize.
The problem is that extra comes with all sorts of warning about usefulness or deprecation... But this works:
.extra(select={'color': "'blue'"})
Notice the double quotes wrapping the string value.
This translates to:
SELECT ('blue') AS "color"
Not quite sure what you are trying to achieve, but you can add extra attributes to your objects iterating over the queryset in the view. These can be accessed from the template.
for object in queryset :
if object.contition = 'a'
object.color = 'blue'
else:
object.color = 'green'
if you have a dictionary that maps fields to values, you can do things like
filter_dictionary = {
'date__lte' : '2014-03-01'
}
qs = DatumClass.objects.filter(**filter_dictionary)
And qs would have all dates less than that date (if it has a date field). So, as a user, I could submit any key, value pairs that you could place in your dictionary.
I have an import of objects where I want to check against the database if it has already been imported earlier, if it has I will update it, if not I will create a new one. But what is the best way of doing this.
Right now I have this:
old_books = Book.objects.filter(foreign_source="import")
for book in new_books:
try:
old_book = old_books.get(id=book.id):
#update book
except:
#create book
But that creates a database call for each book in new_books. So I am looking for a way where it will only make one call to the database, and then just fetch objects from that queryset.
Ps: not looking for a get_or_create kind of thing as the update and create functions are more complex than that :)
--- EDIT---
I guess I haven't been good enough in my explanation, as the answers does not reflect what the problem is. So to make it more clear (I hope):
I want to pick out a single object from a queryset, based on an id of that object. I want the full object so I can update it and save it with it's changed values. So lets say I have a queryset with 3 objects, A and B and C. Then I want a way to ask if the queryset has object B and if it has then get it, without an extra database call.
Assuming new_books is another queryset of Book you can try filter on id of it as
old_books = Book.objects.filter(foreign_source="import").filter(id__in=[b.id for b in new_books])
With this old_books has books that are already created.
You can use the values_list('id', flat=True) to get all ids in a single DB call (is much faster than querysets). Then you can use sets to find the intersections.
new_book_ids = new_books.values_list('id', flat=True)
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update_ids = set(new_book_ids) & set(old_book_ids)
to_create_ids = set(new_book_ids) - to_update_ids
-- EDIT (to include the updated part) --
I guess the problem you are facing is in bulk updating rather than bulk fetch.
If the updates are simple, then something like this might work:
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update = []
to_create = []
for book in new_books:
if book.id in old_book_ids:
# list of books to update
# to_update.append(book.id)
else:
# create a book object
# Book(**details)
# Update books
Book.objects.filter(id__in=to_update).update(field='new_value')
Book.objects.bulk_create(to_create)
But if the updates are complex (update fields are dependent upon related fields), then you can check insert... on duplicated key update option in MySQL and its custom manager for Django.
Please leave a comment if the above is completely off the track.
You'll have to do more than one query. You need two groups of objects, you can't fetch them both and split them up at the same time arbitrarily like that. There's no bulk_get_or_create method.
However, the example code you've given will do a query for every object which really isn't very efficient (or djangoic for that matter). Instead, use the __in clause to create smart subqueries, and then you can limit database hits to only two queries:
old_to_update = Book.objects.filter(foreign_source="import", pk__in=new_books)
old_to_create = Book.objects.filter(foreign_source="import").exclude(pk__in=new_books)
Django is smart enough to know how to use that new_books queryset in that context (it can also be a regular list of ids)
update
Queryset objects are just a sort of list of objects. So all you need to do now is loop over the objects:
for book in old_to_update:
#update book
for book in old_to_create:
#create book
At this point, when it's fetching the books from the QuerySet, not from the databse, which is a lot more efficient than using .get() for each and every one of them - and you get the same result. each iteration you get to work with an object, the same as if you got it from a direct .get() call.
The best solution I have found is using the python next() function.
First evaluate the queryset into a set and then pick the book you need with next:
old_books = set(Book.objects.filter(foreign_source="import"))
old_book = next((book for book in existing_books if book.id == new_book.id), None )
That way the database is not queried everytime you need to get a specific book from the queryset. And then you can just do:
if old_book:
#update book
old_book.save()
else:
#create new book
In Django 1.7 there is an update_or_create() method that might solve this problem in a better way: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update_or_create
I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)
I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.