Quick overview:
I have a Forecast model setup which has a workflow_state.
Now I'm trying to query the Forecast for all the forecasts that are in a certain state AND the current person logged in is_staff.
If i was writing a raw query this wouldn't be an issue because i could write something like:
SELECT * FROM forecast WHERE forecast.workflow_state_id in (1,2,3,4) AND 1 = user.is_staff
However, when trying to write this in a queryset I can't figure out how to reference a constant. I don't want to write a raw queryset and if possible want to avoid using the extra field.
Any suggestions would be appreciated. Thanks
Your constant is only a True
Forecast.objects.filter(workflow_state__in=[1,2,3,4], user__is_staf=True)
Your edit makes things rather less than clear, but you seem to be asking how to do a check on the current logged-in user, rather than on the user referenced by the model. In which case, you don't do that in a query at all; your example SQL statement wouldn't work, and neither would doing it in the ORM. You do it in Python, of course:
if request.user.is_staff:
forecasts = Forecast.objects.filter(workflow_state__in=[1,2,3,4])
Related
I'm attempting to use Django to build a simple website. I have a set of blog posts that have a date field attached to indicate the day they were published. I have a table that contains a list of dates and temperatures. On each post, I would like to display the temperature on the day it was published.
The two models are as follows:
class Post(models.Model):
title = models.CharField(max_length=200)
text = models.TextField()
date = models.DateField()
class Temperature(models.Model):
date = models.DateField()
temperature = models.IntegerField()
I would like to be able to reference the temperature field from the second table using the date field from the first. Is this possible?
In SQL, this is a simple query. I would do the following:
Select temperature from Temperature t join Post p on t.date = p.date
I think I really have two questions:
Is it possible to brute force this, even if it's not best practice? I've googled a lot and tried using raw sql and objects.extra, but can't get them to do what I want. I'm also wary of relying on them for the long haul.
Since this seems to be a simple task, it seems likely that I'm overcomplicating it by having my models set up sub-optimally. Is there something I'm missing about how I should design my models? That is, what's the best practice for doing something like this? (I've successfully pulled the temperature into my blog post by using a foreign key in the Temperature model. But if I go that route, I don't see how I could easily make sure that my temperature dates get the correct foreign key assigned to them so that the temperature date maps to the correct post date.)
There will likely be better answers than this one, but I'll throw in my 2ยข anyway.
You could try a property inside the Post model that returns the temperature:
#property
def temperature(self):
try:
return Temperature.objects.values_list('temperature',flat=True).get(date=self.date)
except:
return None
(code not tested)
About your Models:
If you will be displaying the temperature in a Post list (a list of Posts with their temperatures), then maybe it will be simpler to code and a faster query to just add a temperature field to your Post model.
You can keep the Temperature model. Then:
Assuming you have the temperature data already present in you Temperature model at the time of Post instance creation, you can fill that new field in a custom save method.
If you get temperature data after Post creation, you cann fill in that new temperature field through a background job (maybe triggered by crontab or similar).
Sometimes database orthogonality (not repeating info in many tables) is not the best strategy. Just something to think about, depending on how often you will be querying the Post models and how simple you want to keep that query code.
I think this might be a basic approach to solve the problem
post_dates = Post.objects.all().values('date')
result_temprature = Temperature.objects.filter(date__in = post_dates).values('temperature')
Subqueries could be your friend here. Something like the following should work:
from django.db.models import OuterRef, Subquery
temps = Temperature.objects.filter(date=OuterRef('date'))
posts = Post.objects.annotate(temperature=Subquery(temps.values('temperature')[:1]))
for post in posts:
temperature = post.temperature
Then you can just iterate through posts and access the temperature off each post instance
I have an import of objects where I want to check against the database if it has already been imported earlier, if it has I will update it, if not I will create a new one. But what is the best way of doing this.
Right now I have this:
old_books = Book.objects.filter(foreign_source="import")
for book in new_books:
try:
old_book = old_books.get(id=book.id):
#update book
except:
#create book
But that creates a database call for each book in new_books. So I am looking for a way where it will only make one call to the database, and then just fetch objects from that queryset.
Ps: not looking for a get_or_create kind of thing as the update and create functions are more complex than that :)
--- EDIT---
I guess I haven't been good enough in my explanation, as the answers does not reflect what the problem is. So to make it more clear (I hope):
I want to pick out a single object from a queryset, based on an id of that object. I want the full object so I can update it and save it with it's changed values. So lets say I have a queryset with 3 objects, A and B and C. Then I want a way to ask if the queryset has object B and if it has then get it, without an extra database call.
Assuming new_books is another queryset of Book you can try filter on id of it as
old_books = Book.objects.filter(foreign_source="import").filter(id__in=[b.id for b in new_books])
With this old_books has books that are already created.
You can use the values_list('id', flat=True) to get all ids in a single DB call (is much faster than querysets). Then you can use sets to find the intersections.
new_book_ids = new_books.values_list('id', flat=True)
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update_ids = set(new_book_ids) & set(old_book_ids)
to_create_ids = set(new_book_ids) - to_update_ids
-- EDIT (to include the updated part) --
I guess the problem you are facing is in bulk updating rather than bulk fetch.
If the updates are simple, then something like this might work:
old_book_ids = Book.objects.filter(foreign_source="import") \
.values_list('id', flat=True)
to_update = []
to_create = []
for book in new_books:
if book.id in old_book_ids:
# list of books to update
# to_update.append(book.id)
else:
# create a book object
# Book(**details)
# Update books
Book.objects.filter(id__in=to_update).update(field='new_value')
Book.objects.bulk_create(to_create)
But if the updates are complex (update fields are dependent upon related fields), then you can check insert... on duplicated key update option in MySQL and its custom manager for Django.
Please leave a comment if the above is completely off the track.
You'll have to do more than one query. You need two groups of objects, you can't fetch them both and split them up at the same time arbitrarily like that. There's no bulk_get_or_create method.
However, the example code you've given will do a query for every object which really isn't very efficient (or djangoic for that matter). Instead, use the __in clause to create smart subqueries, and then you can limit database hits to only two queries:
old_to_update = Book.objects.filter(foreign_source="import", pk__in=new_books)
old_to_create = Book.objects.filter(foreign_source="import").exclude(pk__in=new_books)
Django is smart enough to know how to use that new_books queryset in that context (it can also be a regular list of ids)
update
Queryset objects are just a sort of list of objects. So all you need to do now is loop over the objects:
for book in old_to_update:
#update book
for book in old_to_create:
#create book
At this point, when it's fetching the books from the QuerySet, not from the databse, which is a lot more efficient than using .get() for each and every one of them - and you get the same result. each iteration you get to work with an object, the same as if you got it from a direct .get() call.
The best solution I have found is using the python next() function.
First evaluate the queryset into a set and then pick the book you need with next:
old_books = set(Book.objects.filter(foreign_source="import"))
old_book = next((book for book in existing_books if book.id == new_book.id), None )
That way the database is not queried everytime you need to get a specific book from the queryset. And then you can just do:
if old_book:
#update book
old_book.save()
else:
#create new book
In Django 1.7 there is an update_or_create() method that might solve this problem in a better way: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update_or_create
I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)
I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.
Thank to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
country = Country.objects.get(name=country_name)
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(select={'count': 'count(1)'},
order_by=['-count']).values('count', 'face_value', 'currency_type')
coin_values.query.group_by = ['currency_type_id', 'face_value']
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id,
type=1).extra(
select={'count': 'count(1)'},
order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
def coins_by_country(request, country_name):
country = get_object_or_404(Country, name=country_name)
cursor = connection.cursor()
cursor.execute('SELECT count(*), face_value, collection_currency.name FROM collection_collectible, collection_currency WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 group by face_value, collection_currency.name', [country.id] )
coin_values = cursor.fetchall()
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
Have you tried select_related() http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot it seems to work well then you can go coin_values.currency.name.
Also I dont think you need to do country=country.id in your filter, just country=country but I am not sure what difference that makes other than less typing.