I'm struggling getting my head around the Django's ORM. What I want to do is get a list of distinct values within a field on my table .... the equivalent of one of the following:
SELECT DISTINCT myfieldname FROM mytable
(or alternatively)
SELECT myfieldname FROM mytable GROUP BY myfieldname
I'd at least like to do it the Django way before resorting to raw sql.
For example, with a table:
id, street, city
1, Main Street, Hull
2, Other Street, Hull
3, Bibble Way, Leicester
4, Another Way, Leicester
5, High Street, Londidium
I'd like to get:
Hull, Leicester, Londidium.
Say your model is 'Shop'
class Shop(models.Model):
street = models.CharField(max_length=150)
city = models.CharField(max_length=150)
# some of your models may have explicit ordering
class Meta:
ordering = ('city',)
Since you may have the Meta class ordering attribute set (which is tuple or a list), you can use order_by() without parameters to clear any ordering when using distinct(). See the documentation under order_by()
If you don’t want any ordering to be applied to a query, not even the default ordering, call order_by() with no parameters.
and distinct() in the note where it discusses issues with using distinct() with ordering.
To query your DB, you just have to call:
models.Shop.objects.order_by().values('city').distinct()
It returns a dictionary
or
models.Shop.objects.order_by().values_list('city').distinct()
This one returns a ValuesListQuerySet which you can cast to a list.
You can also add flat=True to values_list to flatten the results.
See also: Get distinct values of Queryset by field
In addition to the still very relevant answer of jujule, I find it quite important to also be aware of the implications of order_by() on distinct("field_name") queries. This is, however, a Postgres only feature!
If you are using Postgres and if you define a field name that the query should be distinct for, then order_by() needs to begin with the same field name (or field names) in the same sequence (there may be more fields afterward).
Note
When you specify field names, you must provide an order_by() in the
QuerySet, and the fields in order_by() must start with the fields in
distinct(), in the same order.
For example, SELECT DISTINCT ON (a) gives you the first row for each
value in column a. If you don’t specify an order, you’ll get some
arbitrary row.
If you want to e.g. extract a list of cities that you know shops in, the example of jujule would have to be adapted to this:
# returns an iterable Queryset of cities.
models.Shop.objects.order_by('city').values_list('city', flat=True).distinct('city')
By example:
# select distinct code from Platform where id in ( select platform__id from Build where product=p)
pl_ids = Build.objects.values('platform__id').filter(product=p)
platforms = Platform.objects.values_list('code', flat=True).filter(id__in=pl_ids).distinct('code')
platforms = list(platforms) if platforms else []
If you don't use PostgreSQL and you just want to get and distinct with only one specific field you can use
MyModel.objects.values('name').annotate(Count('id')).order_by()
this queryset return us rows that their 'name' field is unique with the count of the rows that have same name
<QuerySet [{'name': 'a', 'id__count': 2}, {'name': 'b', 'id__count': 2}, {'name': 'c', 'id__count': 3}>
I know the returned data is not complete and sometimes is not satisfying but sometimes it's useful
document reference
The SELECT DISTINCT statement is used to return only distinct (different) values. Inside a table, a column often contains many duplicate values; using distinct() we can get Unique data.
event = Event.objects.values('item_event_type').distinct()
serializer= ItemEventTypeSerializer(event,many=True)
return Response(serializer.data)
Related
Assuming the following example model:
# models.py
class event(models.Model):
location = models.CharField(max_length=10)
type = models.CharField(max_length=10)
date = models.DateTimeField()
attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is the desired behavior.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model object, they should all be unique. It seems highly unlikely that that you would have two events on the same date, in the same location of the same type. So with that assumption, let's begin: (as a formatting note, class Names tend to start with capital letters to differentiate between classes and variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of the Max(Date), but this should get you on the right path.
With the following models:
class Item(models.Model):
name = models.CharField(max_length=255)
attributes = models.ManyToManyField(ItemAttribute)
class ItemAttribute(models.Model):
attribute = models.CharField(max_length=255)
string_value = models.CharField(max_length=255)
int_value = models.IntegerField()
I also have an Item which has 2 attributes, 'color': 'red', and 'size': 3.
If I do any of these queries:
Item.objects.filter(attributes__string_value='red')
Item.objects.filter(attributes__int_value=3)
I will get Item returned, works as I expected.
However, if I try to do a multiple query, like:
Item.objects.filter(attributes__string_value='red', attributes__int_value=3)
All I want to do is an AND. This won't work either:
Item.objects.filter(Q(attributes__string_value='red') & Q(attributes__int_value=3))
The output is:
<QuerySet []>
Why? How can I build such a query that my Item is returned, because it has the attribute red and the attribute 3?
If it's of any use, you can chain filter expressions in Django:
query = Item.objects.filter(attributes__string_value='red').filter(attributes__int_value=3')
From the DOCS:
This takes the initial QuerySet of all entries in the database, adds a filter, then an exclusion, then another filter. The final result is a QuerySet containing all entries with a headline that starts with “What”, that were published between January 30, 2005, and the current day.
To do it with .filter() but with dynamic arguments:
args = {
'{0}__{1}'.format('attributes', 'string_value'): 'red',
'{0}__{1}'.format('attributes', 'int_value'): 3
}
Product.objects.filter(**args)
You can also (if you need a mix of AND and OR) use Django's Q objects.
Keyword argument queries – in filter(), etc. – are “AND”ed together. If you need to execute more complex queries (for example, queries with OR statements), you can use Q objects.
A Q object (django.db.models.Q) is an object used to encapsulate a
collection of keyword arguments. These keyword arguments are specified
as in “Field lookups” above.
You would have something like this instead of having all the Q objects within that filter:
** import Q from django
from *models import Item
#assuming your arguments are kwargs
final_q_expression = Q(kwargs[1])
for arg in kwargs[2:..]
final_q_expression = final_q_expression & Q(arg);
result = Item.objects.filter(final_q_expression)
This is code I haven't run, it's out of the top of my head. Treat it as pseudo-code if you will.
Although, this doesn't answer why the ways you've tried don't quite work. Maybe it has to do with the lookups that span relationships, and the tables that are getting joined to get those values. I would suggest printing yourQuerySet.query to visualize the raw SQL that is being formed and that might help guide you as to why .filter( Q() & Q()) is not working.
I have a checkbox with multiple values. For example, hobbies. When the user choose some hobbies, he wants to see the person who has all those hobbies.
I dont go into details for sake of clarity so here's how I read all the checked values and put them in my "big" query, which is, as you may guess: q:
hobbies2 = [int(l) for l in g.getlist('hobbies2')]
if len(hobbies2):
q = q & Q(personne__hobbies2__in=hobbies2)
The problem is that it returns all persons who have a hobby in common (i.e. it's like "or", not "and").
How to get all those many to many values with a and?
The __in operator is for filtering by a set. It is not meant for checking multiple values at once in a many-to-many relationship.
It literally translate to the following in SQL:
SELECT ... WHERE hobbies IN (1, 3, 4);
So you'd need to build your query using AND conditions. You can do this with consecutive filters:
queryset = Personne.objects.all()
# build up the queryset with multiple filters
for hobby in hobbies2:
queryset = queryset.filter(hobbies2=hobby)
I have two tables like so:
class Collection(models.Model):
name = models.CharField()
class Image(models.Model):
name = models.CharField()
image = models.ImageField()
collection = models.ForeignKey(Collection)
I'd like to retrieve the first image out of every collection. I have attempted:
image_list = Image.objects.order_by('collection.id').distinct('collection.id')
but it didn't work out the way I expected :(
Any ideas?
Thanks.
Don't use dots to separate fields that span relations in Django; the double-underscore convention is used instead -- it means "follow this relation to get to this field"
this is more correct:
image_list = Image.objects.order_by('collection__id').distinct('collection__id')
However, it probably doesn't do what you want.
The concept of "first" doesn't always apply in relational databases the way you seem to be using it. For all of the records in the image table with the same collection id, there is no record which is 'first' or 'last' -- they're all just records. You could put another field on that table to define a specific order, or you could order by id, or alphabetically by name, but none of those will happen by default.
What will probably work best for you is to get the list of collections with one query, and then get a single item per collection, in separate queries:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
Image.objects.filter(collection__id=c)[0] for c in collection_ids
]
If you want to apply an order to the Images, to define which is 'first', then modify it like this:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
Image.objects.filter(collection__id=c).order_by('-id')[0] for c in collection_ids
]
You could also write raw SQL -- MySQL aggregation has the interesting property that fields which are not aggregated over can still appear in the final output, and essentially take a random value from the set of matching records. Something like this might work:
Image.objects.raw("SELECT image.* FROM app_image GROUP BY collection_id")
This query should get you one image from each collection, but you will have no control over which one is returned.
As written in my comment, you cannot use specific fields with distinct under MySQL. However, you can achieve the same result with the following:
from itertools import groupby
all_images = Image.objects.order_by('collection__id')
images_by_collection = groupby(all_images, lambda image: image.collection_id)
image_list = sum([group for key, group in images_by_collection], [])
Unfortunately, this results in a "bigger" query to the DB (all images are retrieved).
dict([(c.id, c.image_set.all()[0]) for c in Collection.objects.all()])
That will create a dictionary of the first image (by default ordering) in each collection, keyed by the collection's id. Be aware, though, that this will generate 1+N queries, where N is the total number of collection objects.
To get around that, you'll either need to wait for Django 1.4 and prefetch_related or use something like django-batch-select.
First get the distinct result, then do your filters.
I think you should try this one.
image_list = Image.objects.distinct()
image_list = image_list.order_by('collection__id')
I have a two models:
class Category(models.Model):
pass
class Item(models.Model):
cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.
Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relation ship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercase but it can be manually overwritten), two underscores, and the field name you want to query on.
lets say you require all items to be in the following set:
allowable_items = set([1,3,4])
one bruteforce solution would be to check the item_set for every category as so:
categories_with_allowable_items = [
category for category in
Category.objects.all() if
set([item.id for item in category.item_set.all()]) <= allowable_items
]
but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
item.category for item in
Item.objects.select_related('category').filter(pk__in=allowable_items) if
set([siblingitem.id for siblingitem in item.category.item_set.all()]) <= allowable_items
])
if performance isn't really an issue, then the latter of these two (if not the former) should be fine. if these are very large tables, you might have to come up with a more sophisticated solution. also if you're using a particularly old version of python remember that you'll have to import the sets module
I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in % AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])