Django Query (aggregates and counts)

Hey guys, I've got a model that looks like this:
class Interaction(DateAwareModel, UserAwareModel):
    page = models.ForeignKey(Page)
    container = models.ForeignKey(Container, blank=True, null=True)
    content = models.ForeignKey(Content)
    interaction_node = models.ForeignKey(InteractionNode)
    kind = models.CharField(max_length=3, choices=INTERACTION_TYPES)
I want to be able to do one query to get the count of the interactions grouped by container then by kind. The idea being that the output JSON data structure (serialization taken care of by piston) would look like this:
"data": {
    "container 1": {
        "tag_count": 3,
        "com_count": 1
    },
    "container 2": {
        "tag_count": 7,
        "com_count": 12
    },
    ...
}
The SQL would look like this:
SELECT container_id, kind, count(*) FROM rb_interaction GROUP BY container_id, kind;
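Whatever produces the grouped rows (the raw SQL above read via a DB-API cursor's fetchall(), or an ORM equivalent), they still have to be pivoted into the nested structure that piston serializes. A minimal sketch in plain Python, assuming each row is a (container_id, kind, count) tuple and that the kind codes are 'tag' and 'com' as in the JSON above; rows_to_data is a hypothetical helper name:

```python
from collections import defaultdict


def rows_to_data(rows):
    """Pivot (container_id, kind, count) rows into {"container N": {"<kind>_count": n}}."""
    data = defaultdict(dict)
    for container_id, kind, count in rows:
        # Label format mirrors the example JSON; adjust if containers have real names.
        data["container {}".format(container_id)]["{}_count".format(kind)] = count
    return dict(data)


rows = [(1, "tag", 3), (1, "com", 1), (2, "tag", 7), (2, "com", 12)]
print(rows_to_data(rows))
```

A container with no interactions of a given kind simply has no key for it, and because container is nullable, rows with a NULL container_id end up under "container None".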
Any ideas on how to group by multiple fields using the ORM? (I don't want to write raw queries for this project if I can avoid it.) This seems like a simple and common query.
Before you ask: I have seen the django aggregates documentation and the raw queries documentation.
Update
As per advice below I've created a custom manager to handle this:
from django.db import models
from django.db.models import Q

class ContainerManager(models.Manager):
    def get_query_set(self, *args, **kwargs):
        qs = super(ContainerManager, self).get_query_set(*args, **kwargs)
        # filter() returns a new queryset, so the result has to be reassigned
        qs = qs.filter(Q(interaction__kind='tag') | Q(interaction__kind='com')).distinct()
        annotations = {
            'tag_count': models.Count('interaction__kind'),
            'com_count': models.Count('interaction__kind')
        }
        return qs.annotate(**annotations)
This only counts the interactions that are of kind tag or com (both annotations count the same thing) instead of retrieving the count of tags and the count of coms via GROUP BY. It's obvious from the code why it works that way, but I'm wondering how to fix it.

Create a custom manager:
class ContainerManager(models.Manager):
    def get_query_set(self, *args, **kwargs):
        qs = super(ContainerManager, self).get_query_set(*args, **kwargs)
        annotations = {'tag_count': models.Count('tag'), 'com_count': models.Count('com')}
        return qs.annotate(**annotations)

class Container(models.Model):
    ...
    objects = ContainerManager()
Then, Container queries will always include tag_count and com_count attributes. You'll probably need to modify the annotations, since I don't have a copy of your model to refer to; I just guessed at the field names.
UPDATE:
So after gaining a better understanding of your models, annotations won't work for what you're looking for. Really, the only way to get counts for how many Containers have interactions of kind 'tag' or 'com' is:
tag_count = Container.objects.filter(interaction__kind='tag').distinct().count()
com_count = Container.objects.filter(interaction__kind='com').distinct().count()
Annotations won't give you that information. I think it's possible to write your own aggregates and annotations, so that might be a possible solution. However, I've never done that myself, so I can't really give you any guidance there. You're probably stuck with using straight SQL.

Related

Conditionals in django models arguments

Quick question: does anyone have any idea how to write conditionals in Django models?
For example I have this code here:
class Trip(models.Model):
    tripName = models.CharField(max_length=64)
    tripLogo = models.ImageField(default='default_trip.jpg', upload_to='trip_pics')
So here the default value is 'default_trip.jpg', but I'd like to write a conditional so that if tripName == "russian" then default='russia.jpg'. It doesn't have to change the default; another image could be set instead.
This is not something that can be done on the model level; it must be done in the controller (otherwise, this would break the MVC pattern).
Keep in mind that Django's ORM wrapper must turn your model class into a usable table in whatever the underlying database engine is. This type of "conditional default" is not part of any database engine that I know of.
The default argument can be a callable:
def contact_default():
    return {"email": "to1@example.com"}

contact_info = JSONField("ContactInfo", default=contact_default)
See the Django documentation for Field.default.
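Django calls a callable default with no arguments every time a new object is created. That gives each instance a fresh value, but it also means the callable cannot see other fields such as tripName, which is why the save() override is the approach that actually solves the question. A plain-Python sketch of the "fresh value per call" behaviour (changed@example.com is just an illustrative value):

```python
def contact_default():
    # Called with no arguments: there is no model instance to inspect here.
    return {"email": "to1@example.com"}


# Each call builds a new dict, so two instances never share (and mutate) one object.
a = contact_default()
b = contact_default()
a["email"] = "changed@example.com"
print(b["email"])  # to1@example.com
```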
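So this part of the code helped me solve my problem:
def save(self, *args, **kwargs):
    # tripImages is defined elsewhere (a collection of trip names that have their own image)
    if self.tripName in tripImages:
        self.tripLogo = "{}.png".format(self.tripName.lower())
    else:
        self.tripLogo = "default_trip.png"
    # Without this call the instance is never actually written to the database
    super(Trip, self).save(*args, **kwargs)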
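The branching inside save() can be pulled into a plain function so it can be tested without a database. A sketch assuming tripImages is a collection of trip names that have a matching image file (the snippet above does not show its definition); TRIP_IMAGES and trip_logo_for are hypothetical names:

```python
# Hypothetical stand-in for the tripImages collection used in save().
TRIP_IMAGES = {"Russian", "Italian"}


def trip_logo_for(trip_name, known_images=TRIP_IMAGES):
    """Return the logo filename that save() would assign for this trip name."""
    if trip_name in known_images:
        return "{}.png".format(trip_name.lower())
    return "default_trip.png"


print(trip_logo_for("Russian"))  # russian.png
print(trip_logo_for("Peru"))     # default_trip.png
```

Inside save() this becomes self.tripLogo = trip_logo_for(self.tripName). Note the membership test is case-sensitive, exactly like the original.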

Returning distinct objects sorted by SearchRank involving multiple tables

I have a model class called Poll which has title, description, and tags fields. I also have a PollEntry class (related_name=entries) which also has title, description, and tags. I am trying to implement text search using the contrib.postgres.search module. Apparently a search ordered by rank can return duplicate objects, and the Django documentation basically says "you have to be careful", but does not give any examples of how to deal with this, and I have had basically no luck finding examples online (on SO or elsewhere).
The following code snippet appears to solve this problem, but I don't know if it is the most efficient way to do this. Any suggestions or references would be much appreciated! Also, note I am using DRF here.
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from rest_framework.decorators import list_route
from rest_framework.response import Response

@list_route(methods=['get'])
def search(self, request):
    search_query_terms = request.query_params.get('searchQuery').split(' ')
    search_vector = (SearchVector('entries__tags__name') +
                     SearchVector('title') +
                     SearchVector('description') +
                     SearchVector('tags__name') +
                     SearchVector('entries__title') +
                     SearchVector('entries__description'))
    # OR the individual terms together into a single query
    search_query = SearchQuery(search_query_terms[0])
    for term in search_query_terms[1:]:
        search_query = SearchQuery(term) | search_query
    ids = (Poll.objects
           .annotate(rank=SearchRank(search_vector, search_query))
           .order_by('-rank')
           .filter(rank__gt=0)
           .values_list('id'))
    polls = Poll.objects.filter(id__in=ids)
    serializer = self.get_serializer(polls, many=True)
    return Response(data=serializer.data)
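The duplicates most likely come from the joins across entries and tags: one Poll can appear once per matching entry or tag, each row with its own rank. Conceptually, the fix is to keep one result per poll (its best rank) and order by that rank. Note that the outer Poll.objects.filter(id__in=ids) has no order_by('-rank'), so it keeps the matching polls but not the rank ordering. A plain-Python sketch of the collapse step, assuming the input is (id, rank) pairs such as values_list('id', 'rank') would return; best_rank_order is a hypothetical name:

```python
def best_rank_order(pairs):
    """Collapse duplicate ids to their highest rank and sort ids by that rank, best first."""
    best = {}
    for obj_id, rank in pairs:
        if rank > best.get(obj_id, float("-inf")):
            best[obj_id] = rank
    # Sort by rank descending; ties broken by id so the output is deterministic.
    return sorted(best, key=lambda obj_id: (-best[obj_id], obj_id))


pairs = [(3, 0.2), (1, 0.5), (3, 0.9), (2, 0.1), (1, 0.4)]
print(best_rank_order(pairs))  # [3, 1, 2]
```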

Join annotations in Django without raw SQL

I have a model that has arbitrary key/value pairs (attributes) associated with it. I'd like to have the option of sorting by those dynamic attributes. Here's what I came up with:
from django.db import models
from django.db.models import expressions

class Item(models.Model):
    pass

class Attribute(models.Model):
    item = models.ForeignKey(Item, related_name='attributes')
    key = models.CharField()
    value = models.CharField()

def get_sorted_items():
    return Item.objects.all().annotate(
        first=select_attribute('first'),
        second=select_attribute('second'),
    ).order_by('first', 'second')

def select_attribute(attribute):
    return expressions.RawSQL("""
        select app_attribute.value from app_attribute
        where app_attribute.item_id = app_item.id
        and app_attribute.key = %s""", (attribute,))
This works, but it has a bit of raw SQL in it, so it makes my co-workers wary. Is it possible to do this without raw SQL? Can I make use of Django's ORM to simplify this?
I would expect something like this to work, but it doesn't:
def get_sorted_items():
    return Item.objects.all().annotate(
        first=Attribute.objects.filter(key='first').values('value'),
        second=Attribute.objects.filter(key='second').values('value'),
    ).order_by('first', 'second')
Approach 1
Using Django 1.8+ Conditional Expressions
(see also Query Expressions)
# related_name='attributes' is also the name used in query lookups
items = Item.objects.all().annotate(
    first=models.Case(models.When(attributes__key='first', then=models.F('attributes__value')), default=models.Value('')),
    second=models.Case(models.When(attributes__key='second', then=models.F('attributes__value')), default=models.Value(''))
).distinct()
for item in items:
    print(item.first, item.second)
Approach 2
Using prefetch_related with custom models.Prefetch object
keys = ['first', 'second']
items = Item.objects.all().prefetch_related(
    models.Prefetch('attributes',
                    queryset=Attribute.objects.filter(key__in=keys),
                    to_attr='prefetched_attrs'),
)
This way every item from the queryset will have a list under the .prefetched_attrs attribute.
This list contains all of the item's attributes that passed the filter.
Now, because you want to get attribute.value, you can implement something like this:
class Item(models.Model):
    # ...
    def get_attribute(self, key, default=None):
        try:
            return next((attr.value for attr in self.prefetched_attrs if attr.key == key), default)
        except AttributeError:
            raise AttributeError("You didn't prefetch any attributes")

# and the usage will be:
for item in items:
    print(item.get_attribute('first'), item.get_attribute('second'))
Some notes about the differences between the two approaches:
- You have somewhat better control over the filtering process with the custom Prefetch object approach. The conditional-expressions approach is somewhat harder to optimize, IMHO.
- With prefetch_related you get the whole Attribute object, not just the value you are interested in.
- Django executes prefetch_related after the queryset has been evaluated, which means an additional query is executed for each lookup in the prefetch_related call. In a way this can be good, because it keeps the main queryset free of those filters, so additional clauses like .distinct() are not needed.
- prefetch_related always puts the returned objects into a list, which is not very convenient when a prefetch returns one element per object, so additional model methods are needed to use it comfortably.
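On the last point: if several keys are read per item, the prefetched list can be turned into a dict once instead of being scanned with next() on every call. A plain-Python sketch; the Attr namedtuple is a stand-in for Attribute instances and attribute_map is a hypothetical helper:

```python
from collections import namedtuple

# Stand-in for Attribute model instances (only .key and .value are used).
Attr = namedtuple("Attr", ["key", "value"])


def attribute_map(prefetched_attrs):
    """Build {key: value} from a prefetched attribute list.

    Unlike next() in get_attribute, a later duplicate key wins here.
    """
    return {attr.key: attr.value for attr in prefetched_attrs}


attrs = [Attr("first", "a"), Attr("second", "b")]
values = attribute_map(attrs)
print(values.get("first"), values.get("second"), values.get("third"))  # a b None
```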

django distinct query using custom equivalence

Say that my model looks like this:
class Alert(models.Model):
    datetime_alert = models.DateTimeField()
    alert_type = models.ForeignKey(Alert_Type, on_delete=models.CASCADE)
    dismissed = models.BooleanField(default=False)
    datetime_dismissed = models.DateTimeField(null=True)
    auid = models.CharField(max_length=64, unique=True)
    entities = models.ManyToManyField(to='Entity', through='Entity_To_Alert_Map')
    objects = Alert_Manager()

    def __eq__(self, other):
        # QuerySet has no content-based ==, so compare the related entities as sets
        return (isinstance(other, self.__class__)
                and self.alert_type == other.alert_type
                and self.dismissed == other.dismissed
                and set(self.entities.all()) == set(other.entities.all()))

    def __ne__(self, other):
        return not self.__eq__(other)
What I'm trying to say is this: two alert objects are equivalent if the dismissed status, alert type, and associated entities are the same. Using this idea, is it possible to write a query that asks for all the distinct alerts based on those criteria? Selecting all of them and then filtering them in Python doesn't seem appealing.
You mention one way to do it, and I don't think it is very bad. I'm not aware of anything in Django that can do this.
However, I want you to think about why this problem arises. If two alerts are equal whenever their message, status and type are the same, then maybe this should be its own class. I would consider creating another class DistinctAlert (or some better name) and having a foreign key to it from Alert. Or even better, have one class called Alert, and one called AlertEvent (your current Alert class).
Would this solve your problem?
Edit:
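Actually, there is a way to do this. You can combine values() and distinct(). This way, your query will be:
Alert.objects.all().values("alert_type", "dismissed", "entities").distinct()
This returns dictionaries (one per distinct row) rather than Alert instances. Because entities is a many-to-many field, values() yields one row per related entity, so the entity sets are not compared as a whole.
See the documentation of values() for more.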
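If the comparison has to treat the entity set as a whole, one option is to build a hashable key per alert in Python and keep the first alert for each key. This still loads every alert, which the question hoped to avoid, but it replaces pairwise __eq__ calls with a single pass. A sketch assuming each alert has already been reduced to (alert_id, alert_type_id, dismissed, entity_ids); distinct_alerts is a hypothetical name:

```python
def distinct_alerts(alerts):
    """Keep the first alert id for each (alert_type, dismissed, entity set) combination.

    Each alert is a tuple (alert_id, alert_type_id, dismissed, entity_ids).
    """
    seen = {}
    for alert_id, alert_type_id, dismissed, entity_ids in alerts:
        # frozenset makes the entity collection order-insensitive and hashable.
        key = (alert_type_id, dismissed, frozenset(entity_ids))
        seen.setdefault(key, alert_id)
    return list(seen.values())


alerts = [
    (1, 10, False, [5, 6]),
    (2, 10, False, [6, 5]),  # same as alert 1: entity order does not matter
    (3, 10, True, [5, 6]),   # differs in dismissed
    (4, 11, False, [5, 6]),  # differs in alert type
]
print(distinct_alerts(alerts))  # [1, 3, 4]
```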

Django get_next_by_FIELD using complex Q lookups

While creating a front end for a Django module I faced the following problem inside Django core:
In order to display a link to the next/previous object from a model query, we can use the extra instance methods of a model instance, get_next_by_FIELD() or get_previous_by_FIELD(), where FIELD is a model field of type DateField or DateTimeField.
Let's explain it with an example:
from django.db import models

class Shoe(models.Model):
    created = models.DateTimeField(auto_now_add=True, null=False)
    size = models.IntegerField()
A view to display a list of shoes, excluding those where size equals 4:
def list_shoes(request):
    shoes = Shoe.objects.exclude(size=4)
    # render_to_response takes a template name first; this name is illustrative
    return render_to_response('list_shoes.html', {
        'shoes': shoes
    })
And let the following be a view to display one shoe and the corresponding link to the previous and next shoe:
def show_shoe(request, shoe_id):
    shoe = Shoe.objects.get(pk=shoe_id)
    prev_shoe = shoe.get_previous_by_created()
    next_shoe = shoe.get_next_by_created()
    return render_to_response('show_shoe.html', {
        'shoe': shoe,
        'prev_shoe': prev_shoe,
        'next_shoe': next_shoe
    })
Now I have the situation that the show_shoe view displays the link to the previous/next regardless of the shoes size. But I actually wanted just shoes whose size is not 4.
Therefore I tried to use the **kwargs argument of the get_(previous|next)_by_created() methods to filter out the unwanted shoes, as stated by the documentation:
Both of these methods will perform their queries using the default manager for the model. If you need to emulate filtering used by a custom manager, or want to perform one-off custom filtering, both methods also accept optional keyword arguments, which should be in the format described in Field lookups.
Edit: Note the word "should" here: by that reading, (size_ne=4) should also work, but it doesn't.
The actual problem
Filtering using the lookup size__ne ...
def show_shoe(request, shoe_id):
    ...
    prev_shoe = shoe.get_previous_by_created(size__ne=4)
    next_shoe = shoe.get_next_by_created(size__ne=4)
    ...
... didn't work; it throws FieldError: Cannot resolve keyword 'size_ne' into field.
Then I tried to use a negated complex lookup using Q objects:
from django.db.models import Q

def show_shoe(request, shoe_id):
    ...
    prev_shoe = shoe.get_previous_by_created(~Q(size=4))
    next_shoe = shoe.get_next_by_created(~Q(size=4))
    ...
... didn't work either; it throws TypeError: _get_next_or_previous_by_FIELD() got multiple values for argument 'field'.
This is because the get_(previous|next)_by_created methods only accept **kwargs.
The actual solution
Since these instance methods use _get_next_or_previous_by_FIELD(self, field, is_next, **kwargs), I wrote a version that also accepts positional arguments via *args and passes them to the filter, just like the **kwargs.
from django.db.models import Q
from django.utils.encoding import force_text  # removed in Django 4.0; use force_str there

# Define this on the model class so it can be called as an instance method.
def my_get_next_or_previous_by_FIELD(self, field, is_next, *args, **kwargs):
    """
    Workaround to call get_next_or_previous_by_FIELD with complex lookup queries using
    Django's Q class. The only difference between this version and the original is that
    positional arguments are also passed to the filter function.
    """
    if not self.pk:
        raise ValueError("get_next/get_previous cannot be used on unsaved objects.")
    op = 'gt' if is_next else 'lt'
    order = '' if is_next else '-'
    param = force_text(getattr(self, field.attname))
    # Strictly after/before on the field, or equal on the field and after/before on pk
    q = Q(**{'%s__%s' % (field.name, op): param})
    q = q | Q(**{field.name: param, 'pk__%s' % op: self.pk})
    qs = self.__class__._default_manager.using(self._state.db).filter(*args, **kwargs).filter(q).order_by('%s%s' % (order, field.name), '%spk' % order)
    try:
        return qs[0]
    except IndexError:
        raise self.DoesNotExist("%s matching query does not exist." % self.__class__._meta.object_name)
And calling it like:
...
prev_shoe = shoe.my_get_next_or_previous_by_FIELD(Shoe._meta.get_field('created'), False, ~Q(size=4))
next_shoe = shoe.my_get_next_or_previous_by_FIELD(Shoe._meta.get_field('created'), True, ~Q(size=4))
...
finally did it.
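The ordering rule inside that function is: the next object is the first one, among those passing the extra filter, with a later created value, or with the same created value and a larger pk; previous is the mirror image. A plain-Python sketch of the same rule over an in-memory list, useful for reasoning about ties; ShoeRow and find_adjacent are hypothetical stand-ins, not Django API:

```python
from collections import namedtuple

# Stand-in for Shoe rows; "created" can be any comparable value (datetimes in Django).
ShoeRow = namedtuple("ShoeRow", ["pk", "created", "size"])


def find_adjacent(current, shoes, is_next, predicate=lambda s: True):
    """Return the next/previous row by (created, pk) among rows passing predicate.

    Mirrors Q(created__gt=x) | Q(created=x, pk__gt=pk) ordered by created, pk
    (and the mirror image for previous). Returns None where Django raises DoesNotExist.
    """
    key = (current.created, current.pk)
    if is_next:
        later = [s for s in shoes if (s.created, s.pk) > key and predicate(s)]
        return min(later, key=lambda s: (s.created, s.pk), default=None)
    earlier = [s for s in shoes if (s.created, s.pk) < key and predicate(s)]
    return max(earlier, key=lambda s: (s.created, s.pk), default=None)


def not_size_4(shoe):
    return shoe.size != 4


shoes = [ShoeRow(1, 10, 5), ShoeRow(2, 20, 4), ShoeRow(3, 20, 6), ShoeRow(4, 30, 4)]
print(find_adjacent(shoes[0], shoes, True, not_size_4).pk)  # 3
print(find_adjacent(shoes[2], shoes, False).pk)             # 2
```

Rows 2 and 3 share a created value, so the pk tie-break decides their order, just as the '%spk' term in order_by does.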
Now the question to you
Is there an easier way to handle this? Should shoe.get_previous_by_created(size__ne=4) work as expected or should I report this issue to the Django guys, in the hope they'll accept my _get_next_or_previous_by_FIELD() fix?
Environment: Django 1.7, haven't tested it on 1.9 yet, but the code for _get_next_or_previous_by_FIELD() stayed the same.
Edit: It is true that complex lookups using Q objects are not part of "field lookups"; they belong to the filter() and exclude() functions instead. And I am probably wrong to assume that get_next_by_FIELD should accept Q objects too. But since the changes involved are minimal and the advantage of using Q objects is high, I think these changes should go upstream.
You can create a custom lookup ne (custom lookups are supported since Django 1.7) and use it:
.get_next_by_created(size__ne=4)
I suspect the method you tried first only accepts lookup arguments for the field you're basing get_next on. That would mean you can't filter on the size field from the get_next_by_created() method, for example.
Edit: your method is by far more efficient, but to answer your question about the Django issue, I think everything is working the way it is supposed to. You could offer an additional method such as yours, but the existing get_next_by_FIELD is working as described in the docs.
You've managed to work around this with a working method, which is OK I guess, but if you wanted to reduce the overhead, you could try a simple recursive approach:
def get_next_by_field_filtered(obj, field=None, **kwargs):
    next_obj = getattr(obj, 'get_next_by_{}'.format(field))()
    for key in kwargs:
        if not getattr(next_obj, str(key)) == kwargs[str(key)]:
            return get_next_by_field_filtered(next_obj, field=field, **kwargs)
    return next_obj
This isn't very efficient but it's one way to do what you want.
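Because that function recurses once per skipped object, a long run of non-matching rows could hit Python's recursion limit. The same walk can be written as a loop. A plain-Python sketch, where get_next is any callable that returns the following object (for example lambda o: o.get_next_by_created()) and raises when there is none, as Django's DoesNotExist does; next_matching, Node and the LookupError are hypothetical stand-ins:

```python
def next_matching(obj, get_next, **expected):
    """Return the first object after `obj` whose attributes equal all `expected` values.

    get_next(o) must return the object following o, or raise when there is none.
    """
    candidate = get_next(obj)
    while any(getattr(candidate, name) != value for name, value in expected.items()):
        candidate = get_next(candidate)
    return candidate


# Minimal stand-ins for demonstration: a singly linked chain of shoes.
class Node:
    def __init__(self, size, nxt=None):
        self.size = size
        self.nxt = nxt


def get_next(node):
    if node.nxt is None:
        raise LookupError("no next object")
    return node.nxt


n4 = Node(6)
n3 = Node(4, n4)
n2 = Node(4, n3)
n1 = Node(5, n2)
print(next_matching(n1, get_next, size=6).size)  # 6
```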
Hope this helps!