How to count the number of keys in a django jsonfield


I would like to build a queryset filter that identifies all model instances with a given number of keys in one of the model's JSONFields.
I have tried to create a custom lookup (#1) to extract the keys of the JSON field,
and would like to aggregate those into an array and use the array's __len lookup for my filter. Unfortunately, I am stuck at the aggregation, which does not seem to work (#2).
#1
class JsonKeys(Transform):
    output_field = TextField()
    lookup_name = 'keys'
    function = 'jsonb_object_keys'
#2
qs = qs.annotate(keysArray=ArrayAgg("myJsonField__keys"))
The error that I get is:
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
Has anyone already tried to perform this task?

Related

How to apply nulls_last ordering to an extra select field in Django?

I'm trying to apply order_by() with nulls_last=True to a queryset:
MyModel.objects.all().order_by(*[models.F('field').desc(nulls_last=True)])
This works fine for regular fields, but it throws an error when I try to order by an extra field in the same manner:
queryset = MyModel.objects.extra(**{'select': {'extra_field': 'custom_sql'}}).all()
queryset = queryset.order_by(*[models.F('extra_field').desc(nulls_last=True)])
The problem is it is unable to find extra_field column when using models.F(). If I simply use order_by('extra_field') as a string it works because Django is doing some extra preprocessing and handles extra fields differently. After digging through the sources I was able to make it work as:
queryset = MyModel.objects.extra(**{'select': {'extra_field': 'custom_sql'}}).all()
queryset = queryset.order_by(*[models.expressions.Ref('extra_field', models.expressions.RawSQL('custom_sql', [])).desc(nulls_last=True)])
The downside of this solution is that I have to treat regular fields and extra fields differently, and I have to repeat the custom_query inside order_by.
I am trying to build a universal function that would simply take a list of fields as strings (the way Django takes them) and wrap them with nulls_last.
The question is: Given only a field name as a string and a queryset, is there a universal way to apply nulls_last to it without specifying if this is a native field or an extra, and without passing a custom query for an extra field? I am trying to make the function signature to be something like:
def apply_order_by_nulls_last(queryset, field: str):
    # detect field type and convert it to OrderBy expression with nulls_last
    ...
    return queryset.order_by(...)

Get models in Django that have all of the values in ManyToMany field (AND-query, no reverse lookups allowed)

I have such a model in Django:
class VariantTag(models.Model):
saved_variants = models.ManyToManyField('SavedVariant')
I need to get all VariantTag instances whose saved_variants ManyToMany field contains exactly the given ids, say (250, 251), no more, no less. Because of the nature of the code I am dealing with, there is no way I can do a reverse lookup with _set. So I am looking for a query (or several queries plus additional Python filtering) that gets me there in this form:
query = Q(...)
tag_queryset = VariantTag.objects.filter(query)
How can this be achieved?
I should stress that the supplied saved variants (e.g. (250, 251)) should be AND-ed, not OR-ed.

Use the in lookup:
tag_queryset = VariantTag.objects.filter(saved_variants__in=[250,251])

So far I was able to achieve AND result by the following code:
tag_ids = VariantTag.objects.filter(variant_tag_type__name=tag_data['tag'],
                                    saved_variants__in=saved_variant_ids).values_list('id', flat=True).distinct()
for tag_id in tag_ids:
    saved_variants = list(VariantTag.objects.get(id=tag_id).saved_variants.all().values_list('id', flat=True))
    if all(s in saved_variant_ids for s in saved_variants) and len(saved_variants) == len(saved_variant_ids):
        return VariantTag.objects.get(id=tag_id)
So, I am doing the following:
Getting the OR - result
Iterating over the resulting ids of the retrieved model and for each one of them getting all of the ids of the ManyToMany field
Checking if all of the obtained ids of the ManyToMany field are in the required ids list (saved_variant_ids)
If yes - get the model by the id: VariantTag.objects.get(id=tag_id)
In my case there will be only one such model that has the required ids in its ManyToMany field. If that is not the case for you, just append the ids of the matching models (in my case tag_id) to a list, then make a query for all of them.
If anyone has a more concise way of doing an AND ManyToMany query plus code, it would be interesting to see.
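One possibly more concise alternative is to let the database do the comparison with GROUP BY and HAVING. The sketch below uses Python's built-in sqlite3 module and a hypothetical through table named varianttag_saved_variants with columns varianttag_id and savedvariant_id (the real table and column names depend on your app, and the helper name is made up). It assumes each (tag, variant) pair appears at most once, and keeps only the tags whose total number of linked variants and number of linked variants inside the required set both equal the size of that set.

```python
import sqlite3


def tags_with_exact_variants(conn, required_ids):
    """Return ids of tags linked to exactly the variants in required_ids."""
    required = sorted(set(required_ids))
    if not required:
        return []
    placeholders = ", ".join("?" for _ in required)
    sql = f"""
        SELECT varianttag_id
        FROM varianttag_saved_variants
        GROUP BY varianttag_id
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN savedvariant_id IN ({placeholders})
                        THEN 1 ELSE 0 END) = ?
        ORDER BY varianttag_id
    """
    # Parameter order follows the placeholders: size, the ids, size again.
    params = [len(required), *required, len(required)]
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data: tag 1 has exactly (250, 251),
# tag 2 has too few variants, tag 3 has one too many.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianttag_saved_variants "
    "(varianttag_id INTEGER, savedvariant_id INTEGER)"
)
conn.executemany(
    "INSERT INTO varianttag_saved_variants VALUES (?, ?)",
    [(1, 250), (1, 251), (2, 250), (3, 250), (3, 251), (3, 252)],
)
exact = tags_with_exact_variants(conn, [250, 251])
```

The resulting ids could then be passed to a regular filter on the model's primary key, which avoids one query per candidate tag.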

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is what I actually need.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.

Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model objects, they should all be unique. It seems highly unlikely that you would have two events on the same date, in the same location, of the same type. So with that assumption, let's begin. (As a formatting note, class names tend to start with capital letters to differentiate classes from variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to check the exact value returned by Max('date'), but this should get you on the right path.
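If one lookup per group turns out to be too many queries, a database-agnostic alternative is to join the table to its own grouped maximum in standard SQL. The sketch below shows the idea with Python's built-in sqlite3 module. The table name event and the sample rows are assumptions for illustration, and it relies on the same assumption as above: the combination of location, type and date is unique.

```python
import sqlite3

# Join every row to the latest date of its (location, type) group,
# so the full row, including attendance, is returned.
LATEST_PER_GROUP_SQL = """
    SELECT e.location, e.type, e.date, e.attendance
    FROM event AS e
    JOIN (
        SELECT location, type, MAX(date) AS latest_date
        FROM event
        GROUP BY location, type
    ) AS latest
      ON latest.location = e.location
     AND latest.type = e.type
     AND latest.latest_date = e.date
    ORDER BY e.location, e.type
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event "
    "(location TEXT, type TEXT, date TEXT, attendance INTEGER)"
)
# Hypothetical sample rows; ISO date strings sort chronologically.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        ("Paris", "concert", "2020-01-01", 100),
        ("Paris", "concert", "2020-03-01", 150),
        ("Paris", "talk", "2020-02-01", 30),
        ("Rome", "concert", "2020-01-15", 80),
    ],
)
latest_rows = conn.execute(LATEST_PER_GROUP_SQL).fetchall()
```

Here latest_rows holds one row per location and type combination, each carrying the attendance recorded on the latest date.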

Django models: retrieving unique foreign key instances

I have two tables like so:
class Collection(models.Model):
    name = models.CharField()

class Image(models.Model):
    name = models.CharField()
    image = models.ImageField()
    collection = models.ForeignKey(Collection)
I'd like to retrieve the first image out of every collection. I have attempted:
image_list = Image.objects.order_by('collection.id').distinct('collection.id')
but it didn't work out the way I expected :(
Any ideas?
Thanks.

Don't use dots to separate fields that span relations in Django; the double-underscore convention is used instead -- it means "follow this relation to get to this field".
This is more correct:
image_list = Image.objects.order_by('collection__id').distinct('collection__id')
However, it probably doesn't do what you want.
The concept of "first" doesn't always apply in relational databases the way you seem to be using it. For all of the records in the image table with the same collection id, there is no record which is 'first' or 'last' -- they're all just records. You could put another field on that table to define a specific order, or you could order by id, or alphabetically by name, but none of those will happen by default.
What will probably work best for you is to get the list of collections with one query, and then get a single item per collection, in separate queries:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c)[0] for c in collection_ids
]
If you want to apply an order to the Images, to define which is 'first', then modify it like this:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c).order_by('-id')[0] for c in collection_ids
]
You could also write raw SQL -- MySQL aggregation has the interesting property that fields which are not aggregated over can still appear in the final output, and essentially take a random value from the set of matching records. Something like this might work:
Image.objects.raw("SELECT app_image.* FROM app_image GROUP BY collection_id")
This query should get you one image from each collection, but you will have no control over which one is returned.

As written in my comment, you cannot use specific fields with distinct under MySQL. However, you can achieve the same result with the following:
from itertools import groupby
all_images = Image.objects.order_by('collection__id')
images_by_collection = groupby(all_images, lambda image: image.collection_id)
image_list = [next(group) for key, group in images_by_collection]
Unfortunately, this results in a "bigger" query to the DB (all images are retrieved).
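To see how groupby can yield one image per collection, here is a small self-contained sketch in which plain named tuples stand in for Image rows (the ImageRow type, the helper name and the sample data are made up for illustration). groupby only merges adjacent items, so the input must already be sorted by collection id, and calling next() on each group returns its first member.

```python
from collections import namedtuple
from itertools import groupby

# Stand-in for Image model instances.
ImageRow = namedtuple("ImageRow", ["id", "name", "collection_id"])


def first_per_collection(rows):
    """Return the lowest-id row of each collection."""
    # Sort by collection first so groupby sees each collection as one run,
    # then by id so "first" is well defined.
    ordered = sorted(rows, key=lambda r: (r.collection_id, r.id))
    return [
        next(group)
        for _, group in groupby(ordered, key=lambda r: r.collection_id)
    ]


rows = [
    ImageRow(3, "c.png", 2),
    ImageRow(1, "a.png", 1),
    ImageRow(2, "b.png", 1),
    ImageRow(4, "d.png", 2),
]
firsts = first_per_collection(rows)
```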

dict([(c.id, c.image_set.all()[0]) for c in Collection.objects.all()])
That will create a dictionary of the first image (by default ordering) in each collection, keyed by the collection's id. Be aware, though, that this will generate 1+N queries, where N is the total number of collection objects.
To get around that, you'll either need to wait for Django 1.4 and prefetch_related or use something like django-batch-select.

First get the distinct result, then do your filters.
I think you should try this one.
image_list = Image.objects.distinct()
image_list = image_list.order_by('collection__id')

Django DB, finding Categories whose Items are all in a subset

I have two models:
class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids. For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.

Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relationship on the model without the foreign key. You can filter on it by using its related name (usually just the lowercase model name, but it can be manually overridden), two underscores, and the field name you want to query on.

Let's say you require all items to be in the following set:
allowable_items = set([1,3,4])
One brute-force solution would be to check the item_set for every category, like so:
categories_with_allowable_items = [
    category for category in
    Category.objects.all() if
    set([item.id for item in category.item_set.all()]) <= allowable_items
]
but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
    item.cat for item in
    Item.objects.select_related('cat').filter(pk__in=allowable_items) if
    set([siblingitem.id for siblingitem in item.cat.item_set.all()]) <= allowable_items
])
If performance isn't really an issue, then the latter of these two (if not the former) should be fine. If these are very large tables, you might have to come up with a more sophisticated solution. Also, if you're using a particularly old version of Python, remember that you'll have to import the sets module.

I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in %s AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
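Writing the whole query in raw SQL could look like the sketch below, which uses Python's built-in sqlite3 module so that it runs on its own. The table names app_category and app_item follow the hypothetical query above; the helper name and the sample data are assumptions. Instead of a correlated subquery, it compares each category's total item count with the number of its items inside the allowed set. Because of the inner join, categories without any items are not returned.

```python
import sqlite3


def categories_within(conn, allowed_item_ids):
    """Return ids of categories whose items all have ids in the allowed set."""
    allowed = sorted(set(allowed_item_ids))
    if not allowed:
        return []
    placeholders = ", ".join("?" for _ in allowed)
    sql = f"""
        SELECT c.id
        FROM app_category AS c
        JOIN app_item AS i ON i.cat_id = c.id
        GROUP BY c.id
        HAVING COUNT(*) = SUM(CASE WHEN i.id IN ({placeholders})
                                   THEN 1 ELSE 0 END)
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql, allowed)]


# Hypothetical sample data: category 1 owns items 1 and 3,
# category 2 owns items 5 and 6, category 3 owns item 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_category (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE app_item "
    "(id INTEGER PRIMARY KEY, cat_id INTEGER REFERENCES app_category(id))"
)
conn.executemany("INSERT INTO app_category (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO app_item (id, cat_id) VALUES (?, ?)",
    [(1, 1), (3, 1), (5, 2), (6, 2), (4, 3)],
)
matching = categories_within(conn, [1, 3, 5])
```

In Django, the same statement could be passed to a raw query, with the resulting ids fed back into a regular filter if a queryset is needed.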
