Comparing JSONFields in Django - django

If two models both have JSONFields, is there a way to match one against the other? Say I have two models:
class Crabadoodle(Model):
    classification = CharField()
    metadata = JSONField()

class Glibotz(Model):
    rating = IntegerField()
    metadata = JSONField()
If I have a Crabadoodle and want to fetch all the Glibotz objects with identical metadata fields, how would I go about that? If I know specific contents, I can filter easily enough, but how do you match on the whole field?

There is no built-in implementation of this in Django, but it is possible with a raw query using the jsonb containment operators (@>, <@).
Something along the lines of the following:
select *
from someapp_crabadoodle crab
join someapp_glibotz glib
  on crab.metadata @> glib.metadata and crab.metadata <@ glib.metadata
where crab.id = 1
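A minimal sketch of running that kind of raw query through the ORM, assuming PostgreSQL with jsonb-backed metadata columns. The function name is hypothetical; it only relies on `_meta.db_table` and `Manager.raw()`, and it selects only the Glibotz columns so that the rows map back onto model instances.

```python
def glibotz_with_same_metadata(crab, glibotz_model):
    """Return Glibotz rows whose metadata and crab's metadata contain each other.

    Hypothetical helper: `crab` is a saved Crabadoodle instance and
    `glibotz_model` is the Glibotz model class. Assumes PostgreSQL jsonb.
    """
    # Table names come from model metadata, not from user input.
    crab_table = crab._meta.db_table
    glib_table = glibotz_model._meta.db_table
    sql = (
        f'SELECT glib.* FROM "{glib_table}" glib '
        f'JOIN "{crab_table}" crab '
        "ON crab.metadata @> glib.metadata AND crab.metadata <@ glib.metadata "
        "WHERE crab.id = %s"
    )
    # The primary key is passed as a query parameter rather than interpolated.
    return glibotz_model.objects.raw(sql, [crab.pk])
```

Inside a project this would be called as `glibotz_with_same_metadata(some_crab, Glibotz)` and iterated like any other raw queryset.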

Related

Select all foreign keys from the foreign key table in django

Let's imagine I have 2 models:
class Tree(models.Model):
    title = models.CharField(max_length=255)

class Apple(models.Model):
    tree = models.ForeignKey(Tree, related_name="apples")
How do I select all the Trees that have Apples?
That is, I want to select all the Trees that are referenced from the Apple model, starting from the Tree model.
I think I want to execute this query:
SELECT DISTINCT tree.id, tree.title
FROM apple JOIN tree ON apple.tree_id = tree.id
So far I have written 2 queries, and they work, but I don't think they follow best practice:
Tree.objects.filter(
    apples__tree__in=Apple.objects.all().values_list("tree")
).distinct()
Tree.objects.filter(apples__tree__isnull=False).distinct()
You can query the relation for NULL directly:
Tree.objects.filter(apples__isnull=False).distinct()
P.S. If you want the exact query, you can write it like this (but you'll only get dictionaries, not Tree objects):
Apple.objects.order_by().values('tree__id', 'tree__title').distinct()
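You can use Django aggregates. For example, with User and Page models (Page having a foreign key to User), this counts the users that have at least two pages: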
from django.db.models import Count
User.objects.annotate(page_count=Count('page')).filter(page_count__gte=2).count()
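As a sanity check on what the isnull filter is expected to return, the SQL from the question can be run against a throwaway in-memory SQLite database. This is only a sketch: the table names `tree` and `apple`, the `tree_id` column, and the sample rows are assumptions made for illustration (Django would normally prefix the tables with the app label).

```python
import sqlite3


def trees_with_apples(conn):
    """Return (id, title) for every tree referenced by at least one apple."""
    return conn.execute(
        "SELECT DISTINCT tree.id, tree.title "
        "FROM apple JOIN tree ON apple.tree_id = tree.id "
        "ORDER BY tree.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE apple (id INTEGER PRIMARY KEY, tree_id INTEGER REFERENCES tree(id));
    INSERT INTO tree (id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    -- tree 1 has two apples, tree 3 has one, tree 2 has none
    INSERT INTO apple (tree_id) VALUES (1), (1), (3);
""")
rows = trees_with_apples(conn)
```

Tree 1 appears only once even though it has two apples, which is the job `.distinct()` does in the ORM version; tree 2 is left out because nothing references it.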

Django - joining multiple tables (models) and filtering out based on their attribute

I'm new to Django and ORMs in general, so I'm having trouble coming up with a query that joins multiple tables.
I have 4 Models that need joining - Category, SubCategory, Product and Packaging, example values would be:
Category: 'male'
SubCategory: 'shoes'
Product: 'nikeXYZ'
Packaging: 'size_36: 1'
Each model has an FK to the model above it (i.e. SubCategory has a category field, and so on).
My question is - how can I filter Product given a Category (e.g. male) and only show products which have Packaging attribute available set to True? Obviously I want to minimise the hits on my database (ideally do it with 1 SQL query).
I could do something along these lines:
available = Product.objects.filter(packaging__available=True)
subcategories = SubCategory.objects.filter(category_id=<id_of_male>)
products = available.filter(subcategory_id__in=subcategories)
but then that requires 2 hits on database at least (available, subcategories) I think. Is there a way to do it in one go?
Try this:
lookup = {'packaging__available': True, 'subcategory__category_id__in': ['ids of males']}
product_objs = Product.objects.filter(**lookup)
Try to read:
this
You can query with _set, chain lookups with double underscores (__) to follow foreign keys, or build a list of ids.
I think this should work, but it's not tested:
Product.objects.filter(packaging__available=True, subcategory__category_id__in=[id_of_male])
Since Product holds the foreign key to SubCategory, the lookup uses the field name (subcategory); a related_name only matters when following the relation in the reverse direction.
subcategory__category_id__in=[id_of_male] can probably be simplified to subcategory__category_id=id_of_male.
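For orientation, a chained double-underscore filter like the ones above is expected to end up as a single SQL statement with joins. The sketch below runs roughly that statement against an in-memory SQLite database; every table name, column name, and sample row here is an assumption for illustration, not the schema Django would generate.

```python
import sqlite3


def available_products(conn, category_id):
    """Names of products in a category that have at least one available packaging."""
    rows = conn.execute(
        "SELECT DISTINCT product.name "
        "FROM product "
        "JOIN subcategory ON product.subcategory_id = subcategory.id "
        "JOIN packaging ON packaging.product_id = product.id "
        "WHERE subcategory.category_id = ? AND packaging.available = 1 "
        "ORDER BY product.name",
        (category_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subcategory (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, subcategory_id INTEGER);
    CREATE TABLE packaging (id INTEGER PRIMARY KEY, product_id INTEGER,
                            label TEXT, available INTEGER);
    INSERT INTO category VALUES (1, 'male'), (2, 'female');
    INSERT INTO subcategory VALUES (1, 'shoes', 1), (2, 'shoes', 2);
    INSERT INTO product VALUES (1, 'nikeXYZ', 1), (2, 'nikeABC', 1), (3, 'pumaQ', 2);
    -- nikeXYZ has two available packagings, nikeABC has none available
    INSERT INTO packaging (product_id, label, available) VALUES
        (1, 'size_36', 1), (1, 'size_37', 1), (2, 'size_36', 0), (3, 'size_36', 1);
""")
male_products = available_products(conn, 1)
```

The DISTINCT matters because a product with several available packagings would otherwise be returned once per packaging row; the ORM equivalent is adding `.distinct()` to the filtered queryset.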

Django ORM: django aggregate over filtered reverse relation

The question is remotely related to Django ORM: filter primary model based on chronological fields from related model, by further limiting the resulting queryset.
The models
Assuming we have the following models:
class Patient(models.Model):
    name = models.CharField()
    # other fields following

class MedicalFile(models.Model):
    patient = models.ForeignKey(Patient, related_name='files')
    issuing_date = models.DateField()
    expiring_date = models.DateField()
    diagnostic = models.CharField()
The query
I need to select all the files which are valid at a specified date, most likely from the past. The problem that I have here is that for every patient, there will be a small overlapping period where a patient will have 2 valid files. If we're querying for a date from that small timeframe, I need to select only the most recent file.
More to the point: consider patient John Doe. He will have a string of "uninterrupted" files starting in 2012, like this:
+---+------------+-------------+
|ID |issuing_date|expiring_date|
+---+------------+-------------+
|1 |2012-03-06 |2013-03-06 |
+---+------------+-------------+
|2 |2013-03-04 |2014-03-04 |
+---+------------+-------------+
|3 |2014-03-04 |2015-03-04 |
+---+------------+-------------+
As one can easily observe, the validity periods of these files overlap by a couple of days. For instance, on 2013-03-05 files 1 and 2 are both valid, but we're considering only file 2 (as the most recent one). I'm guessing that the use case isn't special: this is the case of managing subscriptions, where in order to have a continuous subscription, you renew your subscription early.
Now, in my application I need to query historical data, e.g. give me all the files which were valid on 2013-03-05, considering only the "most recent" ones. I was able to solve this by using RawSQL, but I would like to have a solution without raw SQL. In the previous question, we were able to filter the "latest" file by aggregation over the reverse relation, something like:
qs = MedicalFile.objects.annotate(latest_file_date=Max('patient__files__issuing_date'))
qs = qs.filter(issuing_date=F('latest_file_date')).select_related('patient')
The problem is that we need to limit the range over which latest_file_date is computed, by filtering against 2013-03-05. But aggregate functions don't run over filtered querysets ...
The "poor" solution
I'm currently doing this via a RawSQL annotation (substitute <app> with your concrete application label):
import datetime
from django.db.models import F
from django.db.models.expressions import RawSQL

reference_date = datetime.date(year=2013, month=3, day=5)
annotation_latest_issuing_date = {
    'latest_issuing_date': RawSQL('SELECT max(file.issuing_date) '
                                  'FROM <app>_medicalfile file '
                                  'WHERE file.patient_id = <app>_medicalfile.patient_id '
                                  '  AND file.issuing_date <= %s', (reference_date, ))
}
qs = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
qs = qs.annotate(**annotation_latest_issuing_date).filter(issuing_date=F('latest_issuing_date'))
Written as such, the queryset returns the correct number of records.
Question: how can this be achieved without RawSQL and (already implied) with the same level of performance?
You can use id__in and provide your nested filtered queryset (like all files that are valid at the given date).
qs = (
    MedicalFile.objects
    .filter(id__in=MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date))
    .order_by('patient__pk', '-issuing_date')
    .distinct('patient__pk')  # passing field names to distinct() is only supported on PostgreSQL
)
The order_by groups the files by patient, with the latest issuing date first. distinct then retrieves that first file for each patient. However, general care is required when combining order_by and distinct: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
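The order_by/distinct logic can be traced in plain Python on the John Doe table from the question. This is a sketch of the selection rule only, not of the SQL that Django generates; `FileRow` and `latest_valid_files` are hypothetical names, and the patient id of 1 is made up.

```python
import datetime as dt
from typing import NamedTuple


class FileRow(NamedTuple):
    id: int
    patient_id: int
    issuing_date: dt.date
    expiring_date: dt.date


def latest_valid_files(rows, reference_date):
    # Same condition as issuing_date__lte / expiring_date__gt in the filter.
    valid = [r for r in rows if r.issuing_date <= reference_date < r.expiring_date]
    # order_by('patient__pk', '-issuing_date'): two stable sorts leave the
    # newest file first within each patient.
    valid.sort(key=lambda r: r.issuing_date, reverse=True)
    valid.sort(key=lambda r: r.patient_id)
    # distinct('patient__pk'): keep only the first row seen for each patient.
    first_per_patient = {}
    for r in valid:
        first_per_patient.setdefault(r.patient_id, r)
    return list(first_per_patient.values())


john_doe = [
    FileRow(1, 1, dt.date(2012, 3, 6), dt.date(2013, 3, 6)),
    FileRow(2, 1, dt.date(2013, 3, 4), dt.date(2014, 3, 4)),
    FileRow(3, 1, dt.date(2014, 3, 4), dt.date(2015, 3, 4)),
]
picked = latest_valid_files(john_doe, dt.date(2013, 3, 5))
```

For 2013-03-05 both file 1 and file 2 pass the validity filter, and only file 2 survives the "first row per patient" step, which matches the behaviour the question asks for.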
Edit: Removed single patient dependence from first filter and changed latest to combination of order_by and distinct
Assume p is a Patient instance.
I think you can do something like:
p.files.filter(issuing_date__lt='some_date', expiring_date__gt='some_date')
See https://docs.djangoproject.com/en/1.9/topics/db/queries/#backwards-related-objects
Or maybe with the Q magic query object...

Django models: retrieving unique foreign key instances

I have two tables like so:
class Collection(models.Model):
    name = models.CharField()

class Image(models.Model):
    name = models.CharField()
    image = models.ImageField()
    collection = models.ForeignKey(Collection)
I'd like to retrieve the first image out of every collection. I have attempted:
image_list = Image.objects.order_by('collection.id').distinct('collection.id')
but it didn't work out the way I expected :(
Any ideas?
Thanks.
Don't use dots to separate fields that span relations in Django; the double-underscore convention is used instead -- it means "follow this relation to get to this field"
this is more correct:
image_list = Image.objects.order_by('collection__id').distinct('collection__id')
However, it probably doesn't do what you want.
The concept of "first" doesn't always apply in relational databases the way you seem to be using it. For all of the records in the image table with the same collection id, there is no record which is 'first' or 'last' -- they're all just records. You could put another field on that table to define a specific order, or you could order by id, or alphabetically by name, but none of those will happen by default.
What will probably work best for you is to get the list of collections with one query, and then get a single item per collection, in separate queries:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c)[0] for c in collection_ids
]
If you want to apply an order to the Images, to define which is 'first', then modify it like this:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c).order_by('-id')[0] for c in collection_ids
]
You could also write raw SQL. MySQL aggregation has the interesting property that fields which are not aggregated over can still appear in the final output, and essentially take an arbitrary value from the set of matching records (this relies on the ONLY_FULL_GROUP_BY SQL mode being disabled). Something like this might work:
Image.objects.raw("SELECT image.* FROM app_image AS image GROUP BY image.collection_id")
This query should get you one image from each collection, but you will have no control over which one is returned.
As written in my comment, you cannot use specific fields with distinct under MySQL. However, you can achieve the same result with the following:
from itertools import groupby
all_images = Image.objects.order_by('collection__id')
images_by_collection = groupby(all_images, lambda image: image.collection_id)
# keep the first image of each group (one per collection)
image_list = [next(group) for key, group in images_by_collection]
Unfortunately, this results in a "bigger" query to the DB (all images are retrieved).
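The groupby idea can be tried without a database. In this sketch the `ImageRow` tuples stand in for Image instances, and "first" is taken to mean lowest id within a collection; both are assumptions for illustration.

```python
from collections import namedtuple
from itertools import groupby

ImageRow = namedtuple("ImageRow", ["id", "name", "collection_id"])


def first_image_per_collection(images):
    """Return one image per collection: the one with the lowest id."""
    # groupby only merges adjacent items, so the input must be sorted by the key.
    ordered = sorted(images, key=lambda image: (image.collection_id, image.id))
    grouped = groupby(ordered, key=lambda image: image.collection_id)
    # Each group is an iterator; next() takes its first element.
    return [next(group) for _, group in grouped]


images = [
    ImageRow(1, "a.png", 10),
    ImageRow(2, "b.png", 20),
    ImageRow(3, "c.png", 10),
    ImageRow(4, "d.png", 20),
]
firsts = first_image_per_collection(images)
```

With a queryset, the sorting step corresponds to `order_by('collection__id', 'id')`, so the grouping itself happens in Python after all rows have been fetched.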
dict([(c.id, c.image_set.all()[0]) for c in Collection.objects.all()])
That will create a dictionary of the first image (by default ordering) in each collection, keyed by the collection's id. Be aware, though, that this will generate 1+N queries, where N is the total number of collection objects.
To get around that, you'll either need to wait for Django 1.4 and prefetch_related or use something like django-batch-select.
First get the distinct result, then do your filters.
I think you should try this one.
image_list = Image.objects.distinct()
image_list = image_list.order_by('collection__id')

Limit django queryset by another related table

Let's say I have 2 Django models like this:
class Spam(models.Model):
    somefield = models.CharField()

class Eggs(models.Model):
    parent_spam = models.ForeignKey(Spam)
    child_spam = models.ForeignKey(Spam)
Given a "Spam" object as input, what would the Django query look like that:
Limits this query based on the parent_spam field in the "Eggs" table
Gives me the corresponding child_spam field
And returns a set of "Spam" objects
In SQL:
SELECT * FROM Spam WHERE id IN (SELECT child_spam FROM Eggs WHERE parent_spam = 'input_id')
I know this is only an example, but this model setup doesn't actually validate as it is: you can't have two separate ForeignKeys pointing at the same model without specifying a related_name. So, assuming the related names are egg_parent and egg_child respectively, and your existing Spam object is called my_spam, this would do it:
[egg.child_spam for egg in my_spam.egg_parent.all()]
or, as a single query:
Spam.objects.filter(egg_child__parent_spam=my_spam)
Even better, define a ManyToManyField('self') on the Spam model, which handles all this for you, then you would do:
my_spam.other_spams.all()
According to your SQL code, you need something like this:
Spam.objects.filter(
    id__in=Eggs.objects.filter(parent_spam='input_id').values_list('child_spam')
)
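To check what the subquery form should return, the SQL from the question can be run against an in-memory SQLite database. This is a sketch under assumptions: the table names `spam` and `eggs` and the sample rows are made up, and the `_id` suffix on the foreign key columns follows Django's default column naming.

```python
import sqlite3


def child_spams(conn, parent_id):
    """Return (id, somefield) for every spam that is a child of the given parent."""
    return conn.execute(
        "SELECT id, somefield FROM spam "
        "WHERE id IN (SELECT child_spam_id FROM eggs WHERE parent_spam_id = ?) "
        "ORDER BY id",
        (parent_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spam (id INTEGER PRIMARY KEY, somefield TEXT);
    CREATE TABLE eggs (id INTEGER PRIMARY KEY,
                       parent_spam_id INTEGER REFERENCES spam(id),
                       child_spam_id INTEGER REFERENCES spam(id));
    INSERT INTO spam VALUES (1, 'root'), (2, 'left'), (3, 'right'), (4, 'other');
    -- spam 1 is the parent of 2 and 3; spam 4 is the parent of 1
    INSERT INTO eggs (parent_spam_id, child_spam_id) VALUES (1, 2), (1, 3), (4, 1);
""")
children_of_root = child_spams(conn, 1)
```

The filter with `egg_child__parent_spam` expresses the same selection as a join instead of an IN subquery, so it may need `.distinct()` if the same child can be linked to one parent more than once.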