Django GROUP BY including unnecessary columns?

Django GROUP BY including unnecessary columns? - django

I have Django code as follows
qs = Result.objects.only('time')
qs = qs.filter(organisation_id=1)
qs = qs.annotate(Count('id'))
And it gets translated into the following SQL:
SELECT "myapp_result"."id", "myapp_result"."time", COUNT("myapp_result"."id") AS "id__count" FROM "myapp_result" WHERE "myapp_result"."organisation_id" = 1 GROUP BY "myapp_result"."id", "myapp_result"."organisation_id", "myapp_result"."subject_id", "myapp_result"."device_id", "myapp_result"."time", "myapp_result"."tester_id", "myapp_result"."data"
As you can see, the GROUP BY clause starts with the field I intended (id) but then it goes on to list all the other fields as well. Is there any way I can persuade Django not to specify all the individual fields like this?
As you can see, even with .only('time') that doesn't stop Django from listing all the other fields anyway, but only in this GROUP BY clause.
The reason I want to do this is to avoid the issue described here where PostgreSQL doesn't support annotation when there's a JSON field involved. I don't want to drop native JSON support (so I'm not actually using django-jsonfield). The query works just fine if I manually issue it without the reference to "myapp_result"."data" (the only JSON field on the model). So if I could just persuade Django not to refer to it, I'd be fine!

only only defers the loading of certain fields, i.e. it allows for lazy loading of big or unused fields. It should generally not be used unless you know exactly what you're doing and why you need it, as it is nothing more than a performance booster than often decreases performance with improper use.
What you're looking for is values() (or values_list()), which actually excludes certain fields instead of just lazy loading. This will return a dictionary (or list) instead of a model instance, but this is the only way to tell Django to not take other fields into account:
qs = (Result.objects.filter_by(organisation_id=1)
.values('time').annotate(Count('id')))

Related

Problem with .only() method, passing to Pagination / Serialization --- all fields are getting returned instead of the ones specified in only()

I am trying load some data into datatables. I am trying to specify columns in the model.objects query by using .only() --- at first glance at the resulting QuerySet, it does in fact look like the mySQL query is only asking for those columns.
However, When I try to pass the QuerySet into Paginator, and/or a Serializer, the result has ALL columns in it.
I cannot use .values_list() because that does not return the nested objects that I need to have serialized as part of my specific column ask. I am not sure what is happening to my .only()
db_result_object = model.objects.prefetch_related().filter(qs).order_by(asc+sort_by).only(*columns_to_return)
paginated_results = Paginator(db_result_object,results_per_page)
serialized_results = serializer(paginated_results.object_list,many=True)
paginated_results.object_list = serialized_results.data
return paginated_results

This one has tripped me up too. In Django, calling only() doesn't return data equivalent to a SQL statement like this:
SELECT col_to_return_1, ... col_to_return_n
FROM appname_model
The reason it doesn't do it like this is because Django returns data to you not when you construct the QuerySet, but when you first access data from that QuerySet (see lazy QuerySets).
In the case of only() (a specific example of what is called a deferred field) you still get all of the fields like you normally would, but the difference is that it isn't completely loaded in from the database immediately. When you access the data, it will only load the fields included in the only statement. Some useful docs here.
My recommendation would be to write your Serializer so that it is only taking care of the one specific filed, likely using a SerializerMethodField with another serializer to serialize your related fields.

Django: Add arbitrary additional data to a queryset

I am trying to display a map of my data based on a search. The easiest way to handle the map display would be to serialized the queryset generated by the search, and indeed this works just fine using . However, I'd really like to allow for multiple searches, with the displayed points being shown in a user chosen color. The user chosen color, obviously cannot come from the database, since it is not a property of these objects, so none of the aggregators make sense here.
I have tried simply making a utility class, since what I really need is a somewhat complex join between two model classes that then gets serialized into geojson. However, once I created that utility class, it became evident that I lost a lot of the benefits of having a queryset, especially the ability to easily serialize the data with django-geojson (or natively once I can get 1.8 to run smoothly).
Basically, I want to be able to do something like:
querySet = datumClass.objects.filter(...user submitted search parameters...).annotate(color='blue')
Is this possible at all? It seems like this would be more elegant and would work better than my current solution of a non-model utility class which has some serious serialization issues when I try to use python-geojson to serialize.

The problem is that extra comes with all sorts of warning about usefulness or deprecation... But this works:
.extra(select={'color': "'blue'"})
Notice the double quotes wrapping the string value.
This translates to:
SELECT ('blue') AS "color"

Not quite sure what you are trying to achieve, but you can add extra attributes to your objects iterating over the queryset in the view. These can be accessed from the template.
for object in queryset :
if object.contition = 'a'
object.color = 'blue'
else:
object.color = 'green'

if you have a dictionary that maps fields to values, you can do things like
filter_dictionary = {
'date__lte' : '2014-03-01'
}
qs = DatumClass.objects.filter(**filter_dictionary)
And qs would have all dates less than that date (if it has a date field). So, as a user, I could submit any key, value pairs that you could place in your dictionary.

Generating a single queryset with filtered summary data across a foreign key?

I have a small Django project to learn with (it's a web UI for the RANCID backup software) and I've run into a problem.
The model for the app defines Devices, and DeviceGroups. Each Device is a member of a group and has a couple of state flags - Enabled, Successful - to indicate if they are operating correctly. Here's the relevant bits.
class DeviceGroup(models.Model):
group_name = models.CharField(max_length=60,unique=True)
class Device(models.Model):
hostname = models.CharField(max_length=60,unique=True)
enabled = models.BooleanField(default=True)
device_group = models.ForeignKey(DeviceGroup)
last_was_success = models.BooleanField(default=False,editable=False)
I have a summary table on the front 'dashboard' page, that shows a list of all the groups, and for each group, how many devices are in it. I'd like to also show the number of Active devices, and the number of failing (i.e. Not last_was_success) devices per-group. The plain device count is already available through the ForeignKey field.
This seems like the kind of thing that annotate is for, but not quite. And actually, I'm not sure how I'd do it with raw SQL either. Most likely as three queries and some lookup afterwards, or subqueries.
So - is it possible 'nicely' in Django? Or alternatively, how do you do the joining up again in the Template or View? The object passed into the template is simply:
device_groups = DeviceGroup.objects.order_by('group_name')
currently, and I don't think I can just add extra fields onto the queryset results "manually", can I? i.e. it's not a dict or similar.

i think you must use
device_groups = DeviceGroup.objects.all().order_by('group_name')
or
device_groups = DeviceGroup.objects.filter(condition).order_by('group_name')

Django views - optimum query set in a ForeignKey model

Having the model:
class Notebook(models.Model):
n_id = models.AutoField(primary_key = True)
class Note(models.Model):
b_nbook = models.ForeignKey(Notebook)
the URL pattern passing one parameter:
(r'^(?P<n_id>\d+)/$', 'notebook_notes')
and the following view:
def notebook_notes(request, n_id):
nbook = get_object_or_404(Nbook, pk=n_id)
...
which of the following is the optimum query set, and why? (they both work and pass the notes based to a selected by URL notebook)
notes = nbook.note_set.filter(b_nbook = n_id)
notes = Note.objects.select_related().filter(b_nbook = n_id)

Well you're comparing apples and oranges a bit there. They may return virtually the same, but you're doing different things on both.
Let's take the relational version first. That query is saying get all the notes that belong to nbook. You're then filtering that queryset by only notes that belong to nbook. You're filtering it twice on the same criteria, in effect. Since Django's querysets are lazy, it doesn't really do anything bad, like hit the database multiple times, but it's still unnecessary.
Now, the second version. Here, you're starting with all notes and filtering to just those that belong to the particular notebook. There's only one filter this time, but it's bad form to do it this way. Since it's a relation, you should look it up through the relational format, i.e. nbook.note_set.all(). On this version, though, you're also using select_related(), which wasn't used on the other version.
select_related will attempt to create a join table with any other relations on the model, in this case a Note. However, since the only relation on Note is Notebook and you already have the notebook, it's redundant.
Taking out all the redundancy in those two version leaves you with just:
notes = nbook.note_set.all()
That, too, will return exactly the same results as the other two version, but is much cleaner and standardized.

django's .extra(where= clauses are clobbered by table-renaming .filter(foo__in=... subselects

The short of it is, the table names of all queries that are inside a filter get renamed to u0, u1, ..., so my extra where clauses won't know what table to point to. I would love to not have to hand-make all the queries for every way I might subselect on this data, and my current workaround is to turn my extra'd queries into pk values_lists, but those are really slow and something of an abomination.
Here's what this all looks like. You can mostly ignore the details of what goes in the extra of this manager method, except the first sql line which points to products_product.id:
def by_status(self, *statii):
return self.extra(where=["""products_product.id IN
(SELECT recent.product_id
FROM (
SELECT product_id, MAX(start_date) AS latest
FROM products_productstatus
GROUP BY product_id
) AS recent
JOIN products_productstatus AS ps ON ps.product_id = recent.product_id
WHERE ps.start_date = recent.latest
AND ps.status IN (%s))""" % (', '.join([str(stat) for stat in statii]),)])
Which works wonderfully for all the situations involving only the products_product table.
When I want these products as a subselect, i do:
Piece.objects.filter(
product__in=Product.objects.filter(
pk__in=list(
Product.objects.by_status(FEATURED).values_list('id', flat=True))))
How can I keep the generalized abilities of a query set, yet still use an extra where clause?

At first: the issue is not totally clear to me. Is the second code block in your question the actual code you want to execute? If this is the case the query should work as expected since there is no subselect performed.
I assume so that you want to use the second code block without the list() around the subselect to prevent a second query being performed.
The django documentation refers to this issue in the documentation about the extra method. However its not very easy to overcome this issue.
The easiest but most "hakish" solution is to observe which table alias is produced by django for the table you want to query in the extra method. You can rely on the persistent naming of this alias as long as you construct the query always in the same fashion (you don't change the order of multiple extra methods or filter calls that cause a join).
You can inspect a query that will be execute in the DB queryset by using:
print Model.objects.filter(...).query
This will reveal the aliases that are used for the tables you want to query.

As of Django 1.11, you should be able to use Subquery and OuterRef to generate an equivalent query to your extra (using a correlated subquery rather than a join):
def by_status(self, *statii):
return self.filter(
id__in=Subquery(ProductStatus.values("product_id").filter(
status__in=statii,
product__in=Subquery(ProductStatus.objects.values(
"product_id",
).annotate(
latest=Max("start_date"),
).filter(
latest=OuterRef("start_date"),
).values("product_id"),
),
)
You could probably do it with Window expressions as well (as of Django 2.0).
Note that this is untested, so may need some tweaks.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django GROUP BY including unnecessary columns? - django

Related

Problem with .only() method, passing to Pagination / Serialization --- all fields are getting returned instead of the ones specified in only()

Django: Add arbitrary additional data to a queryset

Generating a single queryset with filtered summary data across a foreign key?

Django views - optimum query set in a ForeignKey model

django's .extra(where= clauses are clobbered by table-renaming .filter(foo__in=... subselects

Categories

Resources