I have a Django model that defines a TimeSlot. Each TimeSlot can hold a certain number of users (TimeSlot.spots). Each TimeSlot also has a certain number of users already held in it (a many-to-many field, TimeSlot.participants).
When I pass the available TimeSlots to the template that displays them to the user, I annotate with TimeSlot.objects.annotate(Count('participants')), which gives the number of users currently held by the TimeSlot as participants__count.
However, what I really want is the number of spots remaining: the capacity (TimeSlot.spots) minus the number currently held (participants__count). How can I annotate another field with this new number, so I can pass it to the template?
It's still not possible with annotation (though support for it is planned in Django). But you can do it with an .extra() query. See my answer to another question for details.
Upd.:
Essentially, you need something like this query:
# The select value is a raw SQL fragment; the table and column names are placeholders.
items = MyModel.objects.extra(
    select={'variance': 'Model.someField - SUM(relatedModel__someField)'},
)
Not possible with only an annotation. I'd create a method on the model which does the annotation, and then subtract that from your TimeSlot.spots value. This will use more database queries, but that's your only option. Or I guess you could drop down to raw SQL...
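For later readers: from Django 1.8 onwards, aggregates can be combined with F() expressions inside annotate(), so something along the lines of TimeSlot.objects.annotate(remaining=F('spots') - Count('participants')) should give the remaining spots directly. The SQL this boils down to can be sketched with the standard-library sqlite3 module. The table and column names below are made up for illustration and are not the ones Django would generate.

```python
import sqlite3

# Hypothetical tables standing in for TimeSlot and its many-to-many join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timeslot (id INTEGER PRIMARY KEY, spots INTEGER NOT NULL);
    CREATE TABLE timeslot_participants (
        id INTEGER PRIMARY KEY,
        timeslot_id INTEGER NOT NULL REFERENCES timeslot(id),
        user_id INTEGER NOT NULL
    );
    INSERT INTO timeslot (id, spots) VALUES (1, 5), (2, 3), (3, 2);
    INSERT INTO timeslot_participants (timeslot_id, user_id)
        VALUES (1, 10), (1, 11), (2, 10), (2, 11), (2, 12);
""")

# LEFT JOIN keeps slots that have no participants; COUNT(p.id) is 0 for them.
rows = conn.execute("""
    SELECT t.id, t.spots, COUNT(p.id) AS taken, t.spots - COUNT(p.id) AS remaining
    FROM timeslot t
    LEFT JOIN timeslot_participants p ON p.timeslot_id = t.id
    GROUP BY t.id, t.spots
    ORDER BY t.id
""").fetchall()

# Map each slot id to its remaining capacity.
remaining_by_slot = {slot_id: remaining for slot_id, _spots, _taken, remaining in rows}
print(remaining_by_slot)
```

The subtraction happens in the database, so the template only has to read one extra value per TimeSlot.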
Related
Using Django we have situation where we have a model Case which can be set to being a medical case or not (through a BooleanField).
Now, we also have a system to check if a certain employee (User subclass) is authorized to see sensitive data when a case is labeled as being medical (containing medical sensitive data).
I am able to annotate a new field to each instance, a BooleanField letting us know if the requesting employee is authorized to see medical data on the specific Case instance or not.
Ideally, I would like to have the database censor out specific fields (the customer field, for example) when the requesting employee is not authorized to see medical data for that case. I imagine this can be done with an annotate method, and a combination of from django.db.models.Case and from django.db.models.When.
But, what we would also like is that the resulting QuerySet keeps the same field names on the different model instances. We don't want to change the field name of customer to another name.
We have actually come up with a solution, using .values first, and then the .annotate for each field we want to potentially censor out (see code below). This isn't ideal though, for multiple reasons. For one, we don't get back model instances, but dictionaries. Also, but this is another question, one of the fields that needs to be censored is a ManyToManyField, and using .values now returns a unique row for each instance referred to through the ManyToManyField (any solution for that?)
Also, ideally, this queryset would be the base queryset for all situations in which an employee tries to request Cases in our app. We want all our colleagues to use this base queryset so that we don't have to implement the same solution in multiple places, and prevent sensitive data from leaking.
So, I am wondering, can anyone recommend a solution for this situation?
Thanks in advance!
PS. We would like to have this done by the database since the amount of cases being fetched is potentially very high, and doing this in Python would probably require a lot of CPU power and thus kill performance.
from django.db.models import Case, When, BooleanField, IntegerField, F, Value, Q

OurModel.objects.annotate(
    employee_medical_authorized=Case(
        When(..., then=Value(True)),
        default=Value(False),
        output_field=BooleanField()
    )
).values(...).annotate(
    customer=Case(
        When(Q(employee_medical_authorized=Value(False)) & Q(medical=Value(True)),
             then=Value(None)),
        default=F('customer'),
        output_field=IntegerField()
    )
)
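For reference, the SQL shape the question is after (mask a column with CASE WHEN, but keep the column's own name as the alias) can be sketched with sqlite3. This only illustrates the database-side idea and is not a Django solution. The cases and medical_authorization tables, and the way authorization is stored, are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (
        id INTEGER PRIMARY KEY,
        customer INTEGER,
        medical INTEGER NOT NULL
    );
    CREATE TABLE medical_authorization (
        employee_id INTEGER NOT NULL,
        case_id INTEGER NOT NULL
    );
    INSERT INTO cases (id, customer, medical) VALUES (1, 100, 1), (2, 200, 1), (3, 300, 0);
    INSERT INTO medical_authorization (employee_id, case_id) VALUES (7, 1);
""")

# The CASE expression is aliased back to "customer", so callers see the usual
# column name whether or not the value was censored.
CENSORED_QUERY = """
    SELECT c.id,
           CASE WHEN c.medical = 1 AND a.case_id IS NULL THEN NULL
                ELSE c.customer
           END AS customer
    FROM cases c
    LEFT JOIN medical_authorization a
           ON a.case_id = c.id AND a.employee_id = ?
    ORDER BY c.id
"""


def visible_cases(employee_id):
    """Return (id, customer) rows with customer nulled out where not authorized."""
    return conn.execute(CENSORED_QUERY, (employee_id,)).fetchall()


print(visible_cases(7))  # employee 7 is authorized for case 1 only
print(visible_cases(8))  # employee 8 has no medical authorization
```

Non-medical cases are never censored, and the masking is decided per row inside the database rather than in Python.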
I have a model class in my django project:
*user_id
*amount
*net_balance
*created_on
I have a list of user_ids (let's say 3). I need to get the last row for each user_id, then do some operation and create a new row for each user id. How do I do this efficiently? I can certainly do 6 transactions (if there are 3 items in the list of user ids).
If you want the most recent entry then
YourModel.objects.filter(user=user_id).latest('created_on')
If I understand your question correctly then you need to get all the user_ids (presumably you have a separate User model?) and then loop through them - for each user getting the most recent entry and then create the new row.
You need (at least) 1 select for all the records you are interested in and 1 insert query for each record returned.
The select query can be generated by the ORM (aggregation), or you can use raw SQL if you feel comfortable with it. If you use PostgreSQL, you can use distinct (which I recommend) as:
Model.objects.order_by('user_id', '-created_on').distinct('user_id')
or you can use aggregation to get the latest created_on per user, and then fetch the rows matching each (user_id, last_row) pair:
from django.db.models import Max
Model.objects.filter(user_id__in=[1, 2, 3]).values('user_id').annotate(last_row=Max('created_on'))
The correct answer depends on your Django version and database. But there are lots of good features in Django for this kind of thing.
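The "one select, then the inserts" idea above can be sketched at the SQL level with sqlite3. The ledger table, the sample data and the "add a deposit of 1" operation are all assumptions for illustration. In Django, the batched insert would roughly correspond to bulk_create.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        amount INTEGER NOT NULL,
        net_balance INTEGER NOT NULL,
        created_on TEXT NOT NULL
    );
    INSERT INTO ledger (user_id, amount, net_balance, created_on) VALUES
        (1, 10, 10, '2020-01-01'), (1, 5, 15, '2020-01-02'),
        (2, 20, 20, '2020-01-01'),
        (3, 7, 7, '2020-01-01'), (3, 3, 10, '2020-01-03');
""")

user_ids = [1, 2, 3]
placeholders = ", ".join("?" for _ in user_ids)

# One SELECT: join each user's MAX(created_on) back to the full rows.
latest = conn.execute(f"""
    SELECT l.user_id, l.net_balance
    FROM ledger l
    JOIN (SELECT user_id, MAX(created_on) AS last_row
          FROM ledger
          WHERE user_id IN ({placeholders})
          GROUP BY user_id) m
      ON m.user_id = l.user_id AND m.last_row = l.created_on
    ORDER BY l.user_id
""", user_ids).fetchall()

# Hypothetical "operation": add a deposit of 1 on top of the latest balance.
deposit = 1
new_rows = [(uid, deposit, balance + deposit, "2020-02-01") for uid, balance in latest]

# All new rows are written in a single transaction.
with conn:
    conn.executemany(
        "INSERT INTO ledger (user_id, amount, net_balance, created_on) VALUES (?, ?, ?, ?)",
        new_rows,
    )

total_rows = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
print(latest, total_rows)
```

This assumes created_on is unique per user; if two rows share the same latest timestamp, the join returns both and a tie-breaker (for example the highest id) would be needed.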
I am currently using TaggableManager for my tags in Django. I can see in the admin which objects are using a tag, but is there a way to count them?
Let's say I have the following tag and, as you can see, there are 4 objects using this tag. How can I get the count of 4 for this tag?
Thanks in advance
You will want to do a typical query on your database for the count of rows for a particular tag. Instead of taking the len() of a queryset, use the less commonly known count() method, which gets the number from SQL rather than loading every matching row just to measure the length of the result.
len(). A QuerySet is evaluated when you call len() on it. This, as you
might expect, returns the length of the result list.
Note: If you only need to determine the number of records in the set
(and don’t need the actual objects), it’s much more efficient to
handle a count at the database level using SQL’s SELECT COUNT(*).
Django provides a count() method for precisely this reason.
https://docs.djangoproject.com/en/1.11/ref/models/querysets/
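The difference between the two approaches can be sketched with sqlite3: in Django terms, len(queryset) behaves like the first function and queryset.count() like the second. The tag and tagged_item tables below are simplified stand-ins made up for this example, not the actual schema of the tagging library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tagged_item (
        id INTEGER PRIMARY KEY,
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        object_id INTEGER NOT NULL
    );
    INSERT INTO tag (id, name) VALUES (1, 'django'), (2, 'python');
    INSERT INTO tagged_item (tag_id, object_id)
        VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1);
""")


def count_in_python(tag_name):
    """Fetch every matching row, then measure the list (what len() amounts to)."""
    rows = conn.execute(
        "SELECT ti.id FROM tagged_item ti JOIN tag t ON t.id = ti.tag_id WHERE t.name = ?",
        (tag_name,),
    ).fetchall()
    return len(rows)


def count_in_database(tag_name):
    """Let the database count and return a single number (what count() amounts to)."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM tagged_item ti JOIN tag t ON t.id = ti.tag_id WHERE t.name = ?",
        (tag_name,),
    ).fetchone()
    return n


print(count_in_python("django"), count_in_database("django"))
```

Both give the same answer; the second transfers one integer instead of one row per tagged object.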
For a model in my database I need to store around 300 values for a specific field. What would be the drawbacks, in terms of performance and simplicity in query, if I use Postgres-specific ArrayField instead of a separate table with One-to-Many relationship?
If you use an array field
The size of each row in your DB is going to be fairly large, so Postgres is going to rely on TOAST tables a lot more (http://www.postgresql.org/docs/9.5/static/storage-toast.html)
Every time you get the row, unless you specifically defer the field (https://docs.djangoproject.com/en/1.9/ref/models/querysets/#defer) or otherwise exclude it from the query via only, values or something similar, you are paying the cost of loading all those values every time you iterate across that row. If that's what you need then so be it.
Filtering based on values in that array, while possible isn't going to be as nice and the Django ORM doesn't make it as obvious as it does for M2M tables.
If you use M2M
You can filter more easily on those related values
Those related values are not loaded by default; you can use prefetch_related if you need them, and then get fancy if you want only a subset of those values loaded
Total storage in the DB is going to be slightly higher with M2M because of keys, and extra id fields
The cost of the joins in this case is completely negligible because of keys.
Personally I'd say go with the M2M tables, but I don't know your specific application. If you're going to be working with a massive amount of data it's likely worth grabbing a representative dataset and testing both methods with it.
That seems simple enough, but all Django queries seem to be 'SELECT *'
How do I build a query returning only a subset of fields ?
In Django 1.1 onwards, you can use defer('col1', 'col2') to exclude columns from the query, or only('col1', 'col2') to only get a specific set of columns. See the documentation.
values does something slightly different - it only gets the columns you specify, but it returns a list of dictionaries rather than a set of model instances.
Append a .values("column1", "column2", ...) to your query
The accepted answer advises defer and only, which the docs discourage in most cases:
only use defer() when you cannot, at queryset load time, determine if you will need the extra fields or not. If you are frequently loading and using a particular subset of your data, the best choice you can make is to normalize your models and put the non-loaded data into a separate model (and database table). If the columns must stay in the one table for some reason, create a model with Meta.managed = False (see the managed attribute documentation) containing just the fields you normally need to load and use that where you might otherwise call defer(). This makes your code more explicit to the reader, is slightly faster and consumes a little less memory in the Python process.
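To make the SELECT difference concrete, here is a small sqlite3 sketch of what narrowing the column list means at the SQL level. The article table is hypothetical. In Django terms, only('title') and values('id', 'title') would both produce a narrowed SELECT like the second query (only() also keeps the primary key), with values() handing back dictionaries much like dict(subset) here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose column names, similar to dicts
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT);
    INSERT INTO article (title, author, body) VALUES ('Hello', 'ann', 'a very long body');
""")

# SELECT * loads every column, including the large body.
all_columns = conn.execute("SELECT * FROM article").fetchone()

# Naming the columns loads only what is asked for.
subset = conn.execute("SELECT id, title FROM article").fetchone()

print(all_columns.keys())
print(dict(subset))
```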