Django aggregate multiple columns after arithmetic operation

Django aggregate multiple columns after arithmetic operation - django

I have a really strange problem with Django 1.4.4.
I have this model :
class LogQuarter(models.Model):
timestamp = models.DateTimeField()
domain = models.CharField(max_length=253)
attempts = models.IntegerField()
success = models.IntegerField()
queue = models.IntegerField()
...
I need to gather the first 20 domains with the higher sent property. The sent property is attempts - queue.
This is my request:
obj = LogQuarter.objects\
.aggregate(Sum(F('attempts')-F('queue')))\
.values('domain')\
.filter(**kwargs)\
.order_by('-sent')[:20]
I tried with extra too and it isn't working.
It's really basic SQL, I am surprised that Django can't do this.
Did someone has a solution ?

You can actually do this via subclassing some of the aggregation functionality. This requires digging in to the code to really understand, but here's what I coded up to do something similar for MAX and MIN. (Note: this code is based of Django 1.4 / MySQL).
Start by subclassing the underlying aggregation class and overriding the as_sql method. This method writes the actual SQL to the database query. We have to make sure to quote the field that gets passed in correctly and associate it with the proper table name.
from django.db.models.sql import aggregates
class SqlCalculatedSum(aggregates.Aggregate):
sql_function = 'SUM'
sql_template = '%(function)s(%(field)s - %(other_field)s)'
def as_sql(self, qn, connection):
# self.col is currently a tuple, where the first item is the table name and
# the second item is the primary column name. Assuming our calculation is
# on two fields in the same table, we can use that to our advantage. qn is
# underlying DB quoting object and quotes things appropriately. The column
# entry in the self.extra var is the actual database column name for the
# secondary column.
self.extra['other_field'] = '.'.join(
[qn(c) for c in (self.col[0], self.extra['column'])])
return super(SqlCalculatedSum, self).as_sql(qn, connection)
Next, subclass the general model aggregation class and override the add_to_query method. This method is what determines how the aggregate gets added to the underlying query object. We want to be able to pass in the field name (e.g. queue) but get the corresponding DB column name (in case it is something different).
from django.db import models
class CalculatedSum(models.Aggregate):
name = SqlCalculatedSum
def add_to_query(self, query, alias, col, source, is_summary):
# Utilize the fact that self.extra is set to all of the extra kwargs passed
# in on initialization. We want to get the corresponding database column
# name for whatever field we pass in to the "variable" kwarg.
self.extra['column'] = query.model._meta.get_field(
self.extra['variable']).db_column
query.aggregates[alias] = self.name(
col, source=source, is_summary=is_summary, **self.extra)
You can then use your new class in an annotation like this:
queryset.annotate(calc_attempts=CalculatedSum('attempts', variable='queue'))
Assuming your attempts and queue fields have those same db column names, this should generate SQL similar to the following:
SELECT SUM(`LogQuarter`.`attempts` - `LogQuarter`.`queue`) AS calc_attempts
And there you go.

I am not sure if you can do this Sum(F('attempts')-F('queue')). It should throw an error in the first place. I guess, easier approach would be to use extra.
result = LogQuarter.objects.extra(select={'sent':'(attempts-queue)'}, order_by=['-sent'])[:20]

Related

Django ORM: Get all records and the respective last log for each record

I have two models, the simple version would be this:
class Users:
name = models.CharField()
birthdate = models.CharField()
# other fields that play no role in calculations or filters, but I simply need to display
class UserLogs:
user_id = models.ForeignKey(to='Users', related_name='user_daily_logs', on_delete=models.CASCADE)
reference_date = models.DateField()
hours_spent_in_chats = models.DecimalField()
hours_spent_in_p_channels = models.DecimalField()
hours_spent_in_challenges = models.DecimalField()
# other fields that play no role in calculations or filters, but I simply need to display
What I need to write is a query that will return all the fields of all users, with the latest log (reference_date) for each user. So for n users and m logs, the query should return n records. It is guaranteed that each user has at least one log record.
Restrictions:
the query needs to be written in django orm
the query needs to start from the user model. So Anything that goes like Users.objects... is ok. Anything that goes like UserLogs.objects... is not. That's because of filters and logic in the viewset, which is beyond my control
It has to be a single query, and no iterations in python, pandas or itertools are allowed. The Queryset will be directly processed by a serializer.
I shouldn't have to specify the names of the columns that need to be returned, one by one. The query must return all the columns from both models
Attempt no. 1 returns only user id and the log date (for obvious reasons). However, it is the right date, but I just need to get the other columns:
test = User.objects.select_related("user_daily_logs").values("user_daily_logs__user_id").annotate(
max_date=Max("user_daily_logs__reference_date"))
Attempt no. 2 generates as error (Cannot resolve expression type, unknown output_field):
logs = UserLogs.objects.filter(user_id=OuterRef('pk')).order_by('-reference_date')[:1]
users = Users.objects.annotate(latest_log = Subquery(logs))

This seems impossible taking into account all the restrictions.
One approach would be to use prefetch_related
users = User.objects.all().prefetch_related(
models.Prefetch(
'user_daily_logs',
queryset=UserLogs.objects.filter().order_by('-reference_date'),
to_attr="latest_log"
)
)
This will do two db queries and return all logs for every user which may or not be a problem depending on the number of records. If you need only logs for the current day as the name suggest, you can add that to filter and reduce the number of UserLogs records. Of course you need to get the first element from the list.
users.daily_logs[0]
For that you can create a #property on the User model which could look roughly like this
#property
def latest_log(self):
if not hasattr('daily_logs'):
return None
return self.daily_logs[0]
user.latest_log
You can also go a step further and try the following SubQuery inside Prefetch to limit the queryset to one element but I am not sure on the performance with this one (credits Django prefetch_related with limit).
users = User.objects.all().prefetch_related(
models.Prefetch(
'user_daily_logs',
queryset=UserLogs.objects.filter(id__in=Subquery(UserLogs.objects.filter(user_id=OuterRef('user_id')).order_by('-reference_date').values_list('id', flat=True)[:1] ) ),
to_attr="latest_log"
)
)

django DateField model -- unable to find the date differences

I'm unable to find the difference between two dates in my form.
models.py:
class Testing(models.Model):
Planned_Start_Date = models.DateField()
Planned_End_Date = models.DateField()
Planned_Duration = models.IntegerField(default=Planned_Start_Date - Planned_End_Date)
difference between the date has to calculated and it should stored in the database but It doesn't works

default is a callable function that is just used on the class level, so you can't use it to do what you want. You should override the model's save() method (or better, implement a pre_save signal handler to populate the field just before the object is saved:
def save(self, **kwargs):
self.Planned_Duration = self.Planned_End_Date - self.Planned_Start_Date
super().save(**kwargs)
But why do you save a computed property to the database? This column is unnecessary. Both for querying (you can easily use computed queries on the start and end date) as for retrieving, you're wasting db space.
# if you need the duration just define a property
#property
def planned_duration(self):
return self.Planned_End_Date - self.Planned_Start_Date
# if you need to query tasks which last more than 2 days
Testing.objects.filter(Planned_End_Date__gt=F('Planned_Start_Date') + datetime.timedelta(days=2))
Note: Python conventions would recommend you name your fields using snake_case (planned_duration, planned_end_date, planned_start_date). Use CamelCase for classes (TestingTask). Don't mix the two.

Join annotations in Django without raw SQL

I have a model that has arbitrary key/value pairs (attributes) associated with it. I'd like to have the option of sorting by those dynamic attributes. Here's what I came up with:
class Item(models.Model):
pass
class Attribute(models.Model):
item = models.ForeignKey(Item, related_name='attributes')
key = models.CharField()
value = models.CharField()
def get_sorted_items():
return Item.objects.all().annotate(
first=models.select_attribute('first'),
second=models.select_attribute('second'),
).order_by('first', 'second')
def select_attribute(attribute):
return expressions.RawSQL("""
select app_attribute.value from app_attribute
where app_attribute.item_id = app_item.id
and app_attribute.key = %s""", (attribute,))
This works, but it has a bit of raw SQL in it, so it makes my co-workers wary. Is it possible to do this without raw SQL? Can I make use of Django's ORM to simplify this?
I would expect something like this to work, but it doesn't:
def get_sorted_items():
return Item.objects.all().annotate(
first=Attribute.objects.filter(key='first').values('value'),
second=Attribute.objects.filter(key='second').values('value'),
).order_by('first', 'second')

Approach 1
Using Djagno 1.8+ Conditional Expressions
(see also Query Expressions)
items = Item.objects.all().annotate(
first=models.Case(models.When(attribute__key='first', then=models.F('attribute__value')), default=models.Value('')),
second=models.Case(models.When(attribute__key='second', then=models.F('attribute__value')), default=models.Value(''))
).distinct()
for item in items:
print item.first, item.second
Approach 2
Using prefetch_related with custom models.Prefetch object
keys = ['first', 'second']
items = Item.objects.all().prefetch_related(
models.Prefetch('attributes',
queryset=Attribute.objects.filter(key__in=keys),
to_attr='prefetched_attrs'),
)
This way every item from the queryset will contain a list under the .prefetched_attrs attribute.
This list will contains all filtered-item-related attributes.
Now, because you want to get the attribute.value, you can implement something like this:
class Item(models.Model):
#...
def get_attribute(self, key, default=None):
try:
return next((attr.value for attr in self.prefetched_attrs if attr.key == key), default)
except AttributeError:
raise AttributeError('You didnt prefetch any attributes')
#and the usage will be:
for item in items:
print item.get_attribute('first'), item.get_attribute('second')
Some notes about the differences in using both approaches.
you have a one idea better control over the filtering process using the approach with the custom Prefetch object. The conditional-expressions approach is one idea harder to be optimized IMHO.
with prefetch_related you get the whole attribute object, not just the value you are interested in.
Django executes prefetch_related after the queryset is being evaluated, which means a second query is being executed for each clause in the prefetch_related call. On one way this can be good, because it this keeps the main queryset untouched from the filters and thus not additional clauses like .distinct() are needed.
prefetch_related always put the returned objects into a list, its not very convenient to use when you have prefetchs returning 1 element per object. So additional model methods are required in order to use with pleasure.

Django custom model Manager database error

I'm trying to build a custom model manager, but have run into an error. The code looks like this:
class LookupManager(models.Manager):
def get_options(self, *args, **kwargs):
return [(t.key, t.value) \
for t in Lookup.objects.filter(group=args[0].upper())]
class Lookup(models.Model):
group = models.CharField(max_length=1)
key = models.CharField(max_length=1)
value = models.CharField(max_length=128)
objects = LookupManager()
(I have played around with get_options quite a lot using super() and other ways to filter the results)
When I run syncdb, I get the following error (ops_lookup being the corresponding table):
django.db.utils.DatabaseError: no such table: ops_lookup
I noticed that if I change the manager to return [] instead of a filter, then syncdb works. Also, if I've run syncdb and all the tables exist, then change the code to the above, it works as well.
How can I get Django to not expect this table to exist when running syncdb for the first time?
Update
After looking through the traceback I realised what was happening. The lookup table is meant to contain values which populate the choices of some columns in other tables. I think what happens is that the manager gets called when the other tables are created which, it seems, happens before the lookup table is created.
Is there any way to force django to create the lookup table first (short of renaming it?)

What's happening is that you're trying to access the database during module load time. For example:
class MyModel(models.Model):
name = models.CharField(max_length=255)
class OtherModel(models.Model):
some_field = models.CharField(
max_length=255,
# Next line fails on syncdb because the database table hasn't been created yet
# but the model is being queried during module load time (during class definition)
choices=[(o.pk, o.name) for o in MyModel.objects.all()]
)
This is equivalent to what you're doing because, as you've stated, you're using the manager method (transitively) to generate choices for other models.
Replacing the list comprehension with a generator expression will return an iterable, but will not evaluate the filtered queryset until the first iteration. So, this would fix the above example:
choices=((o.pk, o.name) for o in MyModel.objects.all())
Using your example, it would be:
class LookupManager(models.Manager):
def get_options(self, *args, **kwargs):
return ((t.key, t.value) for t in Lookup.objects.filter(group=args[0].upper()))
(note the use of ( and ) instead of [ and ]) (the outer ones) - that is the syntax for creating a generator expression.

How can i get a list of objects from a postgresql view table to display

this is a model of the view table.
class QryDescChar(models.Model):
iid_id = models.IntegerField()
cid_id = models.IntegerField()
cs = models.CharField(max_length=10)
cid = models.IntegerField()
charname = models.CharField(max_length=50)
class Meta:
db_table = u'qry_desc_char'
this is the SQL i use to create the table
CREATE VIEW qry_desc_char as
SELECT
tbl_desc.iid_id,
tbl_desc.cid_id,
tbl_desc.cs,
tbl_char.cid,
tbl_char.charname
FROM tbl_desC,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
i dont know if i need a function in models or views or both. i want to get a list of objects from that database to display it. This might be easy but im new at Django and python so i having some problems

Django 1.1 brought in a new feature that you might find useful. You should be able to do something like:
class QryDescChar(models.Model):
iid_id = models.IntegerField()
cid_id = models.IntegerField()
cs = models.CharField(max_length=10)
cid = models.IntegerField()
charname = models.CharField(max_length=50)
class Meta:
db_table = u'qry_desc_char'
managed = False
The documentation for the managed Meta class option is here. A relevant quote:
If False, no database table creation
or deletion operations will be
performed for this model. This is
useful if the model represents an
existing table or a database view that
has been created by some other means.
This is the only difference when
managed is False. All other aspects of
model handling are exactly the same as
normal.
Once that is done, you should be able to use your model normally. To get a list of objects you'd do something like:
qry_desc_char_list = QryDescChar.objects.all()
To actually get the list into your template you might want to look at generic views, specifically the object_list view.

If your RDBMS lets you create writable views and the view you create has the exact structure than the table Django would create I guess that should work directly.

(This is an old question, but is an area that still trips people up and is still highly relevant to anyone using Django with a pre-existing, normalized schema.)
In your SELECT statement you will need to add a numeric "id" because Django expects one, even on an unmanaged model. You can use the row_number() window function to accomplish this if there isn't a guaranteed unique integer value on the row somewhere (and with views this is often the case).
In this case I'm using an ORDER BY clause with the window function, but you can do anything that's valid, and while you're at it you may as well use a clause that's useful to you in some way. Just make sure you do not try to use Django ORM dot references to relations because they look for the "id" column by default, and yours are fake.
Additionally I would consider renaming my output columns to something more meaningful if you're going to use it within an object. With those changes in place the query would look more like (of course, substitute your own terms for the "AS" clauses):
CREATE VIEW qry_desc_char as
SELECT
row_number() OVER (ORDER BY tbl_char.cid) AS id,
tbl_desc.iid_id AS iid_id,
tbl_desc.cid_id AS cid_id,
tbl_desc.cs AS a_better_name,
tbl_char.cid AS something_descriptive,
tbl_char.charname AS name
FROM tbl_desc,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
Once that is done, in Django your model could look like this:
class QryDescChar(models.Model):
iid_id = models.ForeignKey('WhateverIidIs', related_name='+',
db_column='iid_id', on_delete=models.DO_NOTHING)
cid_id = models.ForeignKey('WhateverCidIs', related_name='+',
db_column='cid_id', on_delete=models.DO_NOTHING)
a_better_name = models.CharField(max_length=10)
something_descriptive = models.IntegerField()
name = models.CharField(max_length=50)
class Meta:
managed = False
db_table = 'qry_desc_char'
You don't need the "_id" part on the end of the id column names, because you can declare the column name on the Django model with something more descriptive using the "db_column" argument as I did above (but here I only it to prevent Django from adding another "_id" to the end of cid_id and iid_id -- which added zero semantic value to your code). Also, note the "on_delete" argument. Django does its own thing when it comes to cascading deletes, and on an interesting data model you don't want this -- and when it comes to views you'll just get an error and an aborted transaction. Prior to Django 1.5 you have to patch it to make DO_NOTHING actually mean "do nothing" -- otherwise it will still try to (needlessly) query and collect all related objects before going through its delete cycle, and the query will fail, halting the entire operation.
Incidentally, I wrote an in-depth explanation of how to do this just the other day.

You are trying to fetch records from a view. This is not correct as a view does not map to a model, a table maps to a model.
You should use Django ORM to fetch QryDescChar objects. Please note that Django ORM will fetch them directly from the table. You can consult Django docs for extra() and select_related() methods which will allow you to fetch related data (data you want to get from the other table) in different ways.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django aggregate multiple columns after arithmetic operation - django

I am not sure if you can do this Sum(F('attempts')-F('queue')). It should throw an error in the first place. I guess, easier approach would be to use extra. result = LogQuarter.objects.extra(select={'sent':'(attempts-queue)'}, order_by=['-sent'])[:20]

Related

Django ORM: Get all records and the respective last log for each record

django DateField model -- unable to find the date differences

Join annotations in Django without raw SQL

Django custom model Manager database error

How can i get a list of objects from a postgresql view table to display

Categories

Resources