Quicksight problem with custom aggregations and calculated fields - amazon-athena

Here is my Drop-off Rate function
coalesce(1 - ({cnt}/{Lag}),1)
And the calculated field {Lag} is
lag(sum(cnt),[step ASC],1)
But I'm getting this error:
Custom aggregations can't contain both aggregated and nonaggregated fields.
Actually, {Lag} does not need to be a sum of {cnt}, but the lag function requires its argument to be an aggregate.
Is there any workaround to achieve that drop-off calculation?

You need to aggregate cnt in the numerator as well, so that both operands of the division are aggregates:
coalesce(1 - (sum({cnt})/{Lag}),1)
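To check what the corrected expression computes, here is a minimal sketch in plain Python (not QuickSight syntax). The funnel counts are made-up illustration data, and drop_off_rates is a hypothetical helper that mimics lag(sum(cnt), [step ASC], 1) followed by coalesce(1 - sum(cnt)/Lag, 1). Treating a zero previous count like a missing one is an assumption of this sketch.

```python
from typing import Dict, Optional


def drop_off_rates(counts_by_step: Dict[int, int]) -> Dict[int, float]:
    """Return 1 - cnt / previous cnt for each step, walking steps in ascending order."""
    rates: Dict[int, float] = {}
    previous: Optional[int] = None  # plays the role of {Lag}
    for step in sorted(counts_by_step):
        cnt = counts_by_step[step]
        if previous is None or previous == 0:
            # No usable previous value: fall back to 1, like coalesce(..., 1)
            rates[step] = 1.0
        else:
            rates[step] = 1 - cnt / previous
        previous = cnt
    return rates


# Hypothetical funnel: step number -> summed cnt for that step
funnel = {1: 200, 2: 150, 3: 60}
rates = drop_off_rates(funnel)
print(rates)
```

The first step has no previous row, so it falls back to 1, which matches the coalesce in the original formula.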

Related

Django Query Annotation Does Not Support math.tan() Calculation

I have a django 3.2/mysql website. One of the tables, Flights, has two columns, angle and baseline. These two columns are combined in a view to display the value altitude using the formula altitude = baseline * tan(angle). I am building a search form for this table, and the user wants to search for flights using a minimum altitude and a maximum altitude.
My normal approach to the problem is to gather the inputs from the search form, create appropriate Q statements, and query the table with filters based on the Q statements. However, since there is no altitude field, I have to annotate the query with the altitude value for the altitude Q statement to work.
It seems that a query annotation cannot use an expression like math.tan(field_name). For example,
Flights.objects.annotate(altitude_m=ExpressionWrapper(expression=(F('baseline') * math.tan(F("angle"))), output_field = models.FloatField(),),)
generates the error builtins.TypeError: must be real number, not F
whereas the expression
Flights.objects.annotate(altitude_m=ExpressionWrapper(expression=(F('baseline') * F("angle")), output_field = models.FloatField(),),)
does not generate an error, but also does not provide the correct value.
Is there a way to annotate a query on the Flights table with the altitude value so I can add a filter like Q(altitude__gte= min_altitude) to the query? Or, is the best approach to (1) go back to the existing ~1,000 rows in the Flights table and add a column altitude and insert the calculated value, or (2) filter the Flights query on the other search criteria and in python calculate the altitude for each Flights object in the filtered queryset and discard those that don't meet the altitude search criteria?
Note: I have simplified the problem description somewhat by only describing a single altitude value. There are actually 2 altitude values, one in meters and one in feet.
Thanks!
Use the Tan database function
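from django.db import models
from django.db.models import ExpressionWrapper, F
from django.db.models.functions import Tan

# Tan() is evaluated by the database, so it accepts a field name or expression
Flights.objects.annotate(
    altitude=ExpressionWrapper(expression=F('baseline') * Tan('angle'), output_field=models.FloatField())
)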
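One thing to watch: MySQL's TAN(), like Python's math.tan(), takes its argument in radians. If angle is stored in degrees (an assumption, the question does not say), it has to be converted first; Django also ships a Radians database function for that. As a rough sketch of option (2) from the question, filtering in Python, something like the following could work. The tuples, the helper names and the feet conversion constant are illustrative and not from the original code.

```python
import math

FEET_PER_METER = 3.28084  # approximate conversion factor


def altitude_m(baseline_m: float, angle: float, degrees: bool = False) -> float:
    """altitude = baseline * tan(angle); angle is in radians unless degrees=True."""
    if degrees:
        angle = math.radians(angle)
    return baseline_m * math.tan(angle)


def altitude_ft(baseline_m: float, angle: float, degrees: bool = False) -> float:
    """Same altitude, converted from meters to feet."""
    return altitude_m(baseline_m, angle, degrees) * FEET_PER_METER


# Hypothetical rows: (flight name, baseline in meters, angle in degrees)
flights = [("F1", 100.0, 45.0), ("F2", 50.0, 60.0), ("F3", 200.0, 10.0)]

min_altitude, max_altitude = 50.0, 90.0
matching = [
    name
    for name, baseline, angle in flights
    if min_altitude <= altitude_m(baseline, angle, degrees=True) <= max_altitude
]
print(matching)
```

With only about 1,000 rows this is cheap, but the annotation approach keeps the filtering in the database and composes with the other Q filters.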

Django query to fetch top performers for each month

I need to fetch the top performer for each month. Below is the MySQL query that gives me the correct output.
select id,Name,totalPoints, createdDateTime
from userdetail
where app=4 and totalPoints in ( select
max(totalPoints)
FROM userdetail
where app=4
group by month(createdDateTime), year(createdDateTime))
order by totalPoints desc
I am new to Django ORM. I am not able to write an equivalent Django query which does the task. I have been struggling with this logic for 2 days. Any help would be highly appreciated.
While the GROUP BY clause in a subquery is slightly difficult to express with the ORM because aggregate() operations don't emit querysets, a similar effect can be achieved with a Window function:
from django.db.models import Max, Window
from django.db.models.functions import Trunc

UserDetail.objects.filter(total_points__in=UserDetail.objects.annotate(max_points=Window(
    expression=Max('total_points'),
    partition_by=[Trunc('created_datetime', 'month')]
)).values('max_points')
)
In general, this sort of pattern is implemented with Subquery expressions. In this case, I've implicitly used a subquery by passing a queryset to an __in predicate.
The Django documentation's notes on using aggregates within subqueries are also relevant to this sort of query, since you want to use the results of an aggregate in a subquery (which I've avoided by using a window function).
However, I believe your query may not correctly capture what you want to do: as written it could return rows for users who weren't the best in a given month but did have the same score as another user who was the best in any month.
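To make that caveat concrete, here is a small sketch in plain Python with made-up rows (the names and scores are purely illustrative). It compares the "score is in the set of monthly maxima" logic with a stricter version that matches on both month and score.

```python
from datetime import datetime

# Hypothetical rows: (id, name, total_points, created)
rows = [
    (1, "Ann", 90, datetime(2021, 1, 5)),
    (2, "Bob", 70, datetime(2021, 1, 9)),
    (3, "Cid", 70, datetime(2021, 2, 3)),
    (4, "Dee", 50, datetime(2021, 2, 8)),
]

# Maximum score per (year, month)
monthly_max = {}
for _id, _name, points, created in rows:
    key = (created.year, created.month)
    monthly_max[key] = max(points, monthly_max.get(key, points))

# Logic of the original query: the score equals the maximum of ANY month
loose = [name for _id, name, points, _created in rows
         if points in monthly_max.values()]

# Stricter logic: the score equals the maximum of the row's OWN month
strict = [name for _id, name, points, created in rows
          if points == monthly_max[(created.year, created.month)]]

print(loose)
print(strict)
```

Here Bob shows up in the loose result only because his 70 points happen to equal February's maximum, even though Ann beat him in January.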

Django - How to query the most recent distinct rows before an effective date?

I have a model full of effective dated rates. The notable fields on these rates are name, type, effective_date, and rate.
I need to be able to filter the modifier rates to get only rates of a certain type and before a certain date.
ModifierRate.objects.filter(
type=settings.MODIFIER_RATE_TYPES_TAX,
effective_date__lte=date)
That query may return rates with the same name and type so I need them to be distinct on those two fields.
.distinct('name', 'type')
However, if there is a duplicate name and type I need the most recent one.
.order_by('-effective_date')
After all that I need to aggregate Sum the rates on those objects.
.aggregate(rate__sum=Coalesce(Sum('rate'), 0))['rate__sum']
If I try and smash all these things together I get
raise NotImplementedError("aggregate() + distinct(fields) not implemented.")
NotImplementedError: aggregate() + distinct(fields) not implemented.
I've been googling for awhile and there are many similar questions that make use of values_list and annotate but I don't think that's what I want.
How do I get the sum of the rates before a certain date that are distinct on fields name and type where the most recent distinct rate is used?
Thanks.
You could use Django Subquery expressions; see the Django documentation for details.
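from django.db.models import OuterRef, Subquery, Sum
from django.db.models.functions import Coalesce

# Most recent row per name and type, restricted to the same cutoff date
most_recent = ModifierRate.objects.filter(
    name=OuterRef('name'),
    type=OuterRef('type'),
    effective_date__lte=date,
).order_by(
    '-effective_date'
)
result = ModifierRate.objects.filter(
    type=settings.MODIFIER_RATE_TYPES_TAX,
    effective_date__lte=date,
    pk=Subquery(most_recent.values('pk')[:1])
).aggregate(
    rate__sum=Coalesce(Sum('rate'), 0)
)['rate__sum']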
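For reference, the result this query is meant to produce can be sketched in plain Python. The sample rates and the helper sum_latest_rates are hypothetical; the point is only to show "latest row per name on or before the date, then sum".

```python
from datetime import date
from decimal import Decimal

# Hypothetical rows: (name, type, effective_date, rate)
rates = [
    ("state", "tax", date(2020, 1, 1), Decimal("0.05")),
    ("state", "tax", date(2021, 1, 1), Decimal("0.06")),
    ("state", "tax", date(2022, 1, 1), Decimal("0.07")),
    ("city", "tax", date(2019, 6, 1), Decimal("0.01")),
    ("city", "fee", date(2020, 6, 1), Decimal("0.50")),
]


def sum_latest_rates(rows, rate_type, as_of):
    """Sum the most recent rate per name among rows of rate_type effective on or before as_of."""
    latest = {}  # name -> (effective_date, rate)
    for name, type_, effective, rate in rows:
        if type_ != rate_type or effective > as_of:
            continue
        current = latest.get(name)
        if current is None or effective > current[0]:
            latest[name] = (effective, rate)
    # Start from Decimal zero so an empty result still yields 0 (like Coalesce)
    return sum((rate for _effective, rate in latest.values()), Decimal("0"))


total = sum_latest_rates(rates, "tax", date(2021, 6, 30))
print(total)
```

Comparing the queryset's output against a helper like this on a few known rows is a quick way to confirm the subquery picks the intended row for each name.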

django aggregate for multiple days

I have a model with two relevant attributes, date and length (the others don't matter here). I need to display a list of the sums of length for each day in a template.
The solution I've used so far is looping day by day and creating list of sums using aggregations like:
for day in month:
    sums.append(MyModel.objects.filter(date=day).aggregate(Sum('length')))
But it seems very inefficient to me because of the number of db lookups. Isn't there a better way to do this? Like caching everything and then filtering it without touching the db?
.values() can be used to group by date, so you will only get unique dates together with the sum of length fields via .annotate():
>>> from django.db.models import Sum
>>> MyModel.objects.values('date').annotate(total_length=Sum('length'))
From docs:
When .values() clause is used to constrain the columns that are returned in the result set, the method for evaluating annotations is slightly different. Instead of returning an annotated result for each result in the original QuerySet, the original results are grouped according to the unique combinations of the fields specified in the .values() clause.
Hope this helps.
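The grouping this produces is roughly a single GROUP BY query. The sketch below reproduces the idea with the standard library's sqlite3 module; the table name mymodel and the sample rows are assumptions for illustration, and the exact SQL Django generates may differ.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for MyModel
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (date TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (date, length) VALUES (?, ?)",
    [("2021-03-01", 5), ("2021-03-01", 7), ("2021-03-02", 4)],
)

# One query returns a row per distinct date with the summed length
totals = conn.execute(
    "SELECT date, SUM(length) AS total_length "
    "FROM mymodel GROUP BY date ORDER BY date"
).fetchall()
print(totals)
conn.close()
```

The key difference from the loop in the question is that the database is hit once, no matter how many days are involved.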

Custom ordering by calculation in Django

Any solutions for custom calculation sorting in Django? I want to create a view that shows the Top Posts in my Blog. The ranking will be calculated by Post's attributes. Let's just say I have 3 IntegerFields called x, y, and z, and the ranking calculation will be x * y / z.
Any ideas? I would like to do Top Post ever, and also other variations filtered by time such as last 24 hours, 7 days, 1 month, etc.
Thanks!
You can use extra to retrieve extra calculated column(s) and sort by it:
# date is a placeholder for the cutoff date
(MyModel.objects.filter(post_date__lt=date)
    .extra(select={'custom_order': "x*y/z"}).order_by('custom_order'))
The problem with this approach is that you're writing SQL, so it is not always portable across databases (although, for the example you supplied, this problem is avoided because it's a simple calculation).
Otherwise, you can do the sorting with pure python:
sorted_models = sorted(MyModel.objects.filter(post_date__lt=date),
                       key=lambda my_model: my_model.x * my_model.y / my_model.z)
The extra() queryset method should allow you to do this. See the docs
Because you can't order querysets by methods or properties in Django, you have to do the sorting in Python.
Consider turning your calculated field into a property on your model and then you can do this in your view:
sorted_posts = sorted(Post.objects.all(), key=lambda post: post.calculated_field )
Finally you can pass sorted_posts to your list-template.
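As a minimal sketch of the property-based approach, here is a stand-alone version that uses a dataclass in place of the Django model. The class, the property name rank, and the sample posts are illustrative; returning 0 when z is zero is an assumption, since the question does not say how that case should rank.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Stand-in for the Django model, with the three integer fields."""
    title: str
    x: int
    y: int
    z: int

    @property
    def rank(self) -> float:
        # Guard against division by zero (assumed behavior: rank such posts last)
        if self.z == 0:
            return 0.0
        return self.x * self.y / self.z


posts = [
    Post("a", 2, 3, 1),
    Post("b", 10, 1, 5),
    Post("c", 4, 4, 2),
    Post("d", 1, 1, 0),
]

# Highest rank first, as a "Top Posts" view would want
top_posts = sorted(posts, key=lambda post: post.rank, reverse=True)
titles = [post.title for post in top_posts]
print(titles)
```

For the time-window variations (last 24 hours, 7 days, and so on), the queryset can be filtered by date first and the sort applied to the smaller result.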