How to use PostgreSQL 'interval' in Django?

Here is my PostgreSQL statement.
select round(sum("amount") filter(where "date">=now()-interval '12 months')/12,0) as avg_12month from "amountTab"
How to use this in Django?
I have an object called 'Devc', with attribute 'date'.
I want to get the sum of the specific data within past 12 months, not past 365 days.

You can try this to get the data within the past 12 months.
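from datetime import datetime, timedelta
from django.db.models import Sum

today = datetime.now()
current_month_first_day = today.replace(day=1)
previous_month_last_day = current_month_first_day - timedelta(days=1)
# Step back roughly a year, then snap to the first day of that month.
past_12_month_first_day = previous_month_last_day - timedelta(days=360)
past_12_month_first_day = past_12_month_first_day.replace(day=1)
# aggregate() names the result '<field>__sum' unless an explicit alias is given.
past_12_month_sum = Devc.objects.filter(date__range=(past_12_month_first_day, current_month_first_day)).aggregate(Sum('amount'))['amount__sum']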
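If you want "now minus 12 calendar months" rather than whole months, the date arithmetic can also be done directly. This is only a sketch using the standard library; months_ago is a made-up helper name, and the Django line in the comment assumes the Devc model from the question.

```python
import calendar
from datetime import date


def months_ago(d, months):
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last day of the target month, so
    31 March minus one month gives the last day of February.
    """
    # Count months from year 0 so the subtraction can cross year boundaries.
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = months_ago(date(2024, 3, 31), 12)
# Hypothetical Django usage, assuming the Devc model from the question:
# Devc.objects.filter(date__gte=start).aggregate(total=Sum('amount'))['total']
```

Unlike timedelta(days=365), this does not drift in leap years, which is the difference the question is asking about.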

Related

Django ORM. Joining subquery on condition

I have a table TickerStatement, which contains financial statements about companies
class Statements(models.TextChoices):
    """
    Supported statements
    """
    capital_lease_obligations = 'capital_lease_obligations'
    net_income = 'net_income'
    price = 'price'
    total_assets = 'total_assets'
    short_term_debt = 'short_term_debt'
    total_long_term_debt = 'total_long_term_debt'
    total_revenue = 'total_revenue'
    total_shareholder_equity = 'total_shareholder_equity'


class TickerStatement(TimeStampMixin):
    """
    Model that represents ticker financial statements
    """
    name = models.CharField(choices=Statements.choices, max_length=50)
    fiscal_date_ending = models.DateField()
    value = models.DecimalField(max_digits=MAX_DIGITS, decimal_places=DECIMAL_PLACES)
    ticker = models.ForeignKey(Ticker, on_delete=models.CASCADE, null=False,
                               related_name='ticker_statements')
And now I'm trying to calculate a multiplier. The formula looks like:
(short_term_debt + total_long_term_debt) / total_shareholder_equity
I wrote a raw SQL query
SELECT "fin_tickerstatement"."fiscal_date_ending",
t2.equity AS "equity",
value AS "debt",
short_term_debt AS "short_term_debt",
(value + short_term_debt) / t2.equity AS "result"
FROM "fin_tickerstatement"
JOIN
(SELECT "fin_tickerstatement"."fiscal_date_ending",
fin_tickerstatement.value AS "equity"
FROM "fin_tickerstatement"
WHERE ("fin_tickerstatement"."ticker_id" = 12
AND "fin_tickerstatement"."fiscal_date_ending" >= date'2015-09-03'
AND "fin_tickerstatement"."name" = 'total_shareholder_equity')
GROUP BY "fin_tickerstatement"."fiscal_date_ending",
fin_tickerstatement.value
ORDER BY "fin_tickerstatement"."fiscal_date_ending" DESC) t2
ON fin_tickerstatement.fiscal_date_ending = t2.fiscal_date_ending
JOIN
(SELECT "fin_tickerstatement"."fiscal_date_ending",
fin_tickerstatement.value AS "short_term_debt"
FROM "fin_tickerstatement"
WHERE ("fin_tickerstatement"."ticker_id" = 12
AND "fin_tickerstatement"."fiscal_date_ending" >= date'2015-09-03'
AND "fin_tickerstatement"."name" = 'short_term_debt')
GROUP BY "fin_tickerstatement"."fiscal_date_ending",
fin_tickerstatement.value
ORDER BY "fin_tickerstatement"."fiscal_date_ending" DESC) t3
ON fin_tickerstatement.fiscal_date_ending = t3.fiscal_date_ending
WHERE ("fin_tickerstatement"."ticker_id" = 12
AND "fin_tickerstatement"."fiscal_date_ending" >= date'2015-09-03'
AND "fin_tickerstatement"."name" = 'total_long_term_debt')
GROUP BY "fin_tickerstatement"."fiscal_date_ending",
equity,
debt,
short_term_debt
ORDER BY "fin_tickerstatement"."fiscal_date_ending" DESC;
and have no idea how to translate it into Django ORM. Maybe you have some ideas or know some Django plugins that can help me.
One way to solve this problem is to install django-query-builder.
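For checking whatever query you end up with, it may help to see the same calculation in plain Python: pivot the rows by fiscal date, then apply the formula. This is a sketch, not ORM code; debt_to_equity is a made-up name and the rows are assumed to be (name, fiscal_date_ending, value) tuples.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

NEEDED = ("short_term_debt", "total_long_term_debt", "total_shareholder_equity")


def debt_to_equity(rows):
    """Map each fiscal date to (short_term_debt + total_long_term_debt) / total_shareholder_equity.

    Dates missing any of the three statements are skipped, like the inner
    joins in the SQL. Dates with zero equity are skipped to avoid dividing by zero.
    """
    by_date = defaultdict(dict)
    for name, fiscal_date, value in rows:
        by_date[fiscal_date][name] = value

    result = {}
    for fiscal_date, values in by_date.items():
        if not all(key in values for key in NEEDED):
            continue
        equity = values["total_shareholder_equity"]
        if equity == 0:
            continue
        debt = values["short_term_debt"] + values["total_long_term_debt"]
        result[fiscal_date] = debt / equity
    return result


# Made-up sample data for one ticker.
rows = [
    ("short_term_debt", date(2020, 12, 31), Decimal("10")),
    ("total_long_term_debt", date(2020, 12, 31), Decimal("40")),
    ("total_shareholder_equity", date(2020, 12, 31), Decimal("100")),
    ("total_long_term_debt", date(2019, 12, 31), Decimal("30")),
]
ratios = debt_to_equity(rows)
```

Here 2019-12-31 is dropped because it has no equity or short-term debt row, which mirrors what the joined subqueries do.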

average spending per day - django model

I have a model that looks something like that:
class Payment(TimeStampModel):
    timestamp = models.DateTimeField(auto_now_add=True)
    amount = models.FloatField()
    creator = models.ForeignKey(to='Payer')
What is the correct way to calculate average spending per day?
I can aggregate by day, but then the days on which a payer spends nothing won't count, which is not correct.
UPDATE:
So, let's say I have only two records in my db, one from March 1 and one from January 1. The average spending per day should be something like
(Sum of all spendings) / (March 1 - January 1)
that is, the sum divided by 60.
However, this of course gives me just the average spending per item, and the number of days would come out as 2:
for p in Payment.objects.all():
    print(p.timestamp, p.amount)
p = Payment.objects.all().dates('timestamp','day').aggregate(Sum('amount'), Avg('amount'))
print(p)
Output:
2019-03-05 17:33:06.490560+00:00 456.0
2019-01-05 17:33:06.476395+00:00 123.0
{'amount__sum': 579.0, 'amount__avg': 289.5}
You can aggregate min and max timestamp and the sum of amount:
from django.db.models import Min, Max, Sum
def average_spending_per_day():
    aggregate = Payment.objects.aggregate(Min('timestamp'), Max('timestamp'), Sum('amount'))
    min_datetime = aggregate.get('timestamp__min')
    if min_datetime is not None:
        min_date = min_datetime.date()
        max_date = aggregate.get('timestamp__max').date()
        total_amount = aggregate.get('amount__sum')
        days = (max_date - min_date).days + 1
        return total_amount / days
    return 0
If min_datetime is not None, the table has at least one row, so the max date and the total amount exist too; otherwise the function returns 0 (or whatever default you prefer).
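To see the day count at work, here is the same arithmetic on the two-record example from the question's update, without the database. This is only a sketch; average_per_day and the (timestamp, amount) pairs are made up for illustration.

```python
from datetime import datetime


def average_per_day(payments):
    """Average amount per calendar day over the inclusive span of the data.

    `payments` is a list of (timestamp, amount) pairs; returns 0 when empty.
    """
    if not payments:
        return 0
    dates = [ts.date() for ts, _ in payments]
    # +1 makes the span inclusive: 1 January to 1 March 2019 is 60 days.
    days = (max(dates) - min(dates)).days + 1
    return sum(amount for _, amount in payments) / days


payments = [
    (datetime(2019, 1, 1, 9, 0), 123.0),
    (datetime(2019, 3, 1, 18, 30), 456.0),
]
avg = average_per_day(payments)  # 579.0 spread over 60 days
```

Days without any payment still count in the denominator, which is what aggregating by day misses.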
It depends on your backend, but you want to divide the sum of amount by the difference in days between your max and min timestamp. In Postgres, you can simply subtract two dates to get the number of days between them. With MySQL there is a function called DateDiff that takes two dates and returns the number of days between them.
from django.db.models import (
    DecimalField, ExpressionWrapper, Func, IntegerField, Max, Min, Sum, Value,
)


class Date(Func):
    # Truncates a timestamp to its date part.
    function = 'DATE'


class MySQLDateDiff(Func):
    function = 'DATEDIFF'

    def __init__(self, *expressions, **extra):
        expressions = [Date(exp) for exp in expressions]
        extra['output_field'] = extra.get('output_field', IntegerField())
        super().__init__(*expressions, **extra)


class PgDateDiff(Func):
    # Renders as "DATE(a) - DATE(b)", which Postgres evaluates to a day count.
    template = "%(expressions)s"
    arg_joiner = ' - '

    def __init__(self, *expressions, **extra):
        expressions = [Date(exp) for exp in expressions]
        extra['output_field'] = extra.get('output_field', IntegerField())
        super().__init__(*expressions, **extra)


agg = {
    'avg_spend': ExpressionWrapper(
        Sum('amount') / (PgDateDiff(Max('timestamp'), Min('timestamp')) + Value(1)),
        output_field=DecimalField())
}
avg_spend = Payment.objects.aggregate(**agg)
That looks roughly right to me, though I haven't tested it. Use MySQLDateDiff instead if that's your backend.

Python script | long running | Need suggestions to optimize

I have written this script to generate a dataset which would contain 15 minute time intervals based on the inputs provided for operational hours for all days of a week for 365 days.
example: Let us say Store 1 opens at 9 AM and closes at 9 PM on all days. That is 12 hours everyday. 12*4 = 48(15 minute periods a day). 48 * 365 = 17520 (15 minute periods for a year).
The sample dataset only contains 5 sites but there are about 9000 sites that this script needs to generate data for.
The script obviously runs for a handful of sites(100) and couple of days(2) but needs to run for sites(9000) and 365 days.
Looking for suggestions to make this run faster. This will be running on a local machine.
input data: https://drive.google.com/open?id=1uLYRUsJ2vM-TIGPvt5RhHDhTq3vr4V2y
output data: https://drive.google.com/open?id=13MZCQXfVDLBLFbbmmVagIJtm6LFDOk_T
Please let me know if I can help with anything more to get this answered.
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta

import pandas as pd
import numpy as np
import cProfile
from datetime import timedelta, date, datetime

#inputs
empty_data = pd.DataFrame(columns=['store','timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"

for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        if weekno<5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['m_f_open_hrs'].iloc[j],data['m_f_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['m_f_close_hrs'].iloc[j],data['m_f_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        elif weekno==5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['sat_open_hrs'].iloc[j],data['sat_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['sat_close_hrs'].iloc[j],data['sat_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        else:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['sun_open_hrs'].iloc[j],data['sun_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['sun_close_hrs'].iloc[j],data['sun_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
final_data = empty_data
I think the most time consuming tasks in your script are the datetime calculations.
You should try to make all of those calculations using UNIX Time. It basically represents time as an integer that counts seconds... so you could take two UNIX dates and see the difference just by doing simple subtraction.
In my opinion you should perform all the operations like that... and when the process has finished you can make all the datetime conversions to a more readable date format.
Another thing you should change in your script is the nearly identical code repeated in each branch. Removing it won't improve performance, but it makes the script easier to read and debug. As a simple example, I have refactored some of the code (you can probably do better than what I did, but this is just an example).
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta

from datetime import timedelta, date, datetime
import numpy as np
import cProfile
import pandas as pd

# inputs
empty_data = pd.DataFrame(columns=['store', 'timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"

for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        week_range = 'sun'
        if weekno < 5:
            week_range = 'm_f'
        elif weekno == 5:
            week_range = 'sat'
        first_time = datetime(curr_date_year,curr_date_month,curr_date_day,data[week_range + '_open_hrs'].iloc[j],data[week_range + '_open_min'].iloc[j])
        second_time = datetime(curr_date_year,curr_date_month,curr_date_day, data[week_range + '_close_hrs'].iloc[j],data[week_range + '_close_min'].iloc[j])
        dts = [ dt.strftime('%Y-%m-%d %H:%M') for dt in datetime_range(first_time, second_time, timedelta(minutes=15)) ]
        vert = pd.DataFrame(dts, columns = ['timestamp'])
        vert['store']= data['store'].iloc[j]
        empty_data = pd.concat([vert, empty_data])
final_data = empty_data
Good luck!
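One more thing that may be worth trying: pd.concat builds a new DataFrame on every call, so calling it once per store per day copies the accumulated rows over and over. Below is a sketch that collects plain tuples in a list instead, using only the standard library. The hours dictionary is a made-up stand-in for the input file, and generate_slots and day_kind are hypothetical names.

```python
from datetime import date, datetime, time, timedelta

STEP = timedelta(minutes=15)


def day_kind(d):
    """Return 'm_f', 'sat' or 'sun' for a date, mirroring the column prefixes."""
    weekday = d.weekday()
    if weekday < 5:
        return "m_f"
    return "sat" if weekday == 5 else "sun"


def generate_slots(hours, start, days):
    """Return (store, 'YYYY-MM-DD HH:MM') rows for every 15-minute slot.

    `hours` maps store -> {kind: (open_time, close_time)}; this shape is an
    assumption, not the layout of the original input data.
    """
    rows = []
    for offset in range(days):
        current_date = start + timedelta(days=offset)
        kind = day_kind(current_date)
        for store, schedule in hours.items():
            open_time, close_time = schedule[kind]
            current = datetime.combine(current_date, open_time)
            end = datetime.combine(current_date, close_time)
            while current < end:
                rows.append((store, current.strftime("%Y-%m-%d %H:%M")))
                current += STEP
    return rows


# Store 1 from the question: open 9 AM to 9 PM every day.
hours = {"Store 1": {k: (time(9, 0), time(21, 0)) for k in ("m_f", "sat", "sun")}}
rows = generate_slots(hours, date(2019, 1, 1), 365)
```

The list can then be turned into a frame once at the end, for example with pd.DataFrame(rows, columns=['store', 'timestamp']). I have not timed this against the 9000-site input.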

Filter the data for particular 15 days of all years in django

I am trying to print the data for a particular 15-day window in every year.
For example, to get the details of employees who have birthdays within the next 15 days.
today = datetime.now()
start_day = today.day
start_month = today.month
end_day = today + timedelta(days=15)
end_date = end_day.day
end_month = end_day.month
user_dob_obj = UserProfile.objects.filter(Q(date_of_birth__month__gte=start_month, date_of_birth__day__gte=start_day) &
Q(date_of_birth__month__lte=end_month, date_of_birth__day__lte=end_date))
Update
Sorry, I misunderstood your question. You can use an if statement to check whether the date 15 days from now is still in the same month. If it is not, combine two conditions with a logical OR so that birthdays in both the current month and the next month are matched.
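from datetime import datetime, timedelta
from django.db.models import Q

today = datetime.now()
end_date = today + timedelta(days=15)
if today.month == end_date.month:
    # The window stays inside one month: a simple day range is enough.
    user_dob_obj = UserProfile.objects.filter(date_of_birth__month=today.month, date_of_birth__day__gte=today.day, date_of_birth__day__lte=end_date.day)
else:
    # The window crosses a month boundary: match the end of this month or the start of the next.
    user_dob_obj = UserProfile.objects.filter(Q(date_of_birth__month=today.month, date_of_birth__day__gte=today.day) | Q(date_of_birth__month=end_date.month, date_of_birth__day__lte=end_date.day))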
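A quick way to sanity-check the month and day logic, including the December to January case, is to compare it against a plain Python version. This is a sketch with made-up helper names, not a replacement for the query.

```python
from datetime import date, timedelta


def upcoming_month_days(today, days=15):
    """Return the set of (month, day) pairs from `today` through `today + days`."""
    window = (today + timedelta(days=offset) for offset in range(days + 1))
    return {(d.month, d.day) for d in window}


def has_upcoming_birthday(date_of_birth, today, days=15):
    """True when the birthday's month and day fall inside the window."""
    return (date_of_birth.month, date_of_birth.day) in upcoming_month_days(today, days)


today = date(2023, 12, 25)
in_window = has_upcoming_birthday(date(1990, 1, 3), today)
already_passed = has_upcoming_birthday(date(1990, 12, 20), today)
```

Because only month and day are compared, the birth year never matters and the window can wrap over New Year.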

Number of events occurred on each hour in particular day

I have a model containing "caller_name" and "call_datetime" fields.
I was able to get the number of calls that occurred on each day in a particular month:
while start_date <= end_date:
    calls = CDR.objects.filter(start_time__year=str(start_date.year), start_time__month=str(start_date.month),
                               start_time__day=str(start_date.day))
    print "Number of calls:", len(calls)
    start_date = start_date + datetime.timedelta(days=1)
Similarly, I tried to get the number of calls in each hour on a particular date.
for i in range(24):
    calls = CDR.objects.filter(start_time__year=str(start_date.year), start_time__month=str(start_date.month),start_time__day=str(start_date.day), start_time__hour=str(i))
I found out that "start_time__hour" is not implemented, but is there any way to achieve this?
Try this workaround (DATE_FORMAT is a MySQL function):
day_calls = CDR.objects.filter(start_time__year=str(start_date.year), start_time__month=str(start_date.month),start_time__day=str(start_date.day))
hour_calls = day_calls.extra(select={'hours': 'DATE_FORMAT(start_time, "%%H")'})\
    .values_list('hours', flat=True)\
    .distinct()\
    .order_by('hours')
You could either use raw SQL with the .extra() method or something like this:
for i in range(24):
    # Build the [dt1, dt2) one-hour window for hour i of the chosen day.
    dt1 = datetime.datetime.combine(start_date, datetime.time(hour=i))
    dt2 = dt1 + datetime.timedelta(hours=1)
    calls = CDR.objects.filter(start_time__gte=dt1, start_time__lt=dt2)
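If 24 separate queries is too many, another option is to fetch the day's start times once (for example with values_list('start_time', flat=True)) and bucket them in Python. This is a sketch; calls_per_hour is a made-up name and the sample timestamps are invented.

```python
from collections import Counter
from datetime import date, datetime


def calls_per_hour(start_times, day):
    """Return a 24-item list with the number of calls starting in each hour of `day`."""
    counts = Counter(ts.hour for ts in start_times if ts.date() == day)
    return [counts.get(hour, 0) for hour in range(24)]


start_times = [
    datetime(2019, 3, 5, 9, 15),
    datetime(2019, 3, 5, 9, 45),
    datetime(2019, 3, 5, 17, 33),
    datetime(2019, 3, 6, 9, 0),   # different day, ignored
]
per_hour = calls_per_hour(start_times, date(2019, 3, 5))
```

Hours with no calls show up as 0, so the list always has one entry per hour.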