class Point(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    expire_date = models.DateField()
    amount = models.IntegerField()
I want to know the sum of amount for the last expire_date for a given user.
There could be multiple points for a user with the same expire_date.
I could do two queries: first get the last expire_date, then aggregate over those rows. But I want to know if there's a better way.
We can use a subquery here:
from django.db.models import Sum
Point.objects.filter(
    expire_date__gte=Point.objects.order_by('-expire_date').values('expire_date')[:1]
).aggregate(total=Sum('amount'))
This will thus result in a query that looks like:
SELECT SUM(point.amount) AS total
FROM point
WHERE point.expire_date >= (
SELECT U0.expire_date
FROM point U0
ORDER BY U0.expire_date DESC
LIMIT 1
)
I have not run performance tests on it, so I suggest you first measure whether this significantly improves performance.
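Since the question asks about a given user, a minimal sketch (assuming a user variable holding the relevant User instance) that applies the same subquery per user could look like:
from django.db.models import Sum

# `user` is assumed to be the User instance you are interested in
Point.objects.filter(
    user=user,
    expire_date__gte=Point.objects.filter(user=user)
        .order_by('-expire_date')
        .values('expire_date')[:1],
).aggregate(total=Sum('amount'))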
Context
There is a dataframe of customer invoices and their due dates (identified by customer code).
Week(s) need to be added to the due date depending on the customer code.
A model is created to persist the list of customers and the week(s) to be added.
What is done so far:
Models.py
class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)
helper.py
import datetime

from .models import BpShift

# used in views later
def week_shift(self, df):
    df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
        weeks=BpShift.objects.get(pk=df['BpCode']).weeks)
I realised my understanding of DataFrames is seriously flawed: df['A'] and df['B'] return Series, so of course timedelta wouldn't work like this (weeks=BpShift.objects.get(pk=df['BpCode']).weeks).
Dataframe
import pandas as pd

d = {'BpCode': ['customer1', 'customer2'], 'DueDate': ['2020-05-30', '2020-04-30']}
df = pd.DataFrame(data=d)
Customer List csv
BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1
Error
BpShift matching query does not exist.
Commentary
I used these methods in the hope that I would be able to change the dataframe all at once, instead of using df.iterrows(). I have recently been avoiding for loops like the plague and am wondering if this is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!
This question, Python & Pandas: series to timedelta, will help take you from a Series to a timedelta. And although
pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)
will give you a Series of integers, I doubt the order is the same as in df['BpCode'], because that depends on the Django model and the database backend.
So you might be better off explicitly creating not a Series but a DataFrame with pk and weeks columns, so you can use df.join. Something like this
pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)
should give you a DataFrame that you can join with.
So combined this should be the gist of your code:
django_response = [('customer1', 1), ('customer2', '2')]
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
DueDate weeks new_duedate
BpCode
customer1 2020-05-30 1 2020-06-06
customer2 2020-04-30 2 2020-05-14
You were right to want to avoid looping. This approach gets all the data in one SQL query from your Django model by using filter, then does a left join with the DataFrame you already have, casts the dates and weeks to the right types, and finally computes the new due date on whole columns instead of looping over them.
NB the left join will give NaN and NaT for customers that don't exist in your Django database. You can either drop those rows by passing how='inner' to df.join, or handle them however you like.
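For completeness, a minimal sketch of both options, assuming the same variable names as in the example above:
# Option 1: drop customers that have no match in the Django data
df_inner = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode'),
    how='inner',
)

# Option 2: keep them, but fall back to a shift of 0 weeks
df['weeks'] = df['weeks'].fillna(0)
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')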
I have the following model:
class Pick(models.Model):
    league = models.ForeignKey(League, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    team = models.ForeignKey(Team, on_delete=models.CASCADE)
    week = models.IntegerField()
    result = models.IntegerField(default=3, help_text='loss=0, win=1, tie=2, not started=3, in progress=4')
I'm trying to generate a standings table based on the results, but I'm unsure how to do it in a single query. I'm interested in getting, for each user in a particular league, a count of the results that equal 1 (wins), 0 (losses) and 2 (ties). The only thing I can think of is to do three separate queries where I filter the results and then annotate, like so:
Pick.objects.filter(league=2, result=1).annotate(wins=Count('result'))
Pick.objects.filter(league=2, result=0).annotate(losses=Count('result'))
Pick.objects.filter(league=2, result=2).annotate(ties=Count('result'))
Is there a more efficient way to achieve this?
Thanks!
The trick to this is to use the values method to just select the fields you want to aggregate on.
Pick.objects.filter(league=2).values('result').annotate(count=Count('result'))
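If you want all three counts per user in a single query, a sketch using conditional aggregation (the filter argument to Count exists since Django 2.0; field names as in the model above) could look like:
from django.db.models import Count, Q

standings = (
    Pick.objects.filter(league=2)
    .values('user')  # one row per user
    .annotate(
        wins=Count('result', filter=Q(result=1)),
        losses=Count('result', filter=Q(result=0)),
        ties=Count('result', filter=Q(result=2)),
    )
)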
I have the following models and query:
class Employer(models.Model):
    name = ...

class JobTitle(models.Model):
    name = ...
    employer = models.ForeignKey(Employer, on_delete=models.CASCADE)
and the query is:
Employer.objects.filter(
    jobtitle__activatedate__range=[startdate, enddate]
).annotate(
    jtt_count=Count('jobtitle')
).order_by('-jtt_count')[:5]
As you can see, it returns the 5 employers with the most related job titles whose activation date falls within a certain range.
However, I also want to get the total number of job titles of each employer in that same query.
Of course I could loop over each employer, run JobTitle.objects.filter(employer=emp) and take the length of that queryset, but that is a bad solution.
How can I achieve this in that query?
Even if it is not possible to get both the total and the filtered number of job titles, I tried getting the job titles of each employer with len(emp.jobtitle), but that didn't work either.
Thanks
Try the extra() lookup. In your case it may look like this:
.extra(
    select={
        'jobtitle_count': 'SELECT COUNT(*) FROM YOURAPP_jobtitle WHERE YOURAPP_jobtitle.employer_id = YOURAPP_employer.id'
    },
)
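As a sketch of a more recent alternative (assuming Django 2.0 or newer, where Count accepts a filter argument), both the filtered count and the total count can be expressed as annotations in a single query:
from django.db.models import Count, Q

Employer.objects.annotate(
    jtt_count=Count(
        'jobtitle',
        filter=Q(jobtitle__activatedate__range=[startdate, enddate]),
    ),
    jobtitle_total=Count('jobtitle'),
).order_by('-jtt_count')[:5]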
In a Django application I need to create an order number which looks like: yyyymmddnnnn in which yyyy=year, mm=month, dd=day and nnnn is a number between 1 and 9999.
I thought I could use a PostgreSQL sequence, since the generated numbers are atomic, so I can be sure that when the process gets a number, that number is unique.
So I created a PostgreSQL sequence:
CREATE SEQUENCE order_number_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9999
START 1
CACHE 1
CYCLE;
This sequence can be accessed as a table having one row. So in the file checkout.py I created a Django model to access this sequence.
class OrderNumberSeq(models.Model):
    """
    This class maps to OrderNumberSeq which is a PostgreSQL sequence.
    This sequence runs from 1 to 9999 after which it restarts (cycles) at 1.
    A sequence is basically a special single row table.
    """
    sequence_name = models.CharField(max_length=128, primary_key=True)
    last_value = models.IntegerField()
    increment_by = models.IntegerField()
    max_value = models.IntegerField()
    min_value = models.IntegerField()
    cache_value = models.IntegerField()
    log_cnt = models.IntegerField()
    is_cycled = models.BooleanField()
    is_called = models.BooleanField()

    class Meta:
        db_table = u'order_number_seq'
I set the sequence_name as primary key as Django insists on having a primary key in a table.
Then I created a file get_order_number.py with the contents:
import datetime

def get_new_order_number():
    order_number = OrderNumberSeq.objects.raw(
        "select sequence_name, nextval('order_number_seq') from order_number_seq"
    )[0]
    today = datetime.date.today()
    year = u'%4s' % today.year
    month = u'%02i' % today.month
    day = u'%02i' % today.day
    new_number = u'%04i' % order_number.nextval
    return year + month + day + new_number
Now when I call get_new_order_number() from the Django interactive shell, it behaves as expected.
>>> checkout.order_number.get_new_order_number()
u'201007310047'
>>> checkout.order_number.get_new_order_number()
u'201007310048'
>>> checkout.order_number.get_new_order_number()
u'201007310049'
You see the numbers nicely incrementing by one every time the function is called. You can start multiple interactive django sessions and the numbers increment nicely with no identical numbers appearing in the different sessions.
Now I try to use call this function from a view as follows:
import get_order_number
order_number = get_order_number.get_new_order_number()
and it gives me a number. However, the next time I access the view, the number has incremented by 2. I have no idea where the problem is.
The best solution I can come up with is: don't worry if your order numbers are sparse. It should not matter if an order number is missing: there is no way to ensure that order numbers are contiguous that will not be subject to a race condition at some point.
Your biggest problem is likely to be convincing the pointy-haired ones that having 'missing' order numbers is not a problem.
For more details, see the Pseudo-Key Neat Freak entry in SQL Antipatterns (note: this is a link to a book, the full text of which is not available for free).
Take a look at this question/answer Custom auto-increment field in postgresql (Invoice/Order No.)
You can create stored procedures using a raw SQL migration (migrations.RunSQL).
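A minimal sketch of such a migration, assuming the sequence definition from the question and that your app already has a migration to depend on:
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        # ('checkout', '0001_initial'),  # assumed: your app's previous migration
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                CREATE SEQUENCE order_number_seq
                    INCREMENT 1
                    MINVALUE 1
                    MAXVALUE 9999
                    START 1
                    CACHE 1
                    CYCLE;
            """,
            reverse_sql="DROP SEQUENCE order_number_seq;",
        ),
    ]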
I have a report model looking a bit like this:
class Report(models.Model):
    date = models.DateField()
    quantity = models.IntegerField()
    product_name = models.TextField()
I know I can get the last entry for the last year for one product this way:
Report.objects.filter(date__year=2009, product_name="corn").order_by("-date")[0]
I know I can group entries by name this way:
Report.objects.values("product_name")
But how can I get the quantity for the last entry for each product? I feel like I would do it this way in SQL (not sure, my SQL is rusty):
SELECT product_name, quantity FROM report WHERE YEAR(date) = 2009 GROUP BY product_name HAVING date = MAX(date)
My guess is to use the Max() object with annotate, but I have no idea how to.
For now, I do it by manually taking the last item of a separate query for each product_name, which I can list with a distinct.
Not exactly a trivial query using either the Django ORM or SQL. My first take on it would be pretty much what you are probably already doing: get the distinct product and date pairs and then perform individual queries for each of those.
from django.db.models import Max

year_reports = Report.objects.filter(date__year=2009)
product_date_pairs = year_reports.values('product_name').annotate(Max('date'))

[Report.objects.get(product_name=p['product_name'], date=p['date__max'])
 for p in product_date_pairs]
But you can take it a step further with the Q operator and some fancy OR'ing to trim your query count down to 2 instead of N + 1.
import operator
from functools import reduce

from django.db.models import Q

qs = [Q(product_name=p['product_name'], date=p['date__max']) for p in product_date_pairs]
ored_qs = reduce(operator.or_, qs)
Report.objects.filter(ored_qs)
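On newer Django versions (1.11 added Subquery and OuterRef), a sketch of a single-query alternative is to keep only the rows whose primary key matches the latest 2009 report for their product:
from django.db.models import OuterRef, Subquery

# For each outer row's product_name, the primary key of its latest 2009 report
latest_pk = (
    Report.objects.filter(product_name=OuterRef('product_name'), date__year=2009)
    .order_by('-date')
    .values('pk')[:1]
)

latest_reports = Report.objects.filter(date__year=2009, pk=Subquery(latest_pk))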