To give some context: I have a lot of temperature measurements taken at different stations, and I want to check whether they are in accordance with what was forecast.
My model is :
class Station(models.Model):
    station_id = models.CharField(max_length=18, primary_key=True)
    sector = models.CharField(max_length=40)

class Weather(models.Model):
    station = models.ForeignKey(Station, on_delete=models.CASCADE)
    temperature = models.FloatField()
    date = models.DateField()

class Forecast(models.Model):
    station = models.ForeignKey(Station, on_delete=models.CASCADE)
    date = models.DateField()
    score = models.IntegerField()
For each temperature measurement, I would like to know the average of the forecast scores for that station over the last 7 days, unless there is another temperature measurement in this time frame, in which case that measurement is the starting point. The following code does what I want but is much too slow to execute (about 10 minutes!):
observations = Weather.objects.all().order_by('station', 'date')
previous = None
for obs in observations:
    if previous is not None and obs.station == previous.station:
        date_inf = min(obs.date - timedelta(days=7), previous.date)
    else:
        date_inf = obs.date - timedelta(days=7)
    forecast = Forecast.objects.filter(
        station=obs.station,
        date__gte=date_inf,
        date__lte=obs.date - timedelta(days=1),
    ).aggregate(average_score=Avg('score'))
    if forecast["average_score"] is not None:
        print(forecast["average_score"], obs.temperature)
        # Some more code....
    previous = obs
How can I optimize the execution time? Is there a way to do it with a single query?
Thanks!
For every measurement, you re-compute the average over the last 7 days. If your measurements are closer together than 7 days, the windows overlap: with measurements one day apart, each forecast row gets aggregated up to 7 times, and every iteration issues its own database query, which is SLOW.
Your best bet is to grab all the measurements, then all the forecasts that match, and do the averaging in memory in Python. Sure, it's more Python code, but it will run faster.
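As a sketch of that idea, here is the same windowed average computed over plain in-memory rows (the station names, dates and scores are made up for illustration; in the real code the two lists would come from one query on Weather and one on Forecast):

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up rows standing in for the Weather and Forecast querysets.
weather_rows = [
    {"station": "A", "date": date(2023, 1, 8), "temperature": 3.0},
    {"station": "A", "date": date(2023, 1, 10), "temperature": 4.5},
]
forecast_rows = [
    {"station": "A", "date": date(2023, 1, 7), "score": 2},
    {"station": "A", "date": date(2023, 1, 9), "score": 4},
]

# Index forecasts by station so each lookup is a cheap in-memory scan.
forecasts_by_station = defaultdict(list)
for row in forecast_rows:
    forecasts_by_station[row["station"]].append(row)

def window_averages(weather_rows, forecasts_by_station):
    """Average forecast scores over the same window as the original loop."""
    results = []
    previous = None
    for obs in sorted(weather_rows, key=lambda r: (r["station"], r["date"])):
        date_inf = obs["date"] - timedelta(days=7)
        # An earlier measurement inside the window becomes the starting point.
        if previous is not None and previous["station"] == obs["station"]:
            date_inf = min(date_inf, previous["date"])
        scores = [
            f["score"]
            for f in forecasts_by_station[obs["station"]]
            if date_inf <= f["date"] <= obs["date"] - timedelta(days=1)
        ]
        results.append(sum(scores) / len(scores) if scores else None)
        previous = obs
    return results
```

This keeps the database work to two bulk fetches, at the cost of holding both result sets in memory.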
Related
I have this model
class Exemple(models.Model):
    from_date = models.DateField()
    until_date = models.DateField()
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
I have a yearly number per person, for example 100, and I must decrease that number by the person's total days. For every row of that person I need to compute its number of days, then sum them, and then compute 100 - sum of days.
Considering persons contains your persons, you could do something like this:
for person in persons:
    total_days = 0
    for exemple in Exemple.objects.filter(person=person):
        # .days converts the timedelta to an int; max() counts at least 1 day
        total_days += max(1, (exemple.until_date - exemple.from_date).days)
Explanation:
1) You do the computation person by person
2) For each person, you browse every exemple
3) You sum every "until - from" difference. The max() is there to count 1 when until_date minus from_date is 0 (because you said you don't want it to be 0)
I don't know if you want to store it somewhere or if you just want to do it in a method so I just wrote this little sample of code to provide you the logic. You'll have to adapt it to suit your needs.
However, this might not be the prettiest way to achieve your goal.
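The date arithmetic itself can be checked with plain datetime.date values; the rows and the 100-day allowance below are assumptions based on the question:

```python
from datetime import date

# Hypothetical rows for one person: (from_date, until_date) pairs.
rows = [
    (date(2023, 3, 1), date(2023, 3, 5)),    # spans 4 days
    (date(2023, 4, 10), date(2023, 4, 10)),  # same day, counted as 1
]

ALLOWANCE = 100  # the assumed yearly number from the question

# Same logic as the loop above, on plain dates instead of model instances.
total_days = sum(max(1, (until - start).days) for start, until in rows)
remaining = ALLOWANCE - total_days
```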
class Point(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    expire_date = models.DateField()
    amount = models.IntegerField()
I want to know the sum of amount for the latest expire_date for a given user.
There can be multiple points for a user with the same expire_date.
I could do two queries: first get the latest expire_date, then aggregate over it. But I'd like to know if there's a better way.
We can use a subquery here:
from django.db.models import Sum
Point.objects.filter(
expire_date__gte=Point.objects.order_by('-expire_date').values('expire_date')[:1]
).aggregate(total=Sum('amount'))
This will thus result in a query that looks like:
SELECT SUM(point.amount) AS total
FROM point
WHERE point.expire_date >= (
SELECT U0.expire_date
FROM point U0
ORDER BY U0.expire_date DESC
LIMIT 1
)
I have not run performance tests on it, so I suggest you first measure whether this actually improves performance significantly.
I am trying to use an aggregated column in a case statement in Django and I am having no luck getting Django to accept it.
The code is to return a list of people who have played a game, the number of times they have played the game and their total score. The list is sorted by total score descending. However, the game has a minimum number of plays in order to qualify. Players without sufficient plays are listed at the bottom. For example:
Player  Total  Plays
Jill      109     10
Sam        92     11
Jack       45      9
Sue        50      3
Sue is fourth in the list because her number of plays (3) is less than the minimum (5).
The relevant models and function are:
class Player(models.Model):
    name = models.CharField(max_length=100)

class Game(models.Model):
    name = models.CharField(max_length=100)
    min_plays = models.IntegerField(default=1)

class Play(models.Model):
    game = models.ForeignKey(Game, on_delete=models.CASCADE)

class Score(models.Model):
    play = models.ForeignKey(Play, on_delete=models.CASCADE)
    player = models.ForeignKey(Player, on_delete=models.CASCADE)
    score = models.IntegerField()
def game_standings(game):
    query = Player.objects.filter(score__play__game_id=game.id)
    query = query.annotate(plays=Count('score', filter=Q(score__play__game_id=game.id)))
    query = query.annotate(total_score=Sum('score__score', filter=Q(score__play__game_id=game.id)))
    query = query.annotate(sufficient=Case(When(plays__ge=game.min_plays, then=1), default=0))
    query = query.order_by('-sufficient', '-total_score', 'plays')
When the last annotate call runs, an "Unsupported lookup 'ge' for IntegerField or join on the field not permitted" error is raised. I tried to change the Case statement to embed the count instead of using the annotated field:
query = query.annotate(
    sufficient=Case(
        When(Q(Count('score', filter=Q(score__play__game_id=game.id))) > 3, then=1),
        default=0,
    )
)
but Django reports a TypeError with '>' and Q and int.
The SQL I am trying to get to is:
SELECT "player"."id",
"player"."name",
COUNT("score"."id") FILTER (WHERE "play"."game_id" = 8) AS "plays",
SUM("score"."score") FILTER (WHERE "play"."game_id" = 8) AS "total_score",
case when COUNT("score"."id") FILTER (WHERE "play"."game_id" = 8) >= 5 then 1
else 0
end as sufficient
FROM "player"
LEFT OUTER JOIN "score" ON ("player"."id" = "score"."player_id")
LEFT OUTER JOIN "play" ON ("score"."play_id" = "play"."id")
WHERE "play"."game_id" = 8
GROUP BY "player"."id"
ORDER BY sufficient DESC, total_score DESC
I can't seem to figure out how to make the Case statement use the play count.
Thanks
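For what it's worth, Django spells the comparison lookup gte, not ge, which is what the error message is complaining about. Setting the ORM aside, the intended grouping and ordering can be sketched over plain rows (the numbers come from the example table above; the minimum of 5 is from the question):

```python
# Hypothetical per-player rows: (name, total_score, plays), as in the example table.
rows = [
    ("Jill", 109, 10),
    ("Sam", 92, 11),
    ("Jack", 45, 9),
    ("Sue", 50, 3),
]
MIN_PLAYS = 5  # stands in for game.min_plays

def standings(rows, min_plays):
    """Order by sufficient desc, total_score desc, plays asc, mirroring the target SQL."""
    def key(row):
        name, total, plays = row
        sufficient = 1 if plays >= min_plays else 0
        return (-sufficient, -total, plays)
    return [name for name, _, _ in sorted(rows, key=key)]
```

With these rows, Sue sorts last despite a higher total than Jack, because her 3 plays fall below the minimum, which matches the example.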
I'm working on a purchasing optimization model, below are some related inputs:
sets:
model.b = Set(initialize=Brands, doc='Brands')
model.s = Set(initialize=Suppliers, doc='Suppliers')
model.t = Set(initialize=Time, doc='Time in days')
parameters:
model.lt = Param(model.s, initialize=LeadTime, doc='Lead time to buy from supplier (s) in days')
variables:
model.q = Var(model.b, model.t, model.s, domain=NonNegativeIntegers, bounds=(0.0, None), doc='Received quantity of each brand (b), at time (t), from supplier (s)')
model.pr = Var(model.b, model.t, model.s, domain=NonNegativeIntegers, bounds=(0.0, None), doc='Purchase order quantity of each brand (b), at time (t), from supplier (s)')
I'm struggling to write a constraint that makes the quantity ordered (LT) days before time (t) equal to the quantity received at time (t), where LT is the lead time required by each supplier.
This is how I imagine the constraint but I don't know how to write it:
quantity ordered at time (t - lead time) = quantity received at time (t) , for all times (t), brands (b), and suppliers (s)
Your time and help are greatly appreciated!
Assuming that the time points in the model are all integers and that subtracting the lead time from one point in model.t will give another valid index of model.t, then the following should work:
def compute_received(m, b, t, s):
    if t - m.lt[s] < min(m.t):
        # Deliveries at this time would have to be ordered before the model horizon begins
        return Constraint.Skip
    return m.q[b, t, s] == m.pr[b, t - m.lt[s], s]

model.compute_received = Constraint(model.b, model.t, model.s, rule=compute_received)
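The same lead-time bookkeeping can be sanity-checked in plain Python with made-up numbers (the times, lead time and quantities below are illustrative, not from the question): whatever is ordered at t - lead_time arrives at t, and deliveries whose order date falls before the horizon are skipped.

```python
# Hypothetical model data: five time points and a 2-day lead time.
times = range(1, 6)
lead_time = 2
ordered = {1: 10, 2: 0, 3: 5, 4: 0, 5: 0}  # made-up order quantities per day

# Mirror of the constraint: received[t] == ordered[t - lead_time],
# skipping times whose order would predate the start of the horizon.
received = {
    t: ordered[t - lead_time]
    for t in times
    if t - lead_time >= min(times)
}
```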
I have a model like this:
class Stock(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    place = models.ForeignKey(Place, on_delete=models.CASCADE)
    date = models.DateField()
    quantity = models.IntegerField()
I need to get the latest (by date) quantity for every product at every place,
with almost 500 products, 100 places and 350,000 stock records in the database.
My current code is below. It worked in testing, but with the real data it takes so long that it's useless:
stocks = Stock.objects.filter(product__in=self.products,
                              place__in=self.places, date__lt=date_at)
stock_values = {}
for prod in self.products:
    for place in self.places:
        key = u'%s%s' % (prod.id, place.id)
        stock = stocks.filter(product=prod, place=place, date=date_at)
        if len(stock) > 0:
            stock_values[key] = stock[0].quantity
        else:
            try:
                stock = stocks.filter(product=prod, place=place).order_by('-date')[0]
            except IndexError:
                stock_values[key] = 0
            else:
                stock_values[key] = stock.quantity
return stock_values
How would you make it faster?
Edit:
Rewrote the code as this:
stock_values = {}
for product in self.products:
    for place in self.places:
        try:
            stock_value = Stock.objects.filter(product=product, place=place, date__lte=date_at)\
                .order_by('-date').values('quantity')[0]['quantity']
        except IndexError:
            stock_value = 0
        stock_values[u'%s%s' % (product.id, place.id)] = stock_value
return stock_values
It works better (from 256 seconds down to 64), but it still needs improvement. Maybe some custom SQL, I don't know...
Arthur's right, the len(stock) isn't the most efficient way to do that. You could go further along the "easier to ask for forgiveness than permission" route with something like this inside the inner loop:
key = u'%s%s' % (prod.id, place.id)
try:
    stock = stocks.filter(product=prod, place=place, date=date_at)[0]
    quantity = stock.quantity
except IndexError:
    try:
        stock = stocks.filter(product=prod, place=place).order_by('-date')[0]
        quantity = stock.quantity
    except IndexError:
        quantity = 0
stock_values[key] = quantity
I'm not sure how much that would improve it compared to just changing the length check, though I think this should at least restrict it to two queries with LIMIT 1 on them (see Limiting QuerySets).
Mind you, this is still performing a lot of database hits since you could run through that loop almost 50000 times. Optimize how you're looping and you're in a better position still.
Maybe the trick is in that len() call!
From the docs:
Note: Don't use len() on QuerySets if all you want to do is determine
the number of records in the set. It's much more efficient to handle a
count at the database level, using SQL's SELECT COUNT(*), and Django
provides a count() method for precisely this reason. See count()
below.
So try changing len() to count() and see if it gets faster!
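Going further, you can collapse the roughly 50,000 per-pair queries entirely: fetch the relevant rows once, then keep the newest date per (product, place) pair in memory. A plain-Python sketch of that single pass (the rows are made up; in the ORM the fetch would be one filtered queryset):

```python
from datetime import date

# Hypothetical stock rows: (product_id, place_id, date, quantity).
rows = [
    (1, 1, date(2023, 1, 1), 7),
    (1, 1, date(2023, 1, 5), 9),   # later row wins for (1, 1)
    (2, 1, date(2023, 1, 3), 4),
]

def latest_quantities(rows):
    """One pass over the rows, keeping the newest date per (product, place)."""
    latest = {}  # (product_id, place_id) -> (date, quantity)
    for product_id, place_id, row_date, quantity in rows:
        key = (product_id, place_id)
        if key not in latest or row_date > latest[key][0]:
            latest[key] = (row_date, quantity)
    return {key: qty for key, (_, qty) in latest.items()}
```

This trades one large fetch and some memory for the thousands of round trips in the nested loops.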