Django SUM from INNER JOIN

I have a DB like this:

class MyCPU(models.Model):
    cpu_name = models.CharField(max_length=100)
    cpu_count = models.IntegerField()

class MyMachine(models.Model):
    hostname = models.CharField(max_length=50)
    ip = models.CharField(max_length=50)
    cpu = models.ForeignKey(MyCPU, on_delete=models.CASCADE)
How can I achieve the result of the following raw SQL command in Django?

select sum(cpu_count) as sum_cpu from my_machine inner join my_cpu on my_machine.cpu_id = my_cpu.id

I basically want to sum the CPU counts across all machines.
I have tried this solution, but it did not work:

MyMachine.objects.annotate(total_cpu=Sum('cpu__cpu_count'))

Since you are using a foreign key, you can do:

MyMachine.objects.values('hostname', 'ip', 'cpu__cpu_count')

This will return, for each machine, how many CPUs it has.
If you need the total number of CPUs:

MyCPU.objects.aggregate(total_cpu=Sum('cpu_count'))['total_cpu']

If there are CPU objects not connected to any machine, you can do the following to get the sum across machines only:

MyMachine.objects.aggregate(total_cpu=Sum('cpu__cpu_count'))['total_cpu']

I think the last one is what you are searching for, since there is a chance of the same CPU object appearing in different machines.
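For context on why the original annotate attempt did not work: annotate adds a value to every row of the queryset, while aggregate collapses the whole queryset into a single dictionary, like the raw SQL SUM. A minimal sketch using the models above (the import is the only addition):

from django.db.models import Sum

# annotate: one total_cpu value per machine row, not a grand total
per_machine = MyMachine.objects.annotate(total_cpu=Sum('cpu__cpu_count'))
for machine in per_machine:
    print(machine.hostname, machine.total_cpu)

# aggregate: a single dict for the whole queryset, matching the raw SQL
grand_total = MyMachine.objects.aggregate(total_cpu=Sum('cpu__cpu_count'))
print(grand_total['total_cpu'])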

Related

Simplifying Django Query Annotation

I have 3 models and a function that is called many times, but it generates 200-300 SQL queries, and I'd like to reduce this number.
I have the following layout:
class Info(models.Model):
    user = models.ForeignKey(User)
    ...

class Forum(models.Model):
    info = models.ForeignKey(Info)
    datum = models.DateTimeField()
    ...

class InfoViewed(models.Model):
    user = models.ForeignKey(User)
    infokom = models.ForeignKey(Info)
    last_seen = models.DateTimeField()
    ...
I need to get the total number of new Forum messages, so just a single number. At the moment I iterate over all the related Infos and sum up all Forums having a higher datum than the related InfoViewed's last_seen field.
This works, but it results in ~200 queries for 100 Infos.
Is there any possibility to fetch the same number within a few queries? I tried to play with annotate and django.db.models.Count, but I failed.
Django: 1.11.16
Currently I'm using this:

infos = Info.objects.filter(user_id=x)
return sum(i.number_of_new_forums(self.user) for i in infos)

and number_of_new_forums looks like this:

info_viewed = self.info_viewed_set.filter(user=user)
return len([f.id for f in self.get_forums().filter(datum__gt=info_viewed[0].last_seen)])
I managed to come up with a solution from a different perspective:

Forum.objects.filter(info_id__in=info_ids,
                     info__info_viewed__user=request.user,
                     datum__gt=F('info__info_viewed__last_seen')).count()

where info_ids is a list of all the related Infos' ids. However, I'm unsure whether it's a 100% correct solution.
If someone has a different approach, I'd welcome it.
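A quick way to check whether a single-query rewrite like the one above really collapses the N+1 pattern is to count the queries it issues. A sketch using only core Django test utilities (info_ids and request come from the question's context):

from django.db import connection
from django.db.models import F
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as ctx:
    total = Forum.objects.filter(
        info_id__in=info_ids,
        info__info_viewed__user=request.user,
        datum__gt=F('info__info_viewed__last_seen'),
    ).count()

print(total, len(ctx.captured_queries))  # expect 1 query instead of ~200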

Django - creating and saving multiple objects in a loop, with ForeignKeys

I am having trouble creating and saving objects in Django. I am very new to Django so I'm sure I'm missing something very obvious!
I am building a price comparison app, and I have a Search model:
Search - all searches carried out, recording best price, worst price, product searched for, time of search etc. I have successfully managed to save these searches to a DB and am happy with this model.
The two new models I am working with are:
Result - this is intended to record all search results returned, for each search carried out, e.g. Seller 1 £100, Seller 2 £200, Seller 3 £300. (One search has many search results).
'Agent' - a simple table of Agents that I compare prices at. (One Agent can have many search Results).
class Agent(models.Model):
    agent_id = models.AutoField(primary_key=True)
    agent_name = models.CharField(max_length=30)

class Result(models.Model):
    search_id = models.ForeignKey(Search, on_delete=models.CASCADE)  # Foreign key to Search table
    agent_id = models.ForeignKey(Agent, on_delete=models.CASCADE)  # Foreign key to Agent table
    price = models.FloatField()
    search_position = models.IntegerField()
My code that is creating and saving the objects is here:
def update_search_table(listed, product):
    if len(listed) > 0:
        search = Search(product=product,
                        no_of_agents=len(listed),
                        valid_search=1,
                        best_price=listed[0]['cost'],
                        worst_price=listed[-1]['cost'])
        search.save()
        for i in range(len(listed)):
            agent = Agent.objects.get(agent_name=listed[i]['company'])
            # print(agent.agent_id)     # Prints expected value
            # print(search.search_id)   # Prints expected value
            # print(listed[i]['cost'])  # Prints expected value
            # print(i + 1)              # Prints expected value
            result = Result(search_id=search,
                            agent_id=agent,
                            price=listed[i]['cost'],
                            position=i + 1)
            search.result_set.add(result)
            agent.result_set.add(result)
            result.save()
Up to search.save() is working as expected.
The first line of the for loop is also correctly retrieving the relevant Agent.
The rest of it is going wrong (i.e. not saving any Result objects to the Result table). What I want to achieve is, if there are 10 different agent results returned, create 10 Result objects and save each one. Link each of those 10 objects to the Search that triggered the results, and link each of those 10 objects to the relevant Agent.
I have tried quite a few iterations, but I'm not sure where I'm going wrong.
Thanks
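A note on where this likely goes wrong, with a sketch of a fix (field names are taken from the models above; treat this as a sketch, not the definitive answer). The Result model defines search_position, but the loop passes position=i + 1, which raises a TypeError as soon as the first Result is constructed. The result_set.add() calls are also unnecessary: both foreign keys are already set in the constructor, and with the default bulk=True, add() rejects unsaved objects anyway.

def update_search_table(listed, product):
    if not listed:
        return
    search = Search(product=product,
                    no_of_agents=len(listed),
                    valid_search=1,
                    best_price=listed[0]['cost'],
                    worst_price=listed[-1]['cost'])
    search.save()
    for position, row in enumerate(listed, start=1):
        agent = Agent.objects.get(agent_name=row['company'])
        # create() builds and saves in one step; assigning the FK fields
        # here replaces the result_set.add() calls entirely.
        Result.objects.create(search_id=search,
                              agent_id=agent,
                              price=row['cost'],
                              search_position=position)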

How can I make this queryset more tolerable?

I have the following classes (Product is referenced by name, since it is defined after Instance):

class Instance(models.Model):
    products = models.ManyToManyField('Product', blank=True)

class Product(models.Model):
    description = HTMLField(blank=True, null=True)
    short_description = HTMLField(blank=True, null=True)
And this form that I use to update Instances:

class InstanceModelForm(InstanceValidatorMixin, UpdateInstanceLastUpdatedMixin, forms.ModelForm):
    class Meta:
        model = Instance

    products = forms.ModelMultipleChoiceField(
        required=False,
        queryset=Product.objects.annotate(i_count=Count('instance')).order_by('i_count'))
My instance-product table is sizable (~1000 rows), and ever since I added the queryset for products, I am seeing web requests that time out due to Heroku's 30-second request limit.
My goal is to change this queryset so that my users no longer hit the timeout.
I have the following insights:

Accuracy doesn't matter as much to me. Yes, I would like to sort products by the count of instances each product is linked to, but if it's off by 5 or 10 it doesn't really matter that much.
Limited number of products: when my users are selecting products to link to an instance, they are primarily interested in products with fewer than 10 total links to instances. I don't know if a partial query will be accurate, but if this is possible, I am open to trying it.
Effort: I know there are frameworks out there that I can install to cache many things. I am looking for something lightweight that takes less than an hour to get up and running.
First I would want to ensure that the performance issue actually comes from the query. I've tried to reproduce your problem:
>>> Instance.objects.count()
102499
>>> Product.objects.count()
1000
>>> sum(p.instance_set.count() for p in Product.objects.all())/Product.objects.count()
273.084
>>> list(Product.objects.annotate(i_count=Count('instance')).order_by('i_count'))
[...]
>>> from django.db import connection
>>> connection.queries[-1]
{'sql': 'SELECT "products_product"."id", "products_product"."description", "products_product"."short_description", COUNT("products_instance_products"."instance_id") AS "i_count" FROM "products_product" LEFT OUTER JOIN "products_instance_products" ON ("products_product"."id" = "products_instance_products"."product_id") GROUP BY "products_product"."id", "products_product"."description", "products_product"."short_description" ORDER BY "i_count" ASC', 'time': '0.189'}
By accident, I created a dataset that is probably quite a bit bigger than yours. As you can see, I have 1000 Products with an average of ~273 related Instances, but the query still takes less than a second (both on SQLite and PostgreSQL).
Use a one-off dyno with heroku run bash and check if you get the same numbers.
My guess is that your performance issues are caused by either:
an n+1 query, where an extra query is made for each Product, e.g. in your Product.__str__ method, or
the actual rendering of the ModelMultipleChoiceField. By default, it will render as a <select> with an <option> for each Product. This can be quite slow, and even if it weren't, it would be pretty inconvenient to use. You might want to use a different widget, like django-select2.
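To illustrate the suggested widget swap, here is a sketch based on django-select2's ModelSelect2MultipleWidget; the search_fields lookup is an assumption, since Product only exposes HTML description fields:

from django import forms
from django.db.models import Count
from django_select2.forms import ModelSelect2MultipleWidget

class InstanceModelForm(InstanceValidatorMixin, UpdateInstanceLastUpdatedMixin, forms.ModelForm):
    class Meta:
        model = Instance

    products = forms.ModelMultipleChoiceField(
        required=False,
        queryset=Product.objects.annotate(i_count=Count('instance')).order_by('i_count'),
        # The widget renders an empty <select> and fetches matches over AJAX,
        # so the page no longer emits one <option> per Product up front.
        widget=ModelSelect2MultipleWidget(
            model=Product,
            search_fields=['description__icontains'],  # assumed lookup field
        ),
    )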

Django: efficient semi-random order_by for user-friendly results?

I have a Django search app with a Postgres back-end that is populated with cars. My scripts load on a brand-by-brand basis: let's say a typical mix is 50 Chryslers, 100 Chevys, and 1500 Fords, loaded in that order.
The default ordering is by creation date:
class Car(models.Model):
    name = models.CharField(max_length=500)
    brand = models.ForeignKey(Brand, null=True, blank=True)
    transmission = models.CharField(max_length=50, choices=TRANSMISSIONS)
    created = models.DateField(auto_now_add=True)

    class Meta:
        ordering = ['-created']
My problem is this: typically when the user does a search, say for a red automatic (let's say that returns 10% of all cars):

results = Car.objects.filter(transmission="automatic", color="red")

the user typically gets hundreds of Fords before any other brand (because the results are ordered by creation date), which is not a good experience.
I'd like to make sure the brands are as evenly distributed as possible among the early results, without big "runs" of one brand. Any clever suggestions for how to achieve this?
The only idea I have is to use the ? operator with order_by:

results = Car.objects.filter(transmission="automatic", color="red").order_by("?")

This isn't ideal: it's expensive, and it doesn't guarantee a good mix of results if some brands are much more common than others. Here, where Chrysler and Chevy are in the minority, the user is still likely to see lots of Fords first. Ideally I'd show all the Chryslers and Chevys within the first 50 results, nicely mixed in with Fords.
Any ideas on how to achieve a user-friendly ordering? I'm stuck.
What I ended up doing was adding a priority field to the model and assigning a semi-random integer to it.
By semi-random, I mean random within a particular range: 1-100 for Chevy and Chrysler, and 1-500 for Ford.
Then I used that field in order_by.
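A minimal sketch of that priority-field idea (the field name, the Brand.name attribute, and the backfill loop are assumptions; the per-brand ranges come from the answer):

import random

# Assumed addition to Car, with a matching migration:
#     priority = models.IntegerField(default=0, db_index=True)

RANGES = {'Chevy': 100, 'Chrysler': 100, 'Ford': 500}  # rarer brands get tighter ranges

for car in Car.objects.select_related('brand'):
    upper = RANGES.get(car.brand.name if car.brand else '', 500)
    car.priority = random.randint(1, upper)
    car.save(update_fields=['priority'])

# Rarer brands now surface early, without the cost of order_by("?"):
results = Car.objects.filter(transmission="automatic", color="red").order_by("priority")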

Django, accessing PostgreSQL sequence

In a Django application I need to create an order number which looks like yyyymmddnnnn, in which yyyy = year, mm = month, dd = day, and nnnn is a number between 1 and 9999.
I thought I could use a PostgreSQL sequence, since the generated numbers are atomic, so I can be sure that when a process gets a number, that number is unique.
So I created a PostgreSQL sequence:

CREATE SEQUENCE order_number_seq
    INCREMENT 1
    MINVALUE 1
    MAXVALUE 9999
    START 1
    CACHE 1
    CYCLE;
A sequence can be accessed as a table having one row, so in the file checkout.py I created a Django model to access this sequence.
class OrderNumberSeq(models.Model):
    """
    This class maps to order_number_seq, which is a PostgreSQL sequence.
    The sequence runs from 1 to 9999, after which it restarts (cycles) at 1.
    A sequence is basically a special single-row table.
    """
    sequence_name = models.CharField(max_length=128, primary_key=True)
    last_value = models.IntegerField()
    increment_by = models.IntegerField()
    max_value = models.IntegerField()
    min_value = models.IntegerField()
    cache_value = models.IntegerField()
    log_cnt = models.IntegerField()
    is_cycled = models.BooleanField()
    is_called = models.BooleanField()

    class Meta:
        db_table = u'order_number_seq'
I set sequence_name as the primary key, since Django insists on a table having a primary key.
Then I created a file get_order_number.py with the following contents:

import datetime

def get_new_order_number():
    order_number = OrderNumberSeq.objects.raw(
        "select sequence_name, nextval('order_number_seq') from order_number_seq")[0]
    today = datetime.date.today()
    year = u'%4s' % today.year
    month = u'%02i' % today.month
    day = u'%02i' % today.day
    new_number = u'%04i' % order_number.nextval
    return year + month + day + new_number
Now when I call get_new_order_number() from the Django interactive shell, it behaves as expected:
>>> checkout.order_number.get_new_order_number()
u'201007310047'
>>> checkout.order_number.get_new_order_number()
u'201007310048'
>>> checkout.order_number.get_new_order_number()
u'201007310049'
You see the numbers nicely incrementing by one every time the function is called. You can start multiple interactive Django sessions, and the numbers increment nicely with no identical numbers appearing in the different sessions.
Now I try to call this function from a view as follows:

import get_order_number
order_number = get_order_number.get_new_order_number()

and it gives me a number. However, the next time I access the view, it increments the number by 2. I have no idea where the problem is.
The best solution I can come up with is: don't worry if your order numbers are sparse. It should not matter if an order number is missing; there is no way to ensure that order numbers are contiguous that is not subject to a race condition at some point.
Your biggest problem is likely to be convincing the pointy-haired ones that having 'missing' order numbers is not a problem.
For more details, see the Pseudo-Key Neat Freak entry in SQL Antipatterns (note: this is a link to a book, the full text of which is not available for free).
Take a look at this question/answer: Custom auto-increment field in postgresql (Invoice/Order No.)
You can create stored procedures (and the sequence itself) using a RunSQL migration.
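A minimal sketch of such a migration (the app label and dependency are hypothetical), so the sequence is created by Django rather than by hand:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('checkout', '0001_initial')]  # hypothetical dependency

    operations = [
        migrations.RunSQL(
            sql=("CREATE SEQUENCE order_number_seq INCREMENT 1 MINVALUE 1 "
                 "MAXVALUE 9999 START 1 CACHE 1 CYCLE;"),
            reverse_sql="DROP SEQUENCE order_number_seq;",
        ),
    ]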