Django query get common items based on attribute - django

I have a model as follows:
class Item(models.Model):
VENDOR_CHOICES = (
('a', 'A'),
('b', 'B')
)
vendor = models.CharField(max_length=16, choices=VENDOR_CHOICES)
name = models.CharField(max_length=255)
price = models.DecimalField(max_digits=6, decimal_places=2)
Now I have 2 data sources, so I get items from vendor A and items from vendor B.
In some cases vendor A may not have the same items as Vendor B, say vendor A has 30 items and Vendor B has 442 items, out of which only 6 items are common. Items that are common are defined as items that have the exact same name.
I need to also find the difference in prices of items that are common to vendor a and vendor b items, meaning the items that have the same name in vendor a and vendor b. I have a large no. of items which may go upto 10k items per vendor, so a efficient way of doing this would be required?

I think that something like this should work:
vendor_a_items = Item.objects.filter(vendor='a')
vendor_b_items = Item.objects.filter(vendor='b')
common_items = vendor_a_items.filter(
name__in=vendor_b_items.values_list('name', flat=True))
UPDATE: To find the price difference you can just loop over the found common items:
for a_item in common_items:
b_item = vendor_b_items.get(name=a_item.name)
print u'%s: %s' % (a_item.name, a_item.price - b_item.price)
This adds a db hit for each found item but if you have a small number of common items then this solution will work just fine. For larger intersection you can load all prices from vendor_b_items in one query. Use this code instead of the previous snippet.
common_items_d = {item.name: item for item in common_items}
for b_item in vendor_b_items.filter(name__in=common_items_d.keys()):
print u'%s: %s' % (b_item.name,
common_items_d[b_item.name].price - b_item.price)

Starting from django 1.11, this can be resolved using the built in intersection and difference methods.
vendor_a_items = Item.objects.filter(vendor='a')
vendor_b_items = Item.objects.filter(vendor='b')
common_items = vendor_a_items.intersection(vendor_b_items)
vendor_a_exclussive_items = vendor_a_items.difference(vendor_b_items)
vendor_b_exclussive_items = vendor_b_items.difference(vendor_a_items)
See my blog post on this for more detailed use cases.

Related

Are queries using related_name more performant in Django?

Lets say I have the following models set up:
class Shop(models.Model):
...
class Product(models.Model):
shop = models.ForeignKey(Shop, related_name='products')
Now lets say we want to query all the products from the shop with label 'demo' whose prices are below $100. There are two ways to do this:
shop = Shop.objects.get(label='demo')
products = shop.products.filter(price__lte=100)
Or
shop = Shop.objects.get(label='demo')
products = Products.objects.filter(shop=shop, price__lte=100)
Is there a difference between these two queries? The first one is using the related_name property. I know foreign keys are indexed, so searching using them should be faster, but is this applicable in our first situation?
Short answer: this will result in equivalent queries.
We can do the test by printing the queries:
>>> print(shop.products.filter(price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."shop_id" = 1 AND "app_product"."price" <= 100)
>>> print(Product.objects.filter(shop=shop, price__lte=100).query)
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price" FROM "app_product" WHERE ("app_product"."price" <= 100 AND "app_product"."shop_id" = 1)
except that the conditions in the WHERE are swapped, the two are equal. But usually this does not make any difference at the database side.
If you however are not interested in the Shop object itself, you can filter with:
products = Product.objects.filter(shop__label='demo', price__lte=100)
This will make a JOIN at the database level, and will thus retrieve the data in a single pass:
SELECT "app_product"."id", "app_product"."shop_id", "app_product"."price"
FROM "app_product"
INNER JOIN "app_shop" ON "app_product"."shop_id" = "app_shop"."id"
WHERE "app_product"."price" <= 100 AND "app_shop"."label" = demo

Django - creating and saving multiple object in a loop, with ForeignKeys

I am having trouble creating and saving objects in Django. I am very new to Django so I'm sure I'm missing something very obvious!
I am building a price comparison app, and I have a Search model:
Search - all searches carried out, recording best price, worst price, product searched for, time of search etc. I have successfully managed to save these searches to a DB and am happy with this model.
The two new models I am working with are:
Result - this is intended to record all search results returned, for each search carried out. I.e. Seller 1 £100, Seller 2 £200, Seller 3, £300. (One search has many search results).
'Agent' - a simple table of Agents that I compare prices at. (One Agent can have many search Results).
class Agent(models.Model):
agent_id = models.AutoField(primary_key=True)
agent_name = models.CharField(max_length=30)
class Result(models.Model):
search_id = models.ForeignKey(Search, on_delete=models.CASCADE) # Foreign Key of Search table
agent_id = models.ForeignKey(Agent, on_delete=models.CASCADE) # Foreign Key of Agent table
price = models.FloatField()
search_position = models.IntegerField().
My code that is creating and saving the objects is here:
def update_search_table(listed, product):
if len(listed) > 0:
search = Search(product=product,
no_of_agents=len(listed),
valid_search=1,
best_price=listed[0]['cost'],
worst_price=listed[-1]['cost'])
search.save()
for i in range(len(listed)):
agent = Agent.objects.get(agent_name = listed[i]['company'])
# print(agent.agent_id) # Prints expected value
# print(search.search_id) # Prints expected value
# print(listed[i]['cost']) # Prints expected value
# print(i + 1) # Prints expected value
result = Result(search_id = search,
agent_id = agent,
price = listed[i]['cost'],
position = i + 1)
search.result_set.add(result)
agent.result_set.add(result)
result.save()
Up to search.save() is working as expected.
The first line of the for loop is also correctly retrieving the relevant Agent.
The rest of it is going wrong (i.e. not saving any Result objects to the Result table). What I want to achieve is, if there are 10 different agent results returned, create 10 Result objects and save each one. Link each of those 10 objects to the Search that triggered the results, and link each of those 10 objects to the relevant Agent.
Have tried quite a few iterations but not sure where I'm going wrong.
Thanks

django paginate with non-model object

I'm working on a side project using python and Django. It's a website that tracks the price of some product from some website, then show all the historical price of products.
So, I have this class in Django:
class Product(models.Model):
price = models.FloatField()
date = models.DateTimeField(auto_now = True)
name = models.CharField()
Then, in my views.py, because I want to display products in a table, like so:
+----------+--------+--------+--------+--------+....
| Name | Date 1 | Date 2 | Date 3 |... |....
+----------+--------+--------+--------+--------+....
| Product1 | 100.0 | 120.0 | 70.0 | ... |....
+----------+--------+--------+--------+--------+....
...
I'm using the following class for rendering:
class ProductView(objects):
name = ""
price_history = {}
So that in my template, I can easily convert each product_view object into one table row. I'm also passing through context a sorted list of all available dates, for the purpose of constructing the head of the table, and getting the price of each product on that date.
Then I have logic in views that converts one or more products into this ProductView object. The logic looks something like this:
def conversion():
result_dict = {}
all_products = Product.objects.all()
for product in all_products:
if product.name in result_dict:
result_dict[product.name].append(product)
else:
result_dict[product.name] = [product]
# So result_dict will be like
# {"Product1":[product, product], "Product2":[product],...}
product_views = []
for products in result_dict.values():
# Logic that converts list of Product into ProductView, which is simple.
# Then I'm returning the product_views, sorted based on the price on the
# latest date, None if not available.
return sorted(product_views,
key = lambda x: get_latest_price(latest_date, x),
reverse = True)
As per Daniel Roseman and zymud, adding get_latest_price:
def get_latest_price(date, product_view):
if date in product_view.price_history:
return product_view.price_history[date]
else:
return None
I omitted the logic to get the latest date in conversion. I have a separate table that only records each date I run my price-collecting script that adds new data to the table. So the logic of getting latest date is essentially get the date in OpenDate table with highest ID.
So, the question is, when product grows to a huge amount, how do I paginate that product_views list? e.g. if I want to see 10 products in my web application, how to tell Django to only get those rows out of DB?
I can't (or don't know how to) use django.core.paginator.Paginator, because to create that 10 rows I want, Django needs to select all rows related to that 10 product names. But to figure out which 10 names to select, it first need to get all objects, then figure out which ones have the highest price on the latest date.
It seems to me the only solution would be to add something between Django and DB, like a cache, to store that ProductView objects. but other than that, is there a way to directly paginate produvt_views list?
I'm wondering if this makes sense:
The basic idea is, since I'll need to sort all product_views by the price on the "latest" date, I'll do that bit in DB first, and only get the list of product names to make it "paginatable". Then, I'll do a second DB query, to get all the products that have those product names, then construct that many product_views. Does it make sense?
To clear it a little bit, here comes the code:
So instead of
#def conversion():
all_products = Product.objects.all()
I'm doing this:
#def conversion():
# This would get me the latest available date
latest_date = OpenDate.objects.order_by('-id')[:1]
top_ten_priced_product_names = Product.objects
.filter(date__in = latest_date)
.order_by('-price')
.values_list('name', flat = True)[:10]
all_products_that_i_need = Product.objects
.filter(name__in = top_ten_priced_product_names)
# then I can construct that list of product_views using
# all_products_that_i_need
Then for pages after the first, I can modify that [:10] to say [10:10] or [20:10].
This makes the code pagination easier, and by pulling appropriate code into a separate function, it's also possible to do Ajax and all those fancy stuff.
But, here comes a problem: this solution needs three DB calls for every single query. Right now I'm running everything on the same box, but still I want to reduce this overhead to two(One or Opendate, the other for Product).
Is there a better solution that solves both the pagination problem and with two DB calls?

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
some fields
User_Activity:
user = models.ForeignKey(settings.AUTH_USER_MODEL)
activity = models.ForeignKey(Base_Activity)
rating = models.IntegerField(default=0) #Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
up_votes = Count('user_activity__rating'=1),
).order_by(
'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
'-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the users with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
'-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
result = cursor.execute(sql_select)
rows = result.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database, I'm not sure the SUM expression there works in all DBMS. Btw I had a perfect Django site to test, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, stackoverflow style. The site is open-source, if you're interested.

Django complex query without using loop

I have two models such that
class Employer(models.Model):
name = models.CharField(max_length=1000,null=False,blank=False)
eminence = models.IntegerField(null=False,default=4)
class JobTitle(models.Model):
name = models.CharField(max_length=1000,null=False,blank=False)
employer= models.ForeignKey(JobTitle,unique=False,null=False)
class People(models.Model):
name = models.CharField(max_length=1000,null=False,blank=False)
jobtitle = models.ForeignKey(JobTitle,unique=False,null=False)
I would like to list random 5 employers and one job title for each employer. However, job title should be picked up from first 10 jobtitles of the employer whose number of people is maximum.
One approach could be
employers = Employer.objects.filter(isActive=True).filter(eminence__lt=4 ).order_by('?')[:5]
for emp in employers:
jobtitle = JobTitle.objects.filter(employer=emp)... and so on.
However, loop through selected employers may be ineffiecent. Is there any way to do it in one query ?
Thanks
There is! Check out: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
select_related() tells Django to follow all the foreign key relationships using JOINs. This will result in one large query as opposed to many small queries, which in most cases is what you want. The QuerySet you get will be pre-populated and Django won't have to lazy-load anything from the database.
I've used select_related() in the past to solve almost this exact problem.
I have written such code block and it works. Although I loop over employers because I have used select_related('jobtitle'), I consider it doesn't hit database and works faster.
employers = random.sample(Employer.objects.select_related('jobtitle').filter(eminence__lt=4,status=EmployerStatus.ACTIVE).annotate(jtt_count=Count('jobtitle')).filter(jtt_count__gt=0),3)
jtList = []
for emp in employers:
jt = random.choice(emp.jobtitle_set.filter(isActive=True).annotate(people_count=Count('people')).filter(people_count__gt=0)[:10])
jtList.append(jt)