Django ManyToMany Field or something else? - django

I have a simple eCommerce site with Product and Variation.
Each variation is definitely going to have different weights, and each weight will have a quantity.
I have implemented a Weight model with a ForeignKey to Variation to capture this:
class Weight(models.Model):
    variation = models.ForeignKey(Variation)
    size = models.DecimalField(decimal_places=3, max_digits=8,
                               validators=[MinValueValidator(Decimal('0.10'))])
    quantity = models.PositiveIntegerField(null=True, blank=True, help_text="Select Quantity for this size")
I added Weight as an inline and can add multiple weight values and quantities to a variation. Please see this: http://imgur.com/XLM6sQJ
One might think this could be done by creating a variation for each weight, but since it is certain that each product will have different weights, there is no need to create a variation for each weight.
Now the problem I am facing is that each variation will have different weights, e.g. a variation could have weights of 1 lb, 2 lb and 3 lb. Each of these creates new Weight objects for that variation. This means that if another variation also has weights of 1 lb, 2 lb and 3 lb, new objects are created instead of the existing ones being reused. This will result in a huge db table with many duplicate weight values. This is a problem because any product needs only a limited range of weight and quantity values (weight = 1 lb to 100 lb and quantity = 1 to 100), so these should ideally be reused.
To avoid this I am thinking of giving the Weight model a ManyToMany field to Variation, with quantity as a dropdown for each selected weight. This will allow me to store values of both weight and quantity, and have each product use the same values in each instance.
My questions are:
1. Is this the correct approach?
2. If not, what is the best approach?
3. If this is the correct approach, how do I implement it?
4. If this is the correct approach, how do I display this in the admin site, since each weight should also have a quantity (I have no clue how to do this)?
5. Is there a better way to achieve this, and if so, how?

You have a clear understanding of how you want to do it.
I agree with you about reusing the same weights for different variations rather than creating new records that hold the same values.
There may be multiple ways to do it, but this is what I think would work better.
To answer your question, please try these model relations in your app:
class Quantity(models.Model):
    quantity = models.PositiveIntegerField()

class Weight(models.Model):
    weight = models.PositiveIntegerField()
    quantity = models.ManyToManyField(Quantity)

class Variation(models.Model):
    name = models.CharField(max_length=100)  # max_length is required; 100 is an arbitrary choice
    weight = models.ManyToManyField(Weight)
Then add all the weights you require to the Weight model individually. After that, whenever you need to add a weight to a Variation, you can select it from the weights you have already added.
In this way you can reuse the same weight for different variations without duplicate records.
Make sure you have registered the models in the admin for easy access.
This should solve your problem of duplicate records in the weight table.
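To make the reuse idea concrete, here is a minimal sketch at the database level, written with the standard-library sqlite3 module rather than Django. The table and column names (variation, weight, variation_weight, size_lb) are made up for illustration, and storing the quantity on the link row is an assumption that differs from the Quantity model above; in Django terms it would roughly correspond to a ManyToManyField with a through model.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variation (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE weight (id INTEGER PRIMARY KEY, size_lb INTEGER NOT NULL UNIQUE);
    CREATE TABLE variation_weight (
        variation_id INTEGER NOT NULL REFERENCES variation(id),
        weight_id INTEGER NOT NULL REFERENCES weight(id),
        quantity INTEGER NOT NULL,
        UNIQUE (variation_id, weight_id)
    );
""")


def add_weight(variation_name, size_lb, quantity):
    """Link a variation to a weight, reusing existing rows where possible."""
    # INSERT OR IGNORE skips the insert when the UNIQUE value already exists.
    conn.execute("INSERT OR IGNORE INTO variation (name) VALUES (?)", (variation_name,))
    conn.execute("INSERT OR IGNORE INTO weight (size_lb) VALUES (?)", (size_lb,))
    variation_id = conn.execute(
        "SELECT id FROM variation WHERE name = ?", (variation_name,)
    ).fetchone()[0]
    weight_id = conn.execute(
        "SELECT id FROM weight WHERE size_lb = ?", (size_lb,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO variation_weight (variation_id, weight_id, quantity) VALUES (?, ?, ?)",
        (variation_id, weight_id, quantity),
    )


# Two variations that both use 1 lb, 2 lb and 3 lb.
for name in ("red", "blue"):
    for size_lb, quantity in ((1, 10), (2, 5), (3, 7)):
        add_weight(name, size_lb, quantity)

weight_rows = conn.execute("SELECT COUNT(*) FROM weight").fetchone()[0]
link_rows = conn.execute("SELECT COUNT(*) FROM variation_weight").fetchone()[0]
print(weight_rows, link_rows)  # 3 6
```

The weight table stays at three rows no matter how many variations use those sizes; only the small link table grows.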

Related

Hierarchical (three levels deep) Bayesian pymc3 model

I am working on a Bayesian Hierarchical linear model in Pymc3.
The model consists of three input variables on a daily level: number of users, product category and product sku and the output variable is revenue. In total the data consists of roughly 73.000 records with 180 categories and 12.000 sku's. Moreover, some categories/sku's are highly present while other categories aren't. An example of the data is shown in the link:
Preview of the data
As the data on SKU level is very sparse, a hierarchical model has been chosen with the intent that SKUs with less data shrink towards the category-level mean and, if a category is scarce, its mean shrinks towards the overall mean.
In the final model the categories are label encoded and the continuous variables users and revenue are min-max scaled.
At this point the model is formalized as follows:
with pm.Model() as model:
    sigma_overall = pm.HalfNormal("sigma_overall", mu=50)
    sigma_category = pm.HalfNormal("sigma_category", mu=sigma_overall)
    sigma_sku = pm.HalfNormal("sigma_sku", sigma=sigma_category, shape=n_sku)
    beta = pm.HalfNormal("beta", sigma=sigma_sku, shape=n_sku)
    epsilon = pm.HalfCauchy("epsilon", 1)
    y = pm.Deterministic('y', beta[category_idx][sku_idx] * df['users'].values)
    y_likelihood = pm.Normal("y_likelihood", mu=y, sigma=epsilon, observed=df['revenue'].values)
    trace = pm.sample(2000)
The main hurdle is that the model is very slow. It takes hours, sometimes a day before the model completes. Metropolis- or NUTS sampling with find_MAP() did not make a difference. Furthermore, I doubt whether the model is formalized correctly as I am pretty new to Pymc3.
A review of the model and advice to speed it up is very welcome.

Join two records from same model in django queryset

Been searching the web for a couple hours now looking for a solution but nothing quite fits what I am looking for.
I have one model (simplified):
class SimpleModel(Model):
    name = CharField('Name', unique=True)
    date = DateField()
    amount = FloatField()
I have two dates; date_one and date_two.
I would like a single queryset with a row for each name in the Model, with each row showing:
{'name': name, 'date_one': date_one, 'date_two': date_two, 'amount_one': amount_one, 'amount_two': amount_two, 'change': amount_two - amount_one}
Reason being I would like to be able to find the rank of amount_one, amount_two, and change, using sort or filters on that single queryset.
I know I could create a list of dictionaries from two separate querysets then sort on that and get the ranks from the index values ...
but, perhaps naively, I feel like there should be a DB solution using one queryset that would be faster.
union seemed promising but you cannot perform some simple operations like filter after that
I think I could perhaps split name into its own Model and generate queryset with related fields, but I'd prefer not to change the schema at this stage. Also, I only have access to sqlite.
appreciate any help!
Your current model forces you to have ONE name associated with ONE date and ONE amount. Because name is unique=True, you literally cannot have two dates associated with the same name.
So if you want to be able to have several dates/amounts associated with a name, there are several ways to proceed:
Idea 1: If there will only be 2 dates and 2 amounts, simply add a second date field and a second amount field
Idea 2: If there can be any number of dates and amounts, you'll have to change your model to reflect it, by having:
A model for your names
A model for your dates and amounts, with a foreign key to your names
Idea 3: You could keep the same model and simply remove the unique constraint, but that's a recipe for mistakes
Based on your choice, you'll then have several ways of querying what you need. It depends on your final model structure. The best way to go would be to create custom model methods that query the 2 dates/amounts, format an array and return it.
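As a rough sketch of what such a query could look like once a name can have one row per date (Idea 2 or 3), here is a plain SQL self-join run through the standard-library sqlite3 module. This is not Django ORM code; the table name simplemodel, the sample rows and the compare_dates helper are all made up for illustration.

```python
import sqlite3

# Hypothetical table with one row per (name, date); sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simplemodel (name TEXT, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO simplemodel (name, date, amount) VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", 10.0), ("a", "2020-02-01", 15.0),
        ("b", "2020-01-01", 20.0), ("b", "2020-02-01", 12.0),
        ("c", "2020-01-01", 5.0), ("c", "2020-02-01", 30.0),
    ],
)


def compare_dates(date_one, date_two):
    """Return one dict per name, sorted by change (largest first)."""
    rows = conn.execute(
        """
        SELECT d1.name, d1.amount, d2.amount, d2.amount - d1.amount AS delta
        FROM simplemodel AS d1
        JOIN simplemodel AS d2 ON d2.name = d1.name
        WHERE d1.date = ? AND d2.date = ?
        ORDER BY delta DESC
        """,
        (date_one, date_two),
    ).fetchall()
    return [
        {
            "name": name,
            "date_one": date_one,
            "date_two": date_two,
            "amount_one": amount_one,
            "amount_two": amount_two,
            "change": delta,
        }
        for name, amount_one, amount_two, delta in rows
    ]


result = compare_dates("2020-01-01", "2020-02-01")
for rank, row in enumerate(result, start=1):
    print(rank, row["name"], row["change"])
```

Changing the ORDER BY column gives the rank by amount_one or amount_two instead, so the sorting stays in the database.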

How to generate Sum (and other aggregates) in Django where aggregate depends on values from related tables

My model consists of a Portfolio, a Holding, and a Company. Each Portfolio has many Holdings, and each Holding is of a single Company (a Company may be connected to many Holdings).
Portfolio -< Holding >- Company
I'd like the Portfolio query to return the sum of the product of the number of Holdings in the Portfolio, and the value of the Company.
Simplified model:
class Portfolio(models.Model):
    ...  # some fields

class Company(models.Model):
    closing = models.DecimalField(max_digits=10, decimal_places=2)

class Holding(models.Model):
    portfolio = models.ForeignKey(Portfolio)
    company = models.ForeignKey(Company)
    num_shares = models.IntegerField(default=0)
I'd like to be able to query:
Portfolio.objects.some_function()
and have each row annotated with the value of the Portfolio, where the value is equal to the sum of the product of the related Company.closing, and Holding.num_shares. ie something like:
annotate(value=Sum('holding__num_shares * company__closing'))
I'd also like to obtain a summary row, which contains the sum of the values of all of a user's Portfolios, and a count of the number of holdings. ie something like:
aggregate(Sum('holding__num_shares * company__closing'), Count('holding__num_shares'))
I would also like to have a similar summary row for a single Portfolio, which would be the sum of the values of each holding, and a count of the total number of holdings in the portfolio.
I managed to get part of the way there using extra:
return self.extra(
    select={
        'value': 'select sum(h.num_shares * c.closing) from portfolio_holding h '
                 'inner join portfolio_company as c on h.company_id = c.id '
                 'where h.portfolio_id = portfolio_portfolio.id'
    }).annotate(Count('holding'))
but this is pretty ugly, and extra seems to be frowned upon, for obvious reasons.
My question is: is there a more Djangoistic way to summarise and annotate queries based on multiple fields, and across related tables?
These two options seem to move in the right direction:
Portfolio.objects.annotate(Sum('holding__company__closing'))
(ie this demonstrates annotation/aggregation over a field in a related table)
Holding.objects.annotate(Sum('id', field='num_shares * id'))
(this demonstrates annotation/aggregation over the product of two fields)
but if I attempt to combine them: eg
Portfolio.objects.annotate(Sum('id', field='holding__company__closing * holding__num_shares'))
I get an error: "No such column 'holding__company__closing'".
So far I've looked at the following related questions, but none of them seem to capture this precise problem:
Annotating django QuerySet with values from related table
Product of two fields annotation
Do I just need to bite the bullet and use raw / extra? I'm hoping that Django ORM will prove the exception to the rule that ORMs really only work as designed for simple queries / models, and anything beyond the most basic ones require either seriously gnarly tap-dancing, or stepping out of the abstraction, which somewhat defeats the purpose...
Thanks in advance!

Django: efficient semi-random order_by for user-friendly results?

I have a Django search app with a Postgres back-end that is populated with cars. My scripts load on a brand-by-brand basis: let's say a typical mix is 50 Chryslers, 100 Chevys, and 1500 Fords, loaded in that order.
The default ordering is by creation date:
class Car(models.Model):
    name = models.CharField(max_length=500)
    brand = models.ForeignKey(Brand, null=True, blank=True)
    transmission = models.CharField(max_length=50, choices=TRANSMISSIONS)
    created = models.DateField(auto_now_add=True)

    class Meta:
        ordering = ['-created']
My problem is this: typically when the user does a search, say for a red automatic, and let's say that returns 10% of all cars:
results = Car.objects.filter(transmission="automatic", color="red")
the user typically gets hundreds of Fords first before any other brand (because the results are ordered by creation date), which is not a good experience.
I'd like to make sure the brands are as evenly distributed as possible among the early results, without big "runs" of one brand. Any clever suggestions for how to achieve this?
The only idea I have is to use the ? operator with order_by:
results = Car.objects.filter(transmission="automatic", color="red").order_by("?")
This isn't ideal. It's expensive. And it doesn't guarantee a good mix of results, if some brands are much more common than others - so here where Chrysler and Chevy are in the minority, the user is still likely to see lots of Fords first. Ideally I'd show all the Chryslers and Chevys in the first 50 results, nicely mixed in with Fords.
Any ideas on how to achieve a user-friendly ordering? I'm stuck.
What I ended up doing was adding a priority field on the model, and assigning a semi-random integer to it.
By semi-random, I mean random within a particular range: so 1-100 for Chevy and Chrysler, and 1-500 for Ford.
Then I used that field to order_by.
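Here is a minimal sketch of that idea in plain Python, with dicts standing in for Car rows. The PRIORITY_RANGES mapping and the assign_priority helper are hypothetical names; in the real app the value would be stored in the priority field when the car is loaded and the sort would be an order_by on that field.

```python
import random

# Hypothetical upper bounds per brand; a smaller range pulls a rarer brand forward.
PRIORITY_RANGES = {"Chrysler": 100, "Chevy": 100, "Ford": 500}


def assign_priority(brand, rng):
    """Return a random integer between 1 and the brand's upper bound."""
    return rng.randint(1, PRIORITY_RANGES[brand])


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Same mix as in the question: 50 Chryslers, 100 Chevys, 1500 Fords.
cars = (
    [{"brand": "Chrysler"} for _ in range(50)]
    + [{"brand": "Chevy"} for _ in range(100)]
    + [{"brand": "Ford"} for _ in range(1500)]
)
for car in cars:
    car["priority"] = assign_priority(car["brand"], rng)

# Stand-in for ordering the queryset by the stored priority field.
results = sorted(cars, key=lambda car: car["priority"])

first_page = [car["brand"] for car in results[:50]]
print(first_page)
```

With these ranges every Chrysler and Chevy lands in the 1-100 band, which they share with roughly a fifth of the Fords, so the minority brands are all shown before the long tail of Fords begins.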

Django Aggregate with several models

I have these models :
class Package(models.Model):
    title = CharField(...)

class Item(models.Model):
    package = ForeignKey(Package)
    price = FloatField(...)

class UserItem(models.Model):
    user = ForeignKey(User)
    item = ForeignKey(Item)
    purchased = BooleanField()
I am trying to achieve two things with the best performance possible:
In my template I would like to calculate, for each package, the sum of the prices of all its items. (Aggregate, I assume?)
More complicated: for each user, I would like to sum the prices of all purchased items, i.e. those with purchased = True.
Assume I have 10 items in one package, each costing $10: the package sum should be $100. If the user purchases 5 of them, the second sum should be $50.
I can easily do simple queries with template tags, but I believe it can be done better? (Hopefully)
To total the price for a specific package a_package you can use this code
Item.objects.filter(package=a_package).aggregate(Sum('price'))
There is a guide on how to do these kinds of queries, and the aggregate documentation describes all the different functions.
This kind of query can also solve your second problem. Note that price lives on Item, so it has to be reached through the item relation:
UserItem.objects.filter(user=a_user).filter(purchased=True).aggregate(Sum('item__price'))
You can also use annotate() to attach the total to each object, see the first link above.
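To check the numbers from the question (10 items at $10 each, 5 of them purchased), here is a plain SQL sketch of the same two sums, run with the standard-library sqlite3 module instead of the Django ORM. The table and column names are assumptions made for the example, not what Django would necessarily generate.

```python
import sqlite3

# Hypothetical tables mirroring Item and UserItem from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, package_id INTEGER, price REAL);
    CREATE TABLE useritem (user_id INTEGER, item_id INTEGER, purchased INTEGER);
""")

# One package (id 1) with 10 items at 10.0 each.
conn.executemany(
    "INSERT INTO item (id, package_id, price) VALUES (?, 1, 10.0)",
    [(item_id,) for item_id in range(1, 11)],
)
# User 1 has purchased the first 5 items only.
conn.executemany(
    "INSERT INTO useritem (user_id, item_id, purchased) VALUES (1, ?, ?)",
    [(item_id, 1 if item_id <= 5 else 0) for item_id in range(1, 11)],
)

# Sum of all item prices in the package.
package_total = conn.execute(
    "SELECT SUM(price) FROM item WHERE package_id = ?", (1,)
).fetchone()[0]

# Sum of the prices of the items this user has purchased.
purchased_total = conn.execute(
    """
    SELECT SUM(item.price)
    FROM useritem
    JOIN item ON item.id = useritem.item_id
    WHERE useritem.user_id = ? AND useritem.purchased = 1
    """,
    (1,),
).fetchone()[0]

print(package_total, purchased_total)  # 100.0 50.0
```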
The most elegant way in my opinion would be to define a method total on the Model class and decorate it as a property. This will return the total (using Django ORM's Sum aggregate) for either Package or User.
Example for class Package:
from django.db.models import Sum
...
class Package(models.Model):
    ...
    @property
    def total(self):
        # aggregate() returns a dict, so pull out the summed value
        return self.item_set.aggregate(total=Sum('price'))['total']
In your template code you would use total as any other model attribute. E.g.:
{{ package_instance.total }}
@Vic Smith got the solution.
But I would add a price attribute on the Package model if you wish "the best performance possible".
You would add a post_save signal handler for Item and, when an item is created, update the related Package object.
This way you can get the package price very quickly, and even do quick sorting, comparing, etc.
Plus, I don't really get the purpose of the purchased attribute. But you probably want to make a ManyToMany relationship between Item and User, and define UserItem as the connection with the through parameter.
Anyway, my experience is that you usually want to make a relationship between Item and a Purchase object, which is linked to User, and not a direct link (unless you start to get performance issues...). Having Purchase as a record of the event "the user bought this and that" makes things easier to handle.